Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386876114> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4386876114 abstract "We present RECAP (REtrieval-Augmented Audio CAPtioning), a novel and effective audio captioning system that generates captions conditioned on an input audio and other captions similar to the audio retrieved from a datastore. Additionally, our proposed method can transfer to any domain without the need for any additional fine-tuning. To generate a caption for an audio sample, we leverage an audio-text model CLAP to retrieve captions similar to it from a replaceable datastore, which are then used to construct a prompt. Next, we feed this prompt to a GPT-2 decoder and introduce cross-attention layers between the CLAP encoder and GPT-2 to condition the audio for caption generation. Experiments on two benchmark datasets, Clotho and AudioCaps, show that RECAP achieves competitive performance in in-domain settings and significant improvements in out-of-domain settings. Additionally, due to its capability to exploit a large text-captions-only datastore in a training-free fashion, RECAP shows unique capabilities of captioning novel audio events never seen during training and compositional audios with multiple events. To promote research in this space, we also release 150,000+ new weakly labeled captions for AudioSet, AudioCaps, and Clotho." @default.
- W4386876114 created "2023-09-20" @default.
- W4386876114 creator A5013222310 @default.
- W4386876114 creator A5033408639 @default.
- W4386876114 creator A5042883150 @default.
- W4386876114 creator A5087811514 @default.
- W4386876114 creator A5092810058 @default.
- W4386876114 date "2023-09-18" @default.
- W4386876114 modified "2023-10-18" @default.
- W4386876114 title "RECAP: Retrieval-Augmented Audio Captioning" @default.
- W4386876114 doi "https://doi.org/10.48550/arxiv.2309.09836" @default.
- W4386876114 hasPublicationYear "2023" @default.
- W4386876114 type Work @default.
- W4386876114 citedByCount "0" @default.
- W4386876114 crossrefType "posted-content" @default.
- W4386876114 hasAuthorship W4386876114A5013222310 @default.
- W4386876114 hasAuthorship W4386876114A5033408639 @default.
- W4386876114 hasAuthorship W4386876114A5042883150 @default.
- W4386876114 hasAuthorship W4386876114A5087811514 @default.
- W4386876114 hasAuthorship W4386876114A5092810058 @default.
- W4386876114 hasBestOaLocation W43868761141 @default.
- W4386876114 hasConcept C111919701 @default.
- W4386876114 hasConcept C115961682 @default.
- W4386876114 hasConcept C118505674 @default.
- W4386876114 hasConcept C13280743 @default.
- W4386876114 hasConcept C134306372 @default.
- W4386876114 hasConcept C153083717 @default.
- W4386876114 hasConcept C154945302 @default.
- W4386876114 hasConcept C157657479 @default.
- W4386876114 hasConcept C165696696 @default.
- W4386876114 hasConcept C185798385 @default.
- W4386876114 hasConcept C204321447 @default.
- W4386876114 hasConcept C205649164 @default.
- W4386876114 hasConcept C28490314 @default.
- W4386876114 hasConcept C33923547 @default.
- W4386876114 hasConcept C36503486 @default.
- W4386876114 hasConcept C38652104 @default.
- W4386876114 hasConcept C41008148 @default.
- W4386876114 hasConceptScore W4386876114C111919701 @default.
- W4386876114 hasConceptScore W4386876114C115961682 @default.
- W4386876114 hasConceptScore W4386876114C118505674 @default.
- W4386876114 hasConceptScore W4386876114C13280743 @default.
- W4386876114 hasConceptScore W4386876114C134306372 @default.
- W4386876114 hasConceptScore W4386876114C153083717 @default.
- W4386876114 hasConceptScore W4386876114C154945302 @default.
- W4386876114 hasConceptScore W4386876114C157657479 @default.
- W4386876114 hasConceptScore W4386876114C165696696 @default.
- W4386876114 hasConceptScore W4386876114C185798385 @default.
- W4386876114 hasConceptScore W4386876114C204321447 @default.
- W4386876114 hasConceptScore W4386876114C205649164 @default.
- W4386876114 hasConceptScore W4386876114C28490314 @default.
- W4386876114 hasConceptScore W4386876114C33923547 @default.
- W4386876114 hasConceptScore W4386876114C36503486 @default.
- W4386876114 hasConceptScore W4386876114C38652104 @default.
- W4386876114 hasConceptScore W4386876114C41008148 @default.
- W4386876114 hasLocation W43868761141 @default.
- W4386876114 hasOpenAccess W4386876114 @default.
- W4386876114 hasPrimaryLocation W43868761141 @default.
- W4386876114 hasRelatedWork W2547835662 @default.
- W4386876114 hasRelatedWork W2611862713 @default.
- W4386876114 hasRelatedWork W2905654560 @default.
- W4386876114 hasRelatedWork W2923366293 @default.
- W4386876114 hasRelatedWork W2963992143 @default.
- W4386876114 hasRelatedWork W3008515501 @default.
- W4386876114 hasRelatedWork W3183824823 @default.
- W4386876114 hasRelatedWork W4307856881 @default.
- W4386876114 hasRelatedWork W4320016117 @default.
- W4386876114 hasRelatedWork W4386072117 @default.
- W4386876114 isParatext "false" @default.
- W4386876114 isRetracted "false" @default.
- W4386876114 workType "article" @default.
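
The retrieval step described in the abstract (find datastore captions similar to the input audio, then build a prompt from them) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors are made-up stand-ins for CLAP embeddings, the function names `cosine`, `retrieve_captions`, and `build_prompt` are hypothetical, and the prompt wording is an assumption rather than the template used in the paper. The GPT-2 decoding and cross-attention stages are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_captions(audio_emb, datastore, k=2):
    """Return the k captions whose embeddings are closest to the audio embedding."""
    ranked = sorted(
        datastore,
        key=lambda item: cosine(audio_emb, item[1]),
        reverse=True,
    )
    return [caption for caption, _ in ranked[:k]]


def build_prompt(captions):
    """Assemble a text prompt from retrieved captions (wording is an assumption)."""
    lines = ["Audios similar to this audio sound like:"]
    lines += [f"- {c}" for c in captions]
    lines.append("This audio sounds like:")
    return "\n".join(lines)


# Toy datastore: (caption, embedding) pairs. A real system would obtain the
# embeddings from an audio-text model such as CLAP; these values are invented.
datastore = [
    ("a dog barks in the distance", [0.9, 0.1, 0.0]),
    ("rain falls on a metal roof", [0.0, 0.2, 0.9]),
    ("a puppy yelps and whines", [0.8, 0.3, 0.1]),
]
audio_emb = [1.0, 0.2, 0.0]  # invented embedding of the input audio

retrieved = retrieve_captions(audio_emb, datastore, k=2)
prompt = build_prompt(retrieved)
print(prompt)
```

Because the datastore is an ordinary list of caption and embedding pairs, swapping in captions from a different domain requires no retraining in this sketch, which mirrors the "replaceable datastore" idea in the abstract.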