Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386302439> ?p ?o ?g. }
Showing items 1 to 61 of 61 with 100 items per page.
- W4386302439 abstract "Automated Audio Captioning (AAC) aims to develop systems capable of describing an audio recording using a textual sentence. In contrast, Audio-Text Retrieval (ATR) systems seek to find the best matching audio recording(s) for a given textual query (Text-to-Audio) or vice versa (Audio-to-Text). These tasks require different types of systems: AAC employs a sequence-to-sequence model, while ATR utilizes a ranking model that compares audio and text representations within a shared projection subspace. However, this work investigates the relationship between AAC and ATR by exploring the ATR capabilities of an unmodified AAC system, without fine-tuning for the new task. Our AAC system consists of an audio encoder (ConvNeXt-Tiny) trained on AudioSet for audio tagging, and a transformer decoder responsible for generating sentences. For AAC, it achieves a high SPIDEr-FL score of 0.298 on Clotho and 0.472 on AudioCaps on average. For ATR, we propose using the standard Cross-Entropy loss values obtained for any audio/caption pair. Experimental results on the Clotho and AudioCaps datasets demonstrate decent recall values using this simple approach. For instance, we obtained a Text-to-Audio R@1 value of 0.382 for AudioCaps, which is above the current state-of-the-art method without external data. Interestingly, we observe that normalizing the loss values was necessary for Audio-to-Text retrieval." @default.
- W4386302439 created "2023-08-31" @default.
- W4386302439 creator A5018248112 @default.
- W4386302439 creator A5032338621 @default.
- W4386302439 creator A5088037163 @default.
- W4386302439 date "2023-08-29" @default.
- W4386302439 modified "2023-09-27" @default.
- W4386302439 title "Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval?" @default.
- W4386302439 doi "https://doi.org/10.48550/arxiv.2308.15090" @default.
- W4386302439 hasPublicationYear "2023" @default.
- W4386302439 type Work @default.
- W4386302439 citedByCount "0" @default.
- W4386302439 crossrefType "posted-content" @default.
- W4386302439 hasAuthorship W4386302439A5018248112 @default.
- W4386302439 hasAuthorship W4386302439A5032338621 @default.
- W4386302439 hasAuthorship W4386302439A5088037163 @default.
- W4386302439 hasBestOaLocation W43863024391 @default.
- W4386302439 hasConcept C111919701 @default.
- W4386302439 hasConcept C115961682 @default.
- W4386302439 hasConcept C118505674 @default.
- W4386302439 hasConcept C121332964 @default.
- W4386302439 hasConcept C154945302 @default.
- W4386302439 hasConcept C157657479 @default.
- W4386302439 hasConcept C165801399 @default.
- W4386302439 hasConcept C204321447 @default.
- W4386302439 hasConcept C2777530160 @default.
- W4386302439 hasConcept C28490314 @default.
- W4386302439 hasConcept C32834561 @default.
- W4386302439 hasConcept C41008148 @default.
- W4386302439 hasConcept C62520636 @default.
- W4386302439 hasConcept C66322947 @default.
- W4386302439 hasConceptScore W4386302439C111919701 @default.
- W4386302439 hasConceptScore W4386302439C115961682 @default.
- W4386302439 hasConceptScore W4386302439C118505674 @default.
- W4386302439 hasConceptScore W4386302439C121332964 @default.
- W4386302439 hasConceptScore W4386302439C154945302 @default.
- W4386302439 hasConceptScore W4386302439C157657479 @default.
- W4386302439 hasConceptScore W4386302439C165801399 @default.
- W4386302439 hasConceptScore W4386302439C204321447 @default.
- W4386302439 hasConceptScore W4386302439C2777530160 @default.
- W4386302439 hasConceptScore W4386302439C28490314 @default.
- W4386302439 hasConceptScore W4386302439C32834561 @default.
- W4386302439 hasConceptScore W4386302439C41008148 @default.
- W4386302439 hasConceptScore W4386302439C62520636 @default.
- W4386302439 hasConceptScore W4386302439C66322947 @default.
- W4386302439 hasLocation W43863024391 @default.
- W4386302439 hasOpenAccess W4386302439 @default.
- W4386302439 hasPrimaryLocation W43863024391 @default.
- W4386302439 hasRelatedWork W2547835662 @default.
- W4386302439 hasRelatedWork W3025136821 @default.
- W4386302439 hasRelatedWork W3035237998 @default.
- W4386302439 hasRelatedWork W4224046780 @default.
- W4386302439 hasRelatedWork W4281560470 @default.
- W4386302439 hasRelatedWork W4312545247 @default.
- W4386302439 hasRelatedWork W4312845724 @default.
- W4386302439 hasRelatedWork W4364297074 @default.
- W4386302439 hasRelatedWork W4384210086 @default.
- W4386302439 hasRelatedWork W4385606240 @default.
- W4386302439 isParatext "false" @default.
- W4386302439 isRetracted "false" @default.
- W4386302439 workType "article" @default.
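The retrieval idea in the abstract above works like this. Every audio/caption pair is scored by the captioning model's cross-entropy loss, where lower means a better match. Candidates are then ranked by that score and Recall@K is measured. The sketch below illustrates the idea and is not the authors' code. The loss matrix is toy data standing in for the losses a trained AAC model would produce, and the function names are hypothetical. The per-caption z-score is an assumed normalization, because the abstract does not say which normalization the paper used.

```python
from statistics import mean, pstdev

# loss[a][c]: cross-entropy of caption c when decoded from audio clip a
# (toy numbers; in practice these would come from the AAC model).
# caption_to_audio[c]: index of the ground-truth audio clip for caption c.


def text_to_audio_recall(loss, caption_to_audio, k):
    """Fraction of captions whose true audio clip is among the k lowest-loss clips."""
    n_audio, n_caption = len(loss), len(loss[0])
    hits = 0
    for c in range(n_caption):
        ranked = sorted(range(n_audio), key=lambda a: loss[a][c])
        if caption_to_audio[c] in ranked[:k]:
            hits += 1
    return hits / n_caption


def audio_to_text_recall(loss, caption_to_audio, k):
    """Fraction of clips with at least one true caption among the k lowest-loss captions."""
    n_audio, n_caption = len(loss), len(loss[0])
    hits = 0
    for a in range(n_audio):
        ranked = sorted(range(n_caption), key=lambda c: loss[a][c])
        if any(caption_to_audio[c] == a for c in ranked[:k]):
            hits += 1
    return hits / n_audio


def normalize_per_caption(loss):
    """Assumed normalization: z-score each caption's losses across all audio clips.

    This removes the advantage of generic captions that get a low loss for any clip.
    """
    n_audio, n_caption = len(loss), len(loss[0])
    out = [[0.0] * n_caption for _ in range(n_audio)]
    for c in range(n_caption):
        column = [loss[a][c] for a in range(n_audio)]
        mu = mean(column)
        sd = pstdev(column) or 1.0  # guard against a constant column
        for a in range(n_audio):
            out[a][c] = (loss[a][c] - mu) / sd
    return out


if __name__ == "__main__":
    # Caption 2 is "generic": it gets a low loss for every clip.
    loss = [
        [2.0, 3.0, 1.5],
        [3.0, 2.0, 1.6],
        [3.2, 3.1, 1.0],
    ]
    caption_to_audio = [0, 1, 2]
    print(text_to_audio_recall(loss, caption_to_audio, k=1))  # 1.0
    print(audio_to_text_recall(loss, caption_to_audio, k=1))  # 1/3: caption 2 wins every clip
    norm = normalize_per_caption(loss)
    print(audio_to_text_recall(norm, caption_to_audio, k=1))  # 1.0
```

The z-score is a positive affine transform applied within each caption's column, so it leaves Text-to-Audio rankings unchanged. It only affects Audio-to-Text ranking, where different captions compete for the same clip. This fits the abstract's observation that normalization mattered for Audio-to-Text retrieval, although the paper's actual scheme may differ.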