Matches in SemOpenAlex for { <https://semopenalex.org/work/W4361864998> ?p ?o ?g. }
Showing items 1 to 81 of 81 with 100 items per page.
- W4361864998 abstract "The advancement of audio-language (AL) multimodal learning tasks has been significant in recent years. However, researchers face challenges due to the costly and time-consuming collection process of existing audio-language datasets, which are limited in size. To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions. We sourced audio clips and their raw descriptions from web sources and a sound event detection dataset. However, the online-harvested raw descriptions are highly noisy and unsuitable for direct use in tasks such as automated audio captioning. To overcome this issue, we propose a three-stage processing pipeline for filtering noisy data and generating high-quality captions, where ChatGPT, a large language model, is leveraged to filter and transform raw descriptions automatically. We conduct a comprehensive analysis of the characteristics of WavCaps dataset and evaluate it on multiple downstream audio-language multimodal learning tasks. The systems trained on WavCaps outperform previous state-of-the-art (SOTA) models by a significant margin. Our aspiration is for the WavCaps dataset we have proposed to facilitate research in audio-language multimodal learning and demonstrate the potential of utilizing ChatGPT to enhance academic research. Our dataset and codes are available at https://github.com/XinhaoMei/WavCaps." @default.
- W4361864998 created "2023-04-05" @default.
- W4361864998 creator A5002795838 @default.
- W4361864998 creator A5017907143 @default.
- W4361864998 creator A5037691180 @default.
- W4361864998 creator A5038062913 @default.
- W4361864998 creator A5066967599 @default.
- W4361864998 creator A5069433018 @default.
- W4361864998 creator A5070892237 @default.
- W4361864998 creator A5072482416 @default.
- W4361864998 creator A5089924305 @default.
- W4361864998 date "2023-03-30" @default.
- W4361864998 modified "2023-09-29" @default.
- W4361864998 title "WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research" @default.
- W4361864998 doi "https://doi.org/10.48550/arxiv.2303.17395" @default.
- W4361864998 hasPublicationYear "2023" @default.
- W4361864998 type Work @default.
- W4361864998 citedByCount "0" @default.
- W4361864998 crossrefType "posted-content" @default.
- W4361864998 hasAuthorship W4361864998A5002795838 @default.
- W4361864998 hasAuthorship W4361864998A5017907143 @default.
- W4361864998 hasAuthorship W4361864998A5037691180 @default.
- W4361864998 hasAuthorship W4361864998A5038062913 @default.
- W4361864998 hasAuthorship W4361864998A5066967599 @default.
- W4361864998 hasAuthorship W4361864998A5069433018 @default.
- W4361864998 hasAuthorship W4361864998A5070892237 @default.
- W4361864998 hasAuthorship W4361864998A5072482416 @default.
- W4361864998 hasAuthorship W4361864998A5089924305 @default.
- W4361864998 hasBestOaLocation W43618649981 @default.
- W4361864998 hasConcept C106131492 @default.
- W4361864998 hasConcept C111919701 @default.
- W4361864998 hasConcept C115961682 @default.
- W4361864998 hasConcept C119857082 @default.
- W4361864998 hasConcept C127220857 @default.
- W4361864998 hasConcept C13895895 @default.
- W4361864998 hasConcept C154945302 @default.
- W4361864998 hasConcept C157657479 @default.
- W4361864998 hasConcept C160372630 @default.
- W4361864998 hasConcept C199360897 @default.
- W4361864998 hasConcept C204321447 @default.
- W4361864998 hasConcept C28490314 @default.
- W4361864998 hasConcept C31972630 @default.
- W4361864998 hasConcept C41008148 @default.
- W4361864998 hasConcept C43521106 @default.
- W4361864998 hasConcept C64922751 @default.
- W4361864998 hasConcept C774472 @default.
- W4361864998 hasConcept C98045186 @default.
- W4361864998 hasConceptScore W4361864998C106131492 @default.
- W4361864998 hasConceptScore W4361864998C111919701 @default.
- W4361864998 hasConceptScore W4361864998C115961682 @default.
- W4361864998 hasConceptScore W4361864998C119857082 @default.
- W4361864998 hasConceptScore W4361864998C127220857 @default.
- W4361864998 hasConceptScore W4361864998C13895895 @default.
- W4361864998 hasConceptScore W4361864998C154945302 @default.
- W4361864998 hasConceptScore W4361864998C157657479 @default.
- W4361864998 hasConceptScore W4361864998C160372630 @default.
- W4361864998 hasConceptScore W4361864998C199360897 @default.
- W4361864998 hasConceptScore W4361864998C204321447 @default.
- W4361864998 hasConceptScore W4361864998C28490314 @default.
- W4361864998 hasConceptScore W4361864998C31972630 @default.
- W4361864998 hasConceptScore W4361864998C41008148 @default.
- W4361864998 hasConceptScore W4361864998C43521106 @default.
- W4361864998 hasConceptScore W4361864998C64922751 @default.
- W4361864998 hasConceptScore W4361864998C774472 @default.
- W4361864998 hasConceptScore W4361864998C98045186 @default.
- W4361864998 hasLocation W43618649981 @default.
- W4361864998 hasOpenAccess W4361864998 @default.
- W4361864998 hasPrimaryLocation W43618649981 @default.
- W4361864998 hasRelatedWork W1603949574 @default.
- W4361864998 hasRelatedWork W2020540721 @default.
- W4361864998 hasRelatedWork W2170815394 @default.
- W4361864998 hasRelatedWork W2353318413 @default.
- W4361864998 hasRelatedWork W2379113420 @default.
- W4361864998 hasRelatedWork W2604447241 @default.
- W4361864998 hasRelatedWork W2897411159 @default.
- W4361864998 hasRelatedWork W3107474891 @default.
- W4361864998 hasRelatedWork W4309129408 @default.
- W4361864998 hasRelatedWork W4315705606 @default.
- W4361864998 isParatext "false" @default.
- W4361864998 isRetracted "false" @default.
- W4361864998 workType "article" @default.
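
Each entry above follows the shape `- SUBJECT PREDICATE OBJECT @GRAPH.`, where the object is either a quoted literal or a bare identifier. As a rough sketch only, the listing could be read into structured records as shown below. The names `Quad`, `parse_line`, and `group_by_predicate` are hypothetical helpers introduced for illustration and are not part of SemOpenAlex or any library. The regular expression assumes each entry sits on a single line and that quoted literals contain no trailing `" @graph.` sequence of their own.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Optional


class Quad(NamedTuple):
    """One listing entry (hypothetical record type, for illustration)."""
    subject: str
    predicate: str
    obj: str
    graph: str
    is_literal: bool


# Assumed line shape: "- SUBJECT PREDICATE OBJECT @GRAPH."
# OBJECT is either a double-quoted literal or a bare token.
LINE_RE = re.compile(
    r'^-\s+(?P<s>\S+)\s+(?P<p>\S+)\s+(?P<o>".*"|\S+)\s+@(?P<g>\w+)\.$',
    re.DOTALL,
)


def parse_line(line: str) -> Optional[Quad]:
    """Parse one listing line; return None for headers or other text."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    obj = match.group("o")
    is_literal = len(obj) >= 2 and obj.startswith('"') and obj.endswith('"')
    if is_literal:
        obj = obj[1:-1]  # drop the surrounding quotes
    return Quad(match.group("s"), match.group("p"), obj,
                match.group("g"), is_literal)


def group_by_predicate(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect object values per predicate, skipping unparseable lines."""
    grouped: Dict[str, List[str]] = {}
    for line in lines:
        quad = parse_line(line)
        if quad is not None:
            grouped.setdefault(quad.predicate, []).append(quad.obj)
    return grouped


if __name__ == "__main__":
    sample = [
        "Showing items 1 to 81 of 81 with 100 items per page.",
        '- W4361864998 created "2023-04-05" @default.',
        "- W4361864998 creator A5002795838 @default.",
        "- W4361864998 creator A5017907143 @default.",
    ]
    for predicate, values in group_by_predicate(sample).items():
        print(predicate, values)
```

Grouping by predicate keeps repeated properties such as `creator` or `hasConcept` together as lists, which matches how the listing repeats a predicate once per value.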