Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385488967> ?p ?o ?g. }
Showing items 1 to 87 of 87 with 100 items per page.
- W4385488967 abstract "Text-to-audio grounding (TAG) aims to detect sound events described by natural language in an audio clip. Strongly-supervised TAG requires extensive human annotations of the events’ on- and off-sets. To mitigate the reliance on strongly-annotated data, weakly-supervised TAG (WSTAG) is proposed to train TAG on audio captioning data based on contrastive learning. However, crucial components in WSTAG, namely pooling strategies and loss functions, remain unexplored. Directly bringing their corresponding ones in closely-related tasks, such as sound event detection (SED) and audio-text retrieval, does not necessarily fit this task due to TAG’s unique requirement of fine-grained alignment via free text. In this work, we first improve the TAG dataset to obtain a more reliable TAG performance indicator, AudioGrounding v2. Then we extensively investigate the effects of these components on WSTAG. The result on the refined dataset demonstrates that the pooling strategy is crucial to the model performance while the loss function presents much less influence. By combining proper pooling strategies and loss functions, we explore a more effective WSTAG framework that significantly enhances the ability to detect events, especially for short-duration ones. The code and data are available at https://github.com/wsntxxn/TextToAudioGrounding" @default.
- W4385488967 created "2023-08-03" @default.
- W4385488967 creator A5025827045 @default.
- W4385488967 creator A5043098653 @default.
- W4385488967 creator A5081865665 @default.
- W4385488967 date "2023-06-04" @default.
- W4385488967 modified "2023-10-14" @default.
- W4385488967 title "Investigating Pooling Strategies and Loss Functions for Weakly-Supervised Text-to-Audio Grounding via Contrastive Learning" @default.
- W4385488967 cites W1773149199 @default.
- W4385488967 cites W1905882502 @default.
- W4385488967 cites W2135342008 @default.
- W4385488967 cites W2591013610 @default.
- W4385488967 cites W2593116425 @default.
- W4385488967 cites W2963610932 @default.
- W4385488967 cites W2965809241 @default.
- W4385488967 cites W2997525715 @default.
- W4385488967 cites W3006275583 @default.
- W4385488967 cites W3015190346 @default.
- W4385488967 cites W3016059657 @default.
- W4385488967 cites W3035212740 @default.
- W4385488967 cites W3162999565 @default.
- W4385488967 cites W3163843406 @default.
- W4385488967 cites W3204267711 @default.
- W4385488967 cites W4221157007 @default.
- W4385488967 cites W4224920041 @default.
- W4385488967 doi "https://doi.org/10.1109/icasspw59220.2023.10192960" @default.
- W4385488967 hasPublicationYear "2023" @default.
- W4385488967 type Work @default.
- W4385488967 citedByCount "0" @default.
- W4385488967 crossrefType "proceedings-article" @default.
- W4385488967 hasAuthorship W4385488967A5025827045 @default.
- W4385488967 hasAuthorship W4385488967A5043098653 @default.
- W4385488967 hasAuthorship W4385488967A5081865665 @default.
- W4385488967 hasConcept C115961682 @default.
- W4385488967 hasConcept C119857082 @default.
- W4385488967 hasConcept C14036430 @default.
- W4385488967 hasConcept C154945302 @default.
- W4385488967 hasConcept C157657479 @default.
- W4385488967 hasConcept C162324750 @default.
- W4385488967 hasConcept C177264268 @default.
- W4385488967 hasConcept C187736073 @default.
- W4385488967 hasConcept C195324797 @default.
- W4385488967 hasConcept C199360897 @default.
- W4385488967 hasConcept C204321447 @default.
- W4385488967 hasConcept C23123220 @default.
- W4385488967 hasConcept C2776760102 @default.
- W4385488967 hasConcept C2780451532 @default.
- W4385488967 hasConcept C28490314 @default.
- W4385488967 hasConcept C41008148 @default.
- W4385488967 hasConcept C70437156 @default.
- W4385488967 hasConcept C78458016 @default.
- W4385488967 hasConcept C86803240 @default.
- W4385488967 hasConceptScore W4385488967C115961682 @default.
- W4385488967 hasConceptScore W4385488967C119857082 @default.
- W4385488967 hasConceptScore W4385488967C14036430 @default.
- W4385488967 hasConceptScore W4385488967C154945302 @default.
- W4385488967 hasConceptScore W4385488967C157657479 @default.
- W4385488967 hasConceptScore W4385488967C162324750 @default.
- W4385488967 hasConceptScore W4385488967C177264268 @default.
- W4385488967 hasConceptScore W4385488967C187736073 @default.
- W4385488967 hasConceptScore W4385488967C195324797 @default.
- W4385488967 hasConceptScore W4385488967C199360897 @default.
- W4385488967 hasConceptScore W4385488967C204321447 @default.
- W4385488967 hasConceptScore W4385488967C23123220 @default.
- W4385488967 hasConceptScore W4385488967C2776760102 @default.
- W4385488967 hasConceptScore W4385488967C2780451532 @default.
- W4385488967 hasConceptScore W4385488967C28490314 @default.
- W4385488967 hasConceptScore W4385488967C41008148 @default.
- W4385488967 hasConceptScore W4385488967C70437156 @default.
- W4385488967 hasConceptScore W4385488967C78458016 @default.
- W4385488967 hasConceptScore W4385488967C86803240 @default.
- W4385488967 hasLocation W43854889671 @default.
- W4385488967 hasOpenAccess W4385488967 @default.
- W4385488967 hasPrimaryLocation W43854889671 @default.
- W4385488967 hasRelatedWork W159132833 @default.
- W4385488967 hasRelatedWork W2293457016 @default.
- W4385488967 hasRelatedWork W2502722637 @default.
- W4385488967 hasRelatedWork W2735824434 @default.
- W4385488967 hasRelatedWork W2963898017 @default.
- W4385488967 hasRelatedWork W2977842567 @default.
- W4385488967 hasRelatedWork W3090988983 @default.
- W4385488967 hasRelatedWork W3093454656 @default.
- W4385488967 hasRelatedWork W4283368658 @default.
- W4385488967 hasRelatedWork W1872130062 @default.
- W4385488967 isParatext "false" @default.
- W4385488967 isRetracted "false" @default.
- W4385488967 workType "article" @default.