Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386072368> ?p ?o ?g. }
Showing items 1 to 87 of 87, with 100 items per page.
- W4386072368 abstract "Sound source localization is a typical and challenging task that predicts the location of sound sources in a video. Previous single-source methods mainly used the audio-visual association as clues to localize sounding objects in each image. Due to the mixed property of multiple sound sources in the original space, there exist rare multi-source approaches to localizing multiple sources simultaneously, except for one recent work using a contrastive random walk in the graph with images and separated sound as nodes. Despite their promising performance, they can only handle a fixed number of sources, and they cannot learn compact class-aware representations for individual sources. To alleviate this shortcoming, in this paper, we propose a novel audio-visual grouping network, namely AVGN, that can directly learn category-wise semantic features for each source from the input audio mixture and image to localize multiple sources simultaneously. Specifically, our AVGN leverages learnable audio-visual class tokens to aggregate class-aware source features. Then, the aggregated semantic features for each source can be used as guidance to localize the corresponding visual regions. Compared to existing multi-source methods, our new framework can localize a flexible number of sources and disentangle category-aware audio-visual representations for individual sound sources. We conduct extensive experiments on MUSIC, VGGSound-Instruments, and VGG-Sound Sources benchmarks. The results demonstrate that the proposed AVGN can achieve state-of-the-art sounding object localization performance on both single-source and multi-source scenarios. Code is available at https://github.com/stoneMo/AVGN." @default.
- W4386072368 created "2023-08-23" @default.
- W4386072368 creator A5042783792 @default.
- W4386072368 creator A5063943130 @default.
- W4386072368 date "2023-06-01" @default.
- W4386072368 modified "2023-09-27" @default.
- W4386072368 title "Audio-Visual Grouping Network for Sound Localization from Mixtures" @default.
- W4386072368 cites W2105582566 @default.
- W4386072368 cites W2108598243 @default.
- W4386072368 cites W2194775991 @default.
- W4386072368 cites W2619697695 @default.
- W4386072368 cites W2931433835 @default.
- W4386072368 cites W2963680395 @default.
- W4386072368 cites W2981816492 @default.
- W4386072368 cites W2981851635 @default.
- W4386072368 cites W2982619606 @default.
- W4386072368 cites W2982624843 @default.
- W4386072368 cites W2990113535 @default.
- W4386072368 cites W3015371781 @default.
- W4386072368 cites W3017343282 @default.
- W4386072368 cites W3110606395 @default.
- W4386072368 cites W3169318522 @default.
- W4386072368 cites W3170088426 @default.
- W4386072368 cites W3175300676 @default.
- W4386072368 cites W3175335326 @default.
- W4386072368 cites W4214497471 @default.
- W4386072368 cites W4224925617 @default.
- W4386072368 cites W4289665794 @default.
- W4386072368 cites W4312926266 @default.
- W4386072368 cites W4312980231 @default.
- W4386072368 doi "https://doi.org/10.1109/cvpr52729.2023.01018" @default.
- W4386072368 hasPublicationYear "2023" @default.
- W4386072368 type Work @default.
- W4386072368 citedByCount "0" @default.
- W4386072368 crossrefType "proceedings-article" @default.
- W4386072368 hasAuthorship W4386072368A5042783792 @default.
- W4386072368 hasAuthorship W4386072368A5063943130 @default.
- W4386072368 hasConcept C111919701 @default.
- W4386072368 hasConcept C114793014 @default.
- W4386072368 hasConcept C127313418 @default.
- W4386072368 hasConcept C13895895 @default.
- W4386072368 hasConcept C154945302 @default.
- W4386072368 hasConcept C203718221 @default.
- W4386072368 hasConcept C2776864781 @default.
- W4386072368 hasConcept C2777212361 @default.
- W4386072368 hasConcept C2781238097 @default.
- W4386072368 hasConcept C28490314 @default.
- W4386072368 hasConcept C3017588708 @default.
- W4386072368 hasConcept C36464697 @default.
- W4386072368 hasConcept C41008148 @default.
- W4386072368 hasConcept C43126263 @default.
- W4386072368 hasConcept C49774154 @default.
- W4386072368 hasConcept C64922751 @default.
- W4386072368 hasConcept C93240960 @default.
- W4386072368 hasConceptScore W4386072368C111919701 @default.
- W4386072368 hasConceptScore W4386072368C114793014 @default.
- W4386072368 hasConceptScore W4386072368C127313418 @default.
- W4386072368 hasConceptScore W4386072368C13895895 @default.
- W4386072368 hasConceptScore W4386072368C154945302 @default.
- W4386072368 hasConceptScore W4386072368C203718221 @default.
- W4386072368 hasConceptScore W4386072368C2776864781 @default.
- W4386072368 hasConceptScore W4386072368C2777212361 @default.
- W4386072368 hasConceptScore W4386072368C2781238097 @default.
- W4386072368 hasConceptScore W4386072368C28490314 @default.
- W4386072368 hasConceptScore W4386072368C3017588708 @default.
- W4386072368 hasConceptScore W4386072368C36464697 @default.
- W4386072368 hasConceptScore W4386072368C41008148 @default.
- W4386072368 hasConceptScore W4386072368C43126263 @default.
- W4386072368 hasConceptScore W4386072368C49774154 @default.
- W4386072368 hasConceptScore W4386072368C64922751 @default.
- W4386072368 hasConceptScore W4386072368C93240960 @default.
- W4386072368 hasLocation W43860723681 @default.
- W4386072368 hasOpenAccess W4386072368 @default.
- W4386072368 hasPrimaryLocation W43860723681 @default.
- W4386072368 hasRelatedWork W2117408192 @default.
- W4386072368 hasRelatedWork W2136644350 @default.
- W4386072368 hasRelatedWork W2282641350 @default.
- W4386072368 hasRelatedWork W2562176306 @default.
- W4386072368 hasRelatedWork W3120099295 @default.
- W4386072368 hasRelatedWork W3133898458 @default.
- W4386072368 hasRelatedWork W3160244659 @default.
- W4386072368 hasRelatedWork W4361865606 @default.
- W4386072368 hasRelatedWork W4381327731 @default.
- W4386072368 hasRelatedWork W4386072368 @default.
- W4386072368 isParatext "false" @default.
- W4386072368 isRetracted "false" @default.
- W4386072368 workType "article" @default.
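The abstract describes AVGN only at a high level. Learnable audio-visual class tokens aggregate class-aware source features from the audio mixture, and each aggregated feature then guides localization of its visual region. The following is a minimal, hypothetical sketch of that idea in plain Python. It is not the authors' implementation, which is published at https://github.com/stoneMo/AVGN. It assumes scaled dot-product attention for the aggregation step and cosine similarity against image patch features for the localization step. The function names `aggregate` and `localize` and the toy feature sizes are illustrative assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(class_tokens, audio_tokens):
    """Hypothetical aggregation step: each class token attends over the
    audio-mixture tokens (assumed scaled dot-product attention) and returns
    one class-aware source feature per class token."""
    d = len(class_tokens[0])
    scale = 1.0 / math.sqrt(d)
    sources = []
    for q in class_tokens:
        weights = softmax([dot(q, k) * scale for k in audio_tokens])
        # Attention-weighted sum of the audio tokens.
        feat = [sum(w * tok[i] for w, tok in zip(weights, audio_tokens))
                for i in range(d)]
        sources.append(feat)
    return sources

def cosine(a, b):
    na = math.sqrt(dot(a, a))
    nb = math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb + 1e-8)

def localize(source_feats, patch_feats):
    """Hypothetical localization step: one flattened similarity map per
    source feature, scored by cosine similarity with each image patch."""
    return [[cosine(s, p) for p in patch_feats] for s in source_feats]

# Toy usage: two class tokens, two audio tokens, three image patches.
class_tokens = [[1.0, 0.0], [0.0, 1.0]]
audio_tokens = [[5.0, 0.0], [0.0, 5.0]]
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
maps = localize(aggregate(class_tokens, audio_tokens), patches)
for k, m in enumerate(maps):
    print(k, max(range(len(m)), key=lambda i: m[i]))
```

Because each class token yields its own map, the sketch does not fix the number of localized sources in advance. A caller could, for example, keep only the maps whose peak similarity exceeds a chosen threshold; that thresholding rule is an assumption and is not stated in the abstract.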