Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385848275> ?p ?o ?g. }
Showing items 1 to 83 of 83 with 100 items per page.
- W4385848275 abstract "The objective of the sound source localization task is to enable machines to detect the location of sound-making objects within a visual scene. While the audio modality provides spatial cues to locate the sound source, existing approaches use audio only in an auxiliary role to compare spatial regions of the visual modality. Humans, on the other hand, utilize both audio and visual modalities as spatial cues to locate sound sources. In this paper, we propose an audio-visual spatial integration network that integrates spatial cues from both modalities to mimic human behavior when detecting sound-making objects. Additionally, we introduce a recursive attention network to mimic the human behavior of iteratively focusing on objects, resulting in more accurate attention regions. To effectively encode spatial information from both modalities, we propose an audio-visual pair matching loss and a spatial region alignment loss. By utilizing the spatial cues of audio-visual modalities and recursively focusing on objects, our method can perform more robust sound source localization. Comprehensive experimental results on the Flickr SoundNet and VGG-Sound Source datasets demonstrate the superiority of our proposed method over existing approaches. Our code is available at: https://github.com/VisualAIKHU/SIRA-SSL" @default.
- W4385848275 created "2023-08-16" @default.
- W4385848275 creator A5036936141 @default.
- W4385848275 creator A5047089963 @default.
- W4385848275 creator A5092645089 @default.
- W4385848275 date "2023-08-11" @default.
- W4385848275 modified "2023-09-25" @default.
- W4385848275 title "Audio-Visual Spatial Integration and Recursive Attention for Robust Sound Source Localization" @default.
- W4385848275 doi "https://doi.org/10.48550/arxiv.2308.06087" @default.
- W4385848275 hasPublicationYear "2023" @default.
- W4385848275 type Work @default.
- W4385848275 citedByCount "0" @default.
- W4385848275 crossrefType "posted-content" @default.
- W4385848275 hasAuthorship W4385848275A5036936141 @default.
- W4385848275 hasAuthorship W4385848275A5047089963 @default.
- W4385848275 hasAuthorship W4385848275A5092645089 @default.
- W4385848275 hasBestOaLocation W43858482751 @default.
- W4385848275 hasConcept C105795698 @default.
- W4385848275 hasConcept C111370547 @default.
- W4385848275 hasConcept C111919701 @default.
- W4385848275 hasConcept C114793014 @default.
- W4385848275 hasConcept C127313418 @default.
- W4385848275 hasConcept C144024400 @default.
- W4385848275 hasConcept C154945302 @default.
- W4385848275 hasConcept C159620131 @default.
- W4385848275 hasConcept C165064840 @default.
- W4385848275 hasConcept C177264268 @default.
- W4385848275 hasConcept C199360897 @default.
- W4385848275 hasConcept C203718221 @default.
- W4385848275 hasConcept C2776760102 @default.
- W4385848275 hasConcept C2779903281 @default.
- W4385848275 hasConcept C2780226545 @default.
- W4385848275 hasConcept C28490314 @default.
- W4385848275 hasConcept C3017588708 @default.
- W4385848275 hasConcept C31972630 @default.
- W4385848275 hasConcept C33923547 @default.
- W4385848275 hasConcept C36289849 @default.
- W4385848275 hasConcept C41008148 @default.
- W4385848275 hasConcept C43126263 @default.
- W4385848275 hasConcept C49774154 @default.
- W4385848275 hasConcept C62649853 @default.
- W4385848275 hasConcept C93240960 @default.
- W4385848275 hasConceptScore W4385848275C105795698 @default.
- W4385848275 hasConceptScore W4385848275C111370547 @default.
- W4385848275 hasConceptScore W4385848275C111919701 @default.
- W4385848275 hasConceptScore W4385848275C114793014 @default.
- W4385848275 hasConceptScore W4385848275C127313418 @default.
- W4385848275 hasConceptScore W4385848275C144024400 @default.
- W4385848275 hasConceptScore W4385848275C154945302 @default.
- W4385848275 hasConceptScore W4385848275C159620131 @default.
- W4385848275 hasConceptScore W4385848275C165064840 @default.
- W4385848275 hasConceptScore W4385848275C177264268 @default.
- W4385848275 hasConceptScore W4385848275C199360897 @default.
- W4385848275 hasConceptScore W4385848275C203718221 @default.
- W4385848275 hasConceptScore W4385848275C2776760102 @default.
- W4385848275 hasConceptScore W4385848275C2779903281 @default.
- W4385848275 hasConceptScore W4385848275C2780226545 @default.
- W4385848275 hasConceptScore W4385848275C28490314 @default.
- W4385848275 hasConceptScore W4385848275C3017588708 @default.
- W4385848275 hasConceptScore W4385848275C31972630 @default.
- W4385848275 hasConceptScore W4385848275C33923547 @default.
- W4385848275 hasConceptScore W4385848275C36289849 @default.
- W4385848275 hasConceptScore W4385848275C41008148 @default.
- W4385848275 hasConceptScore W4385848275C43126263 @default.
- W4385848275 hasConceptScore W4385848275C49774154 @default.
- W4385848275 hasConceptScore W4385848275C62649853 @default.
- W4385848275 hasConceptScore W4385848275C93240960 @default.
- W4385848275 hasLocation W43858482751 @default.
- W4385848275 hasOpenAccess W4385848275 @default.
- W4385848275 hasPrimaryLocation W43858482751 @default.
- W4385848275 hasRelatedWork W1976606981 @default.
- W4385848275 hasRelatedWork W1995188412 @default.
- W4385848275 hasRelatedWork W2391245565 @default.
- W4385848275 hasRelatedWork W2786306966 @default.
- W4385848275 hasRelatedWork W2897922457 @default.
- W4385848275 hasRelatedWork W2972976269 @default.
- W4385848275 hasRelatedWork W3090765191 @default.
- W4385848275 hasRelatedWork W4280504625 @default.
- W4385848275 hasRelatedWork W4320843456 @default.
- W4385848275 hasRelatedWork W4385373813 @default.
- W4385848275 isParatext "false" @default.
- W4385848275 isRetracted "false" @default.
- W4385848275 workType "article" @default.