Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387560908> ?p ?o ?g. }
Showing items 1 to 85 of 85 with 100 items per page.
- W4387560908 abstract "Audio-Visual Segmentation (AVS) aims to extract the sounding object from a video frame, which is represented by a pixel-wise segmentation mask. The pioneering work conducts this task through dense feature-level audio-visual interaction, which ignores the dimension gap between different modalities. More specifically, the audio clip could only provide a Global semantic label in each sequence, but the video frame covers multiple semantic objects across different Local regions. In this paper, we propose a Cross-modal Cognitive Consensus guided Network (C3N) to align the audio-visual semantics from the global dimension and progressively inject them into the local regions via an attention mechanism. Firstly, a Cross-modal Cognitive Consensus Inference Module (C3IM) is developed to extract a unified-modal label by integrating audio/visual classification confidence and similarities of modality-specific label embeddings. Then, we feed the unified-modal label back to the visual backbone as the explicit semantic-level guidance via a Cognitive Consensus guided Attention Module (CCAM), which highlights the local features corresponding to the object of interest. Extensive experiments on the Single Sound Source Segmentation (S4) setting and Multiple Sound Source Segmentation (MS3) setting of the AVSBench dataset demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance." @default.
- W4387560908 created "2023-10-12" @default.
- W4387560908 creator A5019769019 @default.
- W4387560908 creator A5023114270 @default.
- W4387560908 creator A5044954967 @default.
- W4387560908 creator A5075737786 @default.
- W4387560908 creator A5077165659 @default.
- W4387560908 date "2023-10-09" @default.
- W4387560908 modified "2023-10-13" @default.
- W4387560908 title "Cross-modal Cognitive Consensus guided Audio-Visual Segmentation" @default.
- W4387560908 doi "https://doi.org/10.48550/arxiv.2310.06259" @default.
- W4387560908 hasPublicationYear "2023" @default.
- W4387560908 type Work @default.
- W4387560908 citedByCount "0" @default.
- W4387560908 crossrefType "posted-content" @default.
- W4387560908 hasAuthorship W4387560908A5019769019 @default.
- W4387560908 hasAuthorship W4387560908A5023114270 @default.
- W4387560908 hasAuthorship W4387560908A5044954967 @default.
- W4387560908 hasAuthorship W4387560908A5075737786 @default.
- W4387560908 hasAuthorship W4387560908A5077165659 @default.
- W4387560908 hasBestOaLocation W43875609081 @default.
- W4387560908 hasConcept C126042441 @default.
- W4387560908 hasConcept C138885662 @default.
- W4387560908 hasConcept C153180895 @default.
- W4387560908 hasConcept C154945302 @default.
- W4387560908 hasConcept C169760540 @default.
- W4387560908 hasConcept C184337299 @default.
- W4387560908 hasConcept C185592680 @default.
- W4387560908 hasConcept C188027245 @default.
- W4387560908 hasConcept C199360897 @default.
- W4387560908 hasConcept C26760741 @default.
- W4387560908 hasConcept C2776214188 @default.
- W4387560908 hasConcept C2776401178 @default.
- W4387560908 hasConcept C2780103172 @default.
- W4387560908 hasConcept C2781238097 @default.
- W4387560908 hasConcept C28490314 @default.
- W4387560908 hasConcept C3017588708 @default.
- W4387560908 hasConcept C31972630 @default.
- W4387560908 hasConcept C41008148 @default.
- W4387560908 hasConcept C41895202 @default.
- W4387560908 hasConcept C49774154 @default.
- W4387560908 hasConcept C71139939 @default.
- W4387560908 hasConcept C76155785 @default.
- W4387560908 hasConcept C86803240 @default.
- W4387560908 hasConcept C89600930 @default.
- W4387560908 hasConceptScore W4387560908C126042441 @default.
- W4387560908 hasConceptScore W4387560908C138885662 @default.
- W4387560908 hasConceptScore W4387560908C153180895 @default.
- W4387560908 hasConceptScore W4387560908C154945302 @default.
- W4387560908 hasConceptScore W4387560908C169760540 @default.
- W4387560908 hasConceptScore W4387560908C184337299 @default.
- W4387560908 hasConceptScore W4387560908C185592680 @default.
- W4387560908 hasConceptScore W4387560908C188027245 @default.
- W4387560908 hasConceptScore W4387560908C199360897 @default.
- W4387560908 hasConceptScore W4387560908C26760741 @default.
- W4387560908 hasConceptScore W4387560908C2776214188 @default.
- W4387560908 hasConceptScore W4387560908C2776401178 @default.
- W4387560908 hasConceptScore W4387560908C2780103172 @default.
- W4387560908 hasConceptScore W4387560908C2781238097 @default.
- W4387560908 hasConceptScore W4387560908C28490314 @default.
- W4387560908 hasConceptScore W4387560908C3017588708 @default.
- W4387560908 hasConceptScore W4387560908C31972630 @default.
- W4387560908 hasConceptScore W4387560908C41008148 @default.
- W4387560908 hasConceptScore W4387560908C41895202 @default.
- W4387560908 hasConceptScore W4387560908C49774154 @default.
- W4387560908 hasConceptScore W4387560908C71139939 @default.
- W4387560908 hasConceptScore W4387560908C76155785 @default.
- W4387560908 hasConceptScore W4387560908C86803240 @default.
- W4387560908 hasConceptScore W4387560908C89600930 @default.
- W4387560908 hasLocation W43875609081 @default.
- W4387560908 hasOpenAccess W4387560908 @default.
- W4387560908 hasPrimaryLocation W43875609081 @default.
- W4387560908 hasRelatedWork W2074916782 @default.
- W4387560908 hasRelatedWork W2271369634 @default.
- W4387560908 hasRelatedWork W2330333072 @default.
- W4387560908 hasRelatedWork W2350550760 @default.
- W4387560908 hasRelatedWork W2380912101 @default.
- W4387560908 hasRelatedWork W2393726419 @default.
- W4387560908 hasRelatedWork W3137890128 @default.
- W4387560908 hasRelatedWork W4245955731 @default.
- W4387560908 hasRelatedWork W578794879 @default.
- W4387560908 hasRelatedWork W2625296515 @default.
- W4387560908 isParatext "false" @default.
- W4387560908 isRetracted "false" @default.
- W4387560908 workType "article" @default.
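
The listing above follows a regular "- subject predicate object @graph." shape, so it can be grouped by predicate with a few lines of code. The following is a minimal sketch, assuming that line shape holds for every entry. The shape is inferred from this page only, not from any SemOpenAlex specification, and the names `LINE_RE` and `parse_listing` are hypothetical helpers introduced for illustration.

```python
import re
from collections import defaultdict

# Assumption: each entry looks like "- SUBJECT PREDICATE OBJECT @GRAPH."
# This pattern is inferred from the listing, not from an official format.
LINE_RE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.*?)\s+@(\S+?)\.?$')


def parse_listing(lines):
    """Group objects by predicate for each subject found in the listing."""
    records = defaultdict(lambda: defaultdict(list))
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip non-entry lines such as the "Matches in ..." header
        subject, predicate, obj, _graph = match.groups()
        # Quoted literals lose their surrounding quotes; bare tokens are kept as identifiers.
        if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
            obj = obj[1:-1]
        records[subject][predicate].append(obj)
    return records


sample = [
    'Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387560908> ?p ?o ?g. }',
    '- W4387560908 created "2023-10-12" @default.',
    '- W4387560908 creator A5019769019 @default.',
    '- W4387560908 creator A5023114270 @default.',
    '- W4387560908 title "Cross-modal Cognitive Consensus guided Audio-Visual Segmentation" @default.',
]

work = parse_listing(sample)["W4387560908"]
print(work["title"][0])
print(len(work["creator"]))
```

Repeated predicates such as `creator` or `hasConcept` accumulate into a list, which is why every predicate maps to a list even when it occurs only once.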