Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386159996> ?p ?o ?g. }
Showing items 1 to 87 of 87 with 100 items per page.
- W4386159996 abstract "Audio-visual zero-shot learning (ZSL), which learns to classify video data from the classes not being observed during training, is challenging. In audio-visual ZSL, both semantic and temporal information from different modalities is relevant to each other. However, effectively extracting and fusing information from audio and visual remains an open challenge. In this work, we propose an Audio-Visual Modality-fusion Spiking Transformer network (AVMST) for audio-visual ZSL. To be more specific, AVMST provides a spiking neural network (SNN) module for extracting conspicuous temporal information of each modality, a cross-attention block to effectively fuse the temporal and semantic information, and a transformer reasoning module to further explore the interrelationships of fusion features. To provide robust temporal features, the spiking threshold of the SNN module is adjusted dynamically based on the semantic cues of different modalities. The generated feature map is in accordance with the zero-shot learning property thanks to our proposed spiking transformer’s ability to combine the robustness of SNN feature extraction and the precision of transformer feature inference. Extensive experiments on three benchmark audio-visual datasets (i.e., VGGSound, UCF and ActivityNet) validate that the proposed AVMST outperforms existing state-of-the-art methods by a significant margin. The code and pre-trained models are available at https://github.com/liwr-hit/ICME23_AVMST." @default.
- W4386159996 created "2023-08-26" @default.
- W4386159996 creator A5046442204 @default.
- W4386159996 creator A5072721151 @default.
- W4386159996 creator A5079107646 @default.
- W4386159996 creator A5079412089 @default.
- W4386159996 creator A5088224232 @default.
- W4386159996 date "2023-07-01" @default.
- W4386159996 modified "2023-09-27" @default.
- W4386159996 title "Modality-Fusion Spiking Transformer Network for Audio-Visual Zero-Shot Learning" @default.
- W4386159996 cites W101771737 @default.
- W4386159996 cites W1927052826 @default.
- W4386159996 cites W2513853720 @default.
- W4386159996 cites W2621826044 @default.
- W4386159996 cites W2897067191 @default.
- W4386159996 cites W2924476266 @default.
- W4386159996 cites W3015371781 @default.
- W4386159996 cites W3025520547 @default.
- W4386159996 cites W3102040318 @default.
- W4386159996 cites W3102087395 @default.
- W4386159996 cites W3119136678 @default.
- W4386159996 cites W3188165077 @default.
- W4386159996 cites W4313046728 @default.
- W4386159996 cites W4382240632 @default.
- W4386159996 doi "https://doi.org/10.1109/icme55011.2023.00080" @default.
- W4386159996 hasPublicationYear "2023" @default.
- W4386159996 type Work @default.
- W4386159996 citedByCount "0" @default.
- W4386159996 crossrefType "proceedings-article" @default.
- W4386159996 hasAuthorship W4386159996A5046442204 @default.
- W4386159996 hasAuthorship W4386159996A5072721151 @default.
- W4386159996 hasAuthorship W4386159996A5079107646 @default.
- W4386159996 hasAuthorship W4386159996A5079412089 @default.
- W4386159996 hasAuthorship W4386159996A5088224232 @default.
- W4386159996 hasConcept C104317684 @default.
- W4386159996 hasConcept C119599485 @default.
- W4386159996 hasConcept C119857082 @default.
- W4386159996 hasConcept C127413603 @default.
- W4386159996 hasConcept C153180895 @default.
- W4386159996 hasConcept C154945302 @default.
- W4386159996 hasConcept C165801399 @default.
- W4386159996 hasConcept C185592680 @default.
- W4386159996 hasConcept C2776214188 @default.
- W4386159996 hasConcept C28490314 @default.
- W4386159996 hasConcept C3017588708 @default.
- W4386159996 hasConcept C36464697 @default.
- W4386159996 hasConcept C41008148 @default.
- W4386159996 hasConcept C49774154 @default.
- W4386159996 hasConcept C52622490 @default.
- W4386159996 hasConcept C55493867 @default.
- W4386159996 hasConcept C63479239 @default.
- W4386159996 hasConcept C66322947 @default.
- W4386159996 hasConceptScore W4386159996C104317684 @default.
- W4386159996 hasConceptScore W4386159996C119599485 @default.
- W4386159996 hasConceptScore W4386159996C119857082 @default.
- W4386159996 hasConceptScore W4386159996C127413603 @default.
- W4386159996 hasConceptScore W4386159996C153180895 @default.
- W4386159996 hasConceptScore W4386159996C154945302 @default.
- W4386159996 hasConceptScore W4386159996C165801399 @default.
- W4386159996 hasConceptScore W4386159996C185592680 @default.
- W4386159996 hasConceptScore W4386159996C2776214188 @default.
- W4386159996 hasConceptScore W4386159996C28490314 @default.
- W4386159996 hasConceptScore W4386159996C3017588708 @default.
- W4386159996 hasConceptScore W4386159996C36464697 @default.
- W4386159996 hasConceptScore W4386159996C41008148 @default.
- W4386159996 hasConceptScore W4386159996C49774154 @default.
- W4386159996 hasConceptScore W4386159996C52622490 @default.
- W4386159996 hasConceptScore W4386159996C55493867 @default.
- W4386159996 hasConceptScore W4386159996C63479239 @default.
- W4386159996 hasConceptScore W4386159996C66322947 @default.
- W4386159996 hasFunder F4320321001 @default.
- W4386159996 hasLocation W43861599961 @default.
- W4386159996 hasOpenAccess W4386159996 @default.
- W4386159996 hasPrimaryLocation W43861599961 @default.
- W4386159996 hasRelatedWork W1964120219 @default.
- W4386159996 hasRelatedWork W2000165426 @default.
- W4386159996 hasRelatedWork W2114557664 @default.
- W4386159996 hasRelatedWork W2144059113 @default.
- W4386159996 hasRelatedWork W2146076056 @default.
- W4386159996 hasRelatedWork W2385132419 @default.
- W4386159996 hasRelatedWork W2772780115 @default.
- W4386159996 hasRelatedWork W2811390910 @default.
- W4386159996 hasRelatedWork W2942471066 @default.
- W4386159996 hasRelatedWork W3003836766 @default.
- W4386159996 isParatext "false" @default.
- W4386159996 isRetracted "false" @default.
- W4386159996 workType "article" @default.