Matches in SemOpenAlex for { <https://semopenalex.org/work/W3174906557> ?p ?o ?g. }
- W3174906557 abstract "Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks, and hence late-stage fusion of final representations or predictions from each modality ('late-fusion') is still a dominant paradigm for multimodal video classification. Instead, we introduce a novel transformer based architecture that uses 'fusion bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, our model forces information between different modalities to pass through a small number of bottleneck latents, requiring the model to collate and condense the most relevant information in each modality and only share what is necessary. We find that such a strategy improves fusion performance, at the same time reducing computational cost. We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks including Audioset, Epic-Kitchens and VGGSound. All code and models will be released." @default.
- W3174906557 created "2021-07-05" @default.
- W3174906557 creator A5006451413 @default.
- W3174906557 creator A5035108037 @default.
- W3174906557 creator A5036002448 @default.
- W3174906557 creator A5045217258 @default.
- W3174906557 creator A5049459342 @default.
- W3174906557 creator A5060145891 @default.
- W3174906557 date "2021-06-30" @default.
- W3174906557 modified "2023-10-16" @default.
- W3174906557 title "Attention Bottlenecks for Multimodal Fusion" @default.
- W3174906557 cites W2002591263 @default.
- W3174906557 cites W2009059481 @default.
- W3174906557 cites W2103104224 @default.
- W3174906557 cites W2108598243 @default.
- W3174906557 cites W2184188583 @default.
- W3174906557 cites W2331143823 @default.
- W3174906557 cites W2526050071 @default.
- W3174906557 cites W2556930864 @default.
- W3174906557 cites W2593116425 @default.
- W3174906557 cites W2619697695 @default.
- W3174906557 cites W2619947201 @default.
- W3174906557 cites W2626778328 @default.
- W3174906557 cites W2747804831 @default.
- W3174906557 cites W2765407302 @default.
- W3174906557 cites W2767290858 @default.
- W3174906557 cites W2770804203 @default.
- W3174906557 cites W2883429621 @default.
- W3174906557 cites W2886078012 @default.
- W3174906557 cites W2914312680 @default.
- W3174906557 cites W2931316642 @default.
- W3174906557 cites W2936774411 @default.
- W3174906557 cites W2946957821 @default.
- W3174906557 cites W2948048211 @default.
- W3174906557 cites W2950971447 @default.
- W3174906557 cites W2962711930 @default.
- W3174906557 cites W2962756039 @default.
- W3174906557 cites W2962960500 @default.
- W3174906557 cites W2963115079 @default.
- W3174906557 cites W2963341956 @default.
- W3174906557 cites W2963524571 @default.
- W3174906557 cites W2968124245 @default.
- W3174906557 cites W2970608575 @default.
- W3174906557 cites W2975357369 @default.
- W3174906557 cites W2981385151 @default.
- W3174906557 cites W2989728968 @default.
- W3174906557 cites W2990152177 @default.
- W3174906557 cites W2990503944 @default.
- W3174906557 cites W2990818246 @default.
- W3174906557 cites W3002552512 @default.
- W3174906557 cites W3015371781 @default.
- W3174906557 cites W3022265721 @default.
- W3174906557 cites W3033663212 @default.
- W3174906557 cites W3034815696 @default.
- W3174906557 cites W3035303837 @default.
- W3174906557 cites W3035333188 @default.
- W3174906557 cites W3037916678 @default.
- W3174906557 cites W3043840704 @default.
- W3174906557 cites W3068510429 @default.
- W3174906557 cites W3094502228 @default.
- W3174906557 cites W3096431533 @default.
- W3174906557 cites W3098357269 @default.
- W3174906557 cites W3100177202 @default.
- W3174906557 cites W3113320078 @default.
- W3174906557 cites W3123318516 @default.
- W3174906557 cites W3123650687 @default.
- W3174906557 cites W3134144764 @default.
- W3174906557 cites W3146639881 @default.
- W3174906557 cites W3147387781 @default.
- W3174906557 cites W3154596443 @default.
- W3174906557 cites W3163937874 @default.
- W3174906557 cites W3170088426 @default.
- W3174906557 cites W2936707910 @default.
- W3174906557 doi "https://doi.org/10.48550/arxiv.2107.00135" @default.
- W3174906557 hasPublicationYear "2021" @default.
- W3174906557 type Work @default.
- W3174906557 sameAs 3174906557 @default.
- W3174906557 citedByCount "3" @default.
- W3174906557 countsByYear W31749065572021 @default.
- W3174906557 countsByYear W31749065572023 @default.
- W3174906557 crossrefType "posted-content" @default.
- W3174906557 hasAuthorship W3174906557A5006451413 @default.
- W3174906557 hasAuthorship W3174906557A5035108037 @default.
- W3174906557 hasAuthorship W3174906557A5036002448 @default.
- W3174906557 hasAuthorship W3174906557A5045217258 @default.
- W3174906557 hasAuthorship W3174906557A5049459342 @default.
- W3174906557 hasAuthorship W3174906557A5060145891 @default.
- W3174906557 hasBestOaLocation W31749065571 @default.
- W3174906557 hasConcept C119857082 @default.
- W3174906557 hasConcept C138885662 @default.
- W3174906557 hasConcept C144024400 @default.
- W3174906557 hasConcept C149635348 @default.
- W3174906557 hasConcept C154945302 @default.
- W3174906557 hasConcept C158525013 @default.
- W3174906557 hasConcept C177264268 @default.
- W3174906557 hasConcept C199360897 @default.
- W3174906557 hasConcept C2776760102 @default.
- W3174906557 hasConcept C2779903281 @default.
- W3174906557 hasConcept C2780226545 @default.
- W3174906557 hasConcept C2780513914 @default.