Matches in SemOpenAlex for { <https://semopenalex.org/work/W4313142416> ?p ?o ?g. }
Showing items 1 to 85 of 85 with 100 items per page.
- W4313142416 abstract "Many adaptations of transformers have emerged to address the single-modal vision tasks, where self-attention modules are stacked to handle input sources like images. Intuitively, feeding multiple modalities of data to vision transformers could improve the performance, yet the inner-modal attentive weights may be diluted, which could thus greatly undermine the final performance. In this paper, we propose a multimodal token fusion method (TokenFusion), tailored for transformer-based vision tasks. To effectively fuse multiple modalities, TokenFusion dynamically detects uninformative tokens and substitutes these tokens with projected and aggregated inter-modal features. Residual positional alignment is also adopted to enable explicit utilization of the inter-modal alignments after fusion. The design of TokenFusion allows the transformer to learn correlations among multimodal features, while the single-modal transformer architecture remains largely intact. Extensive experiments are conducted on a variety of homogeneous and heterogeneous modalities and demonstrate that TokenFusion surpasses state-of-the-art methods in three typical vision tasks: multimodal image-to-image translation, RGB-depth semantic segmentation, and 3D object detection with point cloud and images. Code will be released at https://github.com/huawei-noah/noah-research and https://gitee.com/mindspore/models/tree/master/research/cv/TokenFusion." @default.
- W4313142416 created "2023-01-06" @default.
- W4313142416 creator A5006817088 @default.
- W4313142416 creator A5032642601 @default.
- W4313142416 creator A5055546056 @default.
- W4313142416 creator A5062581018 @default.
- W4313142416 creator A5072471193 @default.
- W4313142416 creator A5090184412 @default.
- W4313142416 date "2022-06-01" @default.
- W4313142416 modified "2023-10-05" @default.
- W4313142416 title "Multimodal Token Fusion for Vision Transformers" @default.
- W4313142416 cites W1903029394 @default.
- W4313142416 cites W1923184257 @default.
- W4313142416 cites W2067912884 @default.
- W4313142416 cites W2594519801 @default.
- W4313142416 cites W2775906317 @default.
- W4313142416 cites W2910281775 @default.
- W4313142416 cites W2970421518 @default.
- W4313142416 cites W2988715931 @default.
- W4313142416 cites W3002271958 @default.
- W4313142416 cites W3034429258 @default.
- W4313142416 cites W3034430142 @default.
- W4313142416 cites W3034579518 @default.
- W4313142416 cites W3096387236 @default.
- W4313142416 cites W3099155473 @default.
- W4313142416 cites W3138516171 @default.
- W4313142416 cites W3170841864 @default.
- W4313142416 cites W3201844719 @default.
- W4313142416 cites W4214526701 @default.
- W4313142416 doi "https://doi.org/10.1109/cvpr52688.2022.01187" @default.
- W4313142416 hasPublicationYear "2022" @default.
- W4313142416 type Work @default.
- W4313142416 citedByCount "18" @default.
- W4313142416 countsByYear W43131424162023 @default.
- W4313142416 crossrefType "proceedings-article" @default.
- W4313142416 hasAuthorship W4313142416A5006817088 @default.
- W4313142416 hasAuthorship W4313142416A5032642601 @default.
- W4313142416 hasAuthorship W4313142416A5055546056 @default.
- W4313142416 hasAuthorship W4313142416A5062581018 @default.
- W4313142416 hasAuthorship W4313142416A5072471193 @default.
- W4313142416 hasAuthorship W4313142416A5090184412 @default.
- W4313142416 hasBestOaLocation W43131424162 @default.
- W4313142416 hasConcept C119599485 @default.
- W4313142416 hasConcept C127413603 @default.
- W4313142416 hasConcept C144024400 @default.
- W4313142416 hasConcept C154945302 @default.
- W4313142416 hasConcept C165801399 @default.
- W4313142416 hasConcept C2779903281 @default.
- W4313142416 hasConcept C31972630 @default.
- W4313142416 hasConcept C33954974 @default.
- W4313142416 hasConcept C36289849 @default.
- W4313142416 hasConcept C38652104 @default.
- W4313142416 hasConcept C41008148 @default.
- W4313142416 hasConcept C48145219 @default.
- W4313142416 hasConcept C66322947 @default.
- W4313142416 hasConceptScore W4313142416C119599485 @default.
- W4313142416 hasConceptScore W4313142416C127413603 @default.
- W4313142416 hasConceptScore W4313142416C144024400 @default.
- W4313142416 hasConceptScore W4313142416C154945302 @default.
- W4313142416 hasConceptScore W4313142416C165801399 @default.
- W4313142416 hasConceptScore W4313142416C2779903281 @default.
- W4313142416 hasConceptScore W4313142416C31972630 @default.
- W4313142416 hasConceptScore W4313142416C33954974 @default.
- W4313142416 hasConceptScore W4313142416C36289849 @default.
- W4313142416 hasConceptScore W4313142416C38652104 @default.
- W4313142416 hasConceptScore W4313142416C41008148 @default.
- W4313142416 hasConceptScore W4313142416C48145219 @default.
- W4313142416 hasConceptScore W4313142416C66322947 @default.
- W4313142416 hasLocation W43131424161 @default.
- W4313142416 hasLocation W43131424162 @default.
- W4313142416 hasOpenAccess W4313142416 @default.
- W4313142416 hasPrimaryLocation W43131424161 @default.
- W4313142416 hasRelatedWork W1891287906 @default.
- W4313142416 hasRelatedWork W1969923398 @default.
- W4313142416 hasRelatedWork W2036807459 @default.
- W4313142416 hasRelatedWork W2166024367 @default.
- W4313142416 hasRelatedWork W2375389409 @default.
- W4313142416 hasRelatedWork W2565829216 @default.
- W4313142416 hasRelatedWork W2755342338 @default.
- W4313142416 hasRelatedWork W2772917594 @default.
- W4313142416 hasRelatedWork W2775347418 @default.
- W4313142416 hasRelatedWork W3116076068 @default.
- W4313142416 isParatext "false" @default.
- W4313142416 isRetracted "false" @default.
- W4313142416 workType "article" @default.
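
The abstract above describes TokenFusion as detecting uninformative tokens and substituting them with projected inter-modal features. The following is only a minimal sketch of that idea, not the authors' implementation: the function name `fuse_tokens`, the per-token `scores`, the fixed `threshold`, and the `project` callable are all hypothetical stand-ins, and the record does not say how importance is scored or how the projection is learned.

```python
from typing import Callable, List, Sequence

Token = List[float]


def fuse_tokens(
    tokens_a: Sequence[Token],
    scores_a: Sequence[float],
    tokens_b: Sequence[Token],
    project: Callable[[Token], Token],
    threshold: float = 0.5,
) -> List[Token]:
    """Replace low-scoring tokens of modality A with projected tokens of modality B.

    Assumptions (not taken from the record): tokens of both modalities are
    position-aligned, each token of A has a scalar importance score, and a
    token counts as uninformative when its score is below `threshold`.
    """
    if not (len(tokens_a) == len(scores_a) == len(tokens_b)):
        raise ValueError("tokens and scores must have the same length")

    fused: List[Token] = []
    for token, score, other in zip(tokens_a, scores_a, tokens_b):
        if score >= threshold:
            # Informative token: keep the original single-modal feature.
            fused.append(list(token))
        else:
            # Uninformative token: substitute the projected inter-modal feature.
            fused.append(project(other))
    return fused


# Hypothetical toy data: two aligned tokens per modality.
rgb_tokens = [[1.0, 1.0], [2.0, 2.0]]
rgb_scores = [0.9, 0.1]  # the second token is treated as uninformative
depth_tokens = [[10.0, 10.0], [20.0, 20.0]]


def halve(token: Token) -> Token:
    # Stand-in for a learned projection layer.
    return [0.5 * x for x in token]


fused_rgb = fuse_tokens(rgb_tokens, rgb_scores, depth_tokens, halve)
```

Because only the flagged positions are overwritten, the remaining tokens pass through unchanged, which matches the abstract's remark that the single-modal transformer architecture stays largely intact.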