Matches in SemOpenAlex for { <https://semopenalex.org/work/W3151130473> ?p ?o ?g. }
Showing items 1 to 96 of 96 with 100 items per page.
- W3151130473 abstract "The recently developed vision transformer (ViT) has achieved promising results on image classification compared to convolutional neural networks. Inspired by this, in this paper, we study how to learn multi-scale feature representations in transformer models for image classification. To this end, we propose a dual-branch transformer to combine image patches (i.e., tokens in a transformer) of different sizes to produce stronger image features. Our approach processes small-patch and large-patch tokens with two separate branches of different computational complexity and these tokens are then fused purely by attention multiple times to complement each other. Furthermore, to reduce computation, we develop a simple yet effective token fusion module based on cross attention, which uses a single token for each branch as a query to exchange information with other branches. Our proposed cross-attention only requires linear time for both computational and memory complexity instead of quadratic time otherwise. Extensive experiments demonstrate that our approach performs better than or on par with several concurrent works on vision transformer, in addition to efficient CNN models. For example, on the ImageNet1K dataset, with some architectural changes, our approach outperforms the recent DeiT by a large margin of 2% with a small to moderate increase in FLOPs and model parameters. Our source codes and models are available at https://github.com/IBM/CrossViT." @default.
- W3151130473 created "2021-04-13" @default.
- W3151130473 creator A5025006492 @default.
- W3151130473 creator A5049734237 @default.
- W3151130473 creator A5080417608 @default.
- W3151130473 date "2021-10-01" @default.
- W3151130473 modified "2023-10-14" @default.
- W3151130473 title "CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification" @default.
- W3151130473 cites W1977295328 @default.
- W3151130473 cites W1978491093 @default.
- W3151130473 cites W2108598243 @default.
- W3151130473 cites W2150134853 @default.
- W3151130473 cites W2194775991 @default.
- W3151130473 cites W2473156356 @default.
- W3151130473 cites W2549139847 @default.
- W3151130473 cites W2560533888 @default.
- W3151130473 cites W2565639579 @default.
- W3151130473 cites W2752782242 @default.
- W3151130473 cites W2922509574 @default.
- W3151130473 cites W2962843773 @default.
- W3151130473 cites W2963091558 @default.
- W3151130473 cites W2964081807 @default.
- W3151130473 cites W2981413347 @default.
- W3151130473 cites W2983446232 @default.
- W3151130473 cites W2988396473 @default.
- W3151130473 cites W2990503944 @default.
- W3151130473 cites W2992308087 @default.
- W3151130473 cites W2998508940 @default.
- W3151130473 cites W300523764 @default.
- W3151130473 cites W3034399482 @default.
- W3151130473 cites W3034429256 @default.
- W3151130473 cites W3034552520 @default.
- W3151130473 cites W3034885317 @default.
- W3151130473 cites W3035452548 @default.
- W3151130473 cites W3035682985 @default.
- W3151130473 cites W3101156210 @default.
- W3151130473 cites W3121523901 @default.
- W3151130473 cites W3131500599 @default.
- W3151130473 doi "https://doi.org/10.1109/iccv48922.2021.00041" @default.
- W3151130473 hasPublicationYear "2021" @default.
- W3151130473 type Work @default.
- W3151130473 sameAs 3151130473 @default.
- W3151130473 citedByCount "418" @default.
- W3151130473 countsByYear W31511304732020 @default.
- W3151130473 countsByYear W31511304732021 @default.
- W3151130473 countsByYear W31511304732022 @default.
- W3151130473 countsByYear W31511304732023 @default.
- W3151130473 crossrefType "proceedings-article" @default.
- W3151130473 hasAuthorship W3151130473A5025006492 @default.
- W3151130473 hasAuthorship W3151130473A5049734237 @default.
- W3151130473 hasAuthorship W3151130473A5080417608 @default.
- W3151130473 hasBestOaLocation W31511304732 @default.
- W3151130473 hasConcept C11413529 @default.
- W3151130473 hasConcept C121332964 @default.
- W3151130473 hasConcept C153180895 @default.
- W3151130473 hasConcept C154945302 @default.
- W3151130473 hasConcept C165801399 @default.
- W3151130473 hasConcept C179799912 @default.
- W3151130473 hasConcept C38652104 @default.
- W3151130473 hasConcept C41008148 @default.
- W3151130473 hasConcept C45374587 @default.
- W3151130473 hasConcept C48145219 @default.
- W3151130473 hasConcept C62520636 @default.
- W3151130473 hasConcept C66322947 @default.
- W3151130473 hasConcept C81363708 @default.
- W3151130473 hasConceptScore W3151130473C11413529 @default.
- W3151130473 hasConceptScore W3151130473C121332964 @default.
- W3151130473 hasConceptScore W3151130473C153180895 @default.
- W3151130473 hasConceptScore W3151130473C154945302 @default.
- W3151130473 hasConceptScore W3151130473C165801399 @default.
- W3151130473 hasConceptScore W3151130473C179799912 @default.
- W3151130473 hasConceptScore W3151130473C38652104 @default.
- W3151130473 hasConceptScore W3151130473C41008148 @default.
- W3151130473 hasConceptScore W3151130473C45374587 @default.
- W3151130473 hasConceptScore W3151130473C48145219 @default.
- W3151130473 hasConceptScore W3151130473C62520636 @default.
- W3151130473 hasConceptScore W3151130473C66322947 @default.
- W3151130473 hasConceptScore W3151130473C81363708 @default.
- W3151130473 hasLocation W31511304731 @default.
- W3151130473 hasLocation W31511304732 @default.
- W3151130473 hasOpenAccess W3151130473 @default.
- W3151130473 hasPrimaryLocation W31511304731 @default.
- W3151130473 hasRelatedWork W2967478618 @default.
- W3151130473 hasRelatedWork W2970530566 @default.
- W3151130473 hasRelatedWork W2997152889 @default.
- W3151130473 hasRelatedWork W3016124757 @default.
- W3151130473 hasRelatedWork W4287775347 @default.
- W3151130473 hasRelatedWork W4288261899 @default.
- W3151130473 hasRelatedWork W4304700937 @default.
- W3151130473 hasRelatedWork W4307309205 @default.
- W3151130473 hasRelatedWork W4385009901 @default.
- W3151130473 hasRelatedWork W4385572700 @default.
- W3151130473 isParatext "false" @default.
- W3151130473 isRetracted "false" @default.
- W3151130473 magId "3151130473" @default.
- W3151130473 workType "article" @default.
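
The abstract above describes a token fusion module in which a single token from one branch acts as the query against the tokens of the other branch, giving linear rather than quadratic cost. The following is a minimal NumPy sketch of that idea only, not the authors' implementation (which is at the repository linked in the abstract). The function name, the single-head layout, and the weight matrices `w_q`, `w_k`, `w_v` are assumptions made for illustration.

```python
import numpy as np


def single_query_cross_attention(query_token, other_tokens, w_q, w_k, w_v):
    """Attend from one query token to the n tokens of another branch.

    Hypothetical helper for illustration (single head, no residual or norm).
    query_token:  shape (d,)    one token from branch A
    other_tokens: shape (n, d)  tokens from branch B
    Returns (fused, weights) with shapes (d_v,) and (n,).
    """
    q = query_token @ w_q                   # (d_k,)
    k = other_tokens @ w_k                  # (n, d_k)
    v = other_tokens @ w_v                  # (n, d_v)

    # One query against n keys: n dot products, so cost grows linearly in n.
    scores = k @ q / np.sqrt(q.shape[0])    # (n,)

    # Numerically stable softmax over the n scores.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights = weights / weights.sum()

    fused = weights @ v                     # (d_v,)
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 8, 16
    w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
    fused, weights = single_query_cross_attention(
        rng.normal(size=d), rng.normal(size=(n, d)), w_q, w_k, w_v
    )
    print(fused.shape, weights.shape)
```

Because there is only one query, the score vector has `n` entries instead of the `n × n` matrix that full self-attention would build, which is where the linear time and memory claim comes from.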