Matches in SemOpenAlex for { <https://semopenalex.org/work/W4295308583> ?p ?o ?g. }
- W4295308583 endingPage "13" @default.
- W4295308583 startingPage "1" @default.
- W4295308583 abstract "Recently, Vision Transformers (ViTs) have been broadly explored in visual recognition. With low efficiency in encoding fine-level features, the performance of ViTs is still inferior to that of state-of-the-art CNNs when trained from scratch on a midsize dataset such as ImageNet. Through experimental analysis, we find this is due to two reasons: 1) the simple tokenization of input images fails to model important local structure such as edges and lines, leading to low training sample efficiency; 2) the redundant attention backbone design of ViTs leads to limited feature richness under fixed computation budgets and limited training samples. To overcome these limitations, we present a new simple and generic architecture, termed Vision Outlooker (VOLO), which implements a novel outlook attention operation that dynamically conducts local feature aggregation in a sliding-window manner across the input image. Unlike self-attention, which focuses on modeling global dependencies of local features at a coarse level, our outlook attention targets encoding finer-level features, which are critical for recognition but ignored by self-attention. Outlook attention breaks the bottleneck of self-attention, whose computation cost scales quadratically with the input spatial dimension, and is thus much more memory efficient. Compared to our Tokens-To-Token Vision Transformer (T2T-ViT), VOLO can more efficiently encode fine-level features that are essential for high-performance visual recognition. Experiments show that with only 26.6M learnable parameters, VOLO achieves 84.2% top-1 accuracy on ImageNet-1K without using extra training data, 2.7% better than T2T-ViT with a comparable number of parameters. When the model size is scaled up to 296M parameters, its performance can be further improved to 87.1%, setting a new record for ImageNet-1K classification.
In addition, we use the proposed VOLO as a pretrained model and report superior performance on downstream tasks such as semantic segmentation. Code is available at https://github.com/sail-sg/volo." @default.
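The outlook attention operation described in the abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: the actual VOLO layer generates K²×K² weights per window and uses multiple heads, whereas this sketch assumes hypothetical single-head projections (`w_attn`, `w_val`) that predict only K² weights per center token. It does show the key property claimed in the abstract, namely that cost is linear in the number of spatial locations (each token attends only to its K×K neighborhood) rather than quadratic as in self-attention.

```python
import numpy as np

def outlook_attention(x, w_attn, w_val, K=3):
    """Simplified sketch of sliding-window outlook attention.

    x      : (H, W, C) input feature map
    w_attn : (C, K*K)  projection generating attention weights per location
    w_val  : (C, C)    value projection
    Each center token predicts softmax weights over its KxK neighborhood
    and aggregates the projected values there -- no global HxW attention map.
    """
    H, W, C = x.shape
    pad = K // 2
    v = x @ w_val                                     # value projection
    vp = np.pad(v, ((pad, pad), (pad, pad), (0, 0)))  # zero-pad the borders
    out = np.zeros_like(v)
    for i in range(H):
        for j in range(W):
            logits = x[i, j] @ w_attn                 # weights from the center token
            a = np.exp(logits - logits.max())
            a /= a.sum()                              # softmax over the K*K window
            win = vp[i:i + K, j:j + K].reshape(K * K, C)
            out[i, j] = a @ win                       # weighted local aggregation
    return out
```

Because the attention weights are produced directly by a linear layer on the center token (rather than by query-key dot products over all positions), both memory and compute scale with H·W·K², which is what lets VOLO afford finer-resolution token maps.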
- W4295308583 created "2022-09-12" @default.
- W4295308583 creator A5020496314 @default.
- W4295308583 creator A5033884558 @default.
- W4295308583 creator A5039765869 @default.
- W4295308583 creator A5040392623 @default.
- W4295308583 creator A5072421286 @default.
- W4295308583 date "2022-01-01" @default.
- W4295308583 modified "2023-10-14" @default.
- W4295308583 title "VOLO: Vision Outlooker for Visual Recognition" @default.
- W4295308583 cites W2097117768 @default.
- W4295308583 cites W2108598243 @default.
- W4295308583 cites W2183341477 @default.
- W4295308583 cites W2194775991 @default.
- W4295308583 cites W2340897893 @default.
- W4295308583 cites W2507296351 @default.
- W4295308583 cites W2531409750 @default.
- W4295308583 cites W2549139847 @default.
- W4295308583 cites W2560023338 @default.
- W4295308583 cites W2752782242 @default.
- W4295308583 cites W2799213142 @default.
- W4295308583 cites W2884822772 @default.
- W4295308583 cites W2955058313 @default.
- W4295308583 cites W2963091558 @default.
- W4295308583 cites W2963446712 @default.
- W4295308583 cites W2963495494 @default.
- W4295308583 cites W2963954913 @default.
- W4295308583 cites W2964081807 @default.
- W4295308583 cites W2964350391 @default.
- W4295308583 cites W2970986510 @default.
- W4295308583 cites W2981413347 @default.
- W4295308583 cites W2983446232 @default.
- W4295308583 cites W2992308087 @default.
- W4295308583 cites W3034502973 @default.
- W4295308583 cites W3034756453 @default.
- W4295308583 cites W3034885317 @default.
- W4295308583 cites W3035743198 @default.
- W4295308583 cites W3097065222 @default.
- W4295308583 cites W3099924936 @default.
- W4295308583 cites W3106266119 @default.
- W4295308583 cites W3121523901 @default.
- W4295308583 cites W3131500599 @default.
- W4295308583 cites W3138516171 @default.
- W4295308583 cites W3151130473 @default.
- W4295308583 cites W3170841864 @default.
- W4295308583 cites W3172509117 @default.
- W4295308583 cites W3177052299 @default.
- W4295308583 cites W3177349073 @default.
- W4295308583 cites W3179869055 @default.
- W4295308583 cites W4214493665 @default.
- W4295308583 cites W4214614183 @default.
- W4295308583 cites W4214634256 @default.
- W4295308583 cites W4214636423 @default.
- W4295308583 cites W4214893857 @default.
- W4295308583 cites W4312349930 @default.
- W4295308583 cites W4312769131 @default.
- W4295308583 cites W4313156423 @default.
- W4295308583 doi "https://doi.org/10.1109/tpami.2022.3206108" @default.
- W4295308583 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/36094970" @default.
- W4295308583 hasPublicationYear "2022" @default.
- W4295308583 type Work @default.
- W4295308583 citedByCount "38" @default.
- W4295308583 countsByYear W42953085832022 @default.
- W4295308583 countsByYear W42953085832023 @default.
- W4295308583 crossrefType "journal-article" @default.
- W4295308583 hasAuthorship W4295308583A5020496314 @default.
- W4295308583 hasAuthorship W4295308583A5033884558 @default.
- W4295308583 hasAuthorship W4295308583A5039765869 @default.
- W4295308583 hasAuthorship W4295308583A5040392623 @default.
- W4295308583 hasAuthorship W4295308583A5072421286 @default.
- W4295308583 hasBestOaLocation W42953085832 @default.
- W4295308583 hasConcept C11413529 @default.
- W4295308583 hasConcept C121332964 @default.
- W4295308583 hasConcept C138885662 @default.
- W4295308583 hasConcept C149635348 @default.
- W4295308583 hasConcept C153180895 @default.
- W4295308583 hasConcept C154945302 @default.
- W4295308583 hasConcept C165801399 @default.
- W4295308583 hasConcept C2776401178 @default.
- W4295308583 hasConcept C2780513914 @default.
- W4295308583 hasConcept C38652104 @default.
- W4295308583 hasConcept C41008148 @default.
- W4295308583 hasConcept C41895202 @default.
- W4295308583 hasConcept C45374587 @default.
- W4295308583 hasConcept C48145219 @default.
- W4295308583 hasConcept C62520636 @default.
- W4295308583 hasConcept C66322947 @default.
- W4295308583 hasConceptScore W4295308583C11413529 @default.
- W4295308583 hasConceptScore W4295308583C121332964 @default.
- W4295308583 hasConceptScore W4295308583C138885662 @default.
- W4295308583 hasConceptScore W4295308583C149635348 @default.
- W4295308583 hasConceptScore W4295308583C153180895 @default.
- W4295308583 hasConceptScore W4295308583C154945302 @default.
- W4295308583 hasConceptScore W4295308583C165801399 @default.
- W4295308583 hasConceptScore W4295308583C2776401178 @default.
- W4295308583 hasConceptScore W4295308583C2780513914 @default.
- W4295308583 hasConceptScore W4295308583C38652104 @default.
- W4295308583 hasConceptScore W4295308583C41008148 @default.