Matches in SemOpenAlex for { <https://semopenalex.org/work/W3168489096> ?p ?o ?g. }
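A minimal sketch of how a listing like the one below could be retrieved programmatically. It assumes the public SemOpenAlex SPARQL endpoint at https://semopenalex.org/sparql (the endpoint URL is an assumption, not stated in the listing) and rewrites the quad pattern { <work> ?p ?o ?g . } as a standard SPARQL GRAPH clause; variable and constant names other than the work IRI are illustrative.

```python
# Sketch: fetch all (predicate, object, graph) matches for the work IRI from a
# SPARQL endpoint. ENDPOINT is an assumed URL for SemOpenAlex's public endpoint.
import requests

ENDPOINT = "https://semopenalex.org/sparql"  # assumed endpoint URL
QUERY = """
SELECT ?p ?o ?g WHERE {
  GRAPH ?g {
    <https://semopenalex.org/work/W3168489096> ?p ?o .
  }
}
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()

# Print each match in roughly the same "predicate object" shape as the listing.
for row in resp.json()["results"]["bindings"]:
    print(row["p"]["value"], row["o"]["value"])
```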
- W3168489096 abstract "Vision Transformers (ViTs) and MLPs signal further efforts on replacing hand-wired features or inductive biases with general-purpose neural architectures. Existing works empower the models by massive data, such as large-scale pre-training and/or repeated strong data augmentations, and still report optimization-related problems (e.g., sensitivity to initialization and learning rates). Hence, this paper investigates ViTs and MLP-Mixers from the lens of loss geometry, intending to improve the models' data efficiency at training and generalization at inference. Visualization and Hessian reveal extremely sharp local minima of converged models. By promoting smoothness with a recently proposed sharpness-aware optimizer, we substantially improve the accuracy and robustness of ViTs and MLP-Mixers on various tasks spanning supervised, adversarial, contrastive, and transfer learning (e.g., +5.3% and +11.0% top-1 accuracy on ImageNet for ViT-B/16 and Mixer-B/16, respectively, with the simple Inception-style preprocessing). We show that the improved smoothness attributes to sparser active neurons in the first few layers. The resultant ViTs outperform ResNets of similar size and throughput when trained from scratch on ImageNet without large-scale pre-training or strong data augmentations. Model checkpoints are available at url{https://github.com/google-research/vision_transformer}." @default.
- W3168489096 created "2021-06-22" @default.
- W3168489096 creator A5010841999 @default.
- W3168489096 creator A5017319429 @default.
- W3168489096 creator A5044019481 @default.
- W3168489096 date "2021-06-02" @default.
- W3168489096 modified "2023-09-25" @default.
- W3168489096 title "When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations" @default.
- W3168489096 cites W1977295328 @default.
- W3168489096 cites W2095705004 @default.
- W3168489096 cites W2108598243 @default.
- W3168489096 cites W2183341477 @default.
- W3168489096 cites W2194775991 @default.
- W3168489096 cites W2533598788 @default.
- W3168489096 cites W2626778328 @default.
- W3168489096 cites W2899663614 @default.
- W3168489096 cites W2949736877 @default.
- W3168489096 cites W2962781217 @default.
- W3168489096 cites W2962819303 @default.
- W3168489096 cites W2962843773 @default.
- W3168489096 cites W2962900737 @default.
- W3168489096 cites W2962933129 @default.
- W3168489096 cites W2963060032 @default.
- W3168489096 cites W2963063862 @default.
- W3168489096 cites W2963069632 @default.
- W3168489096 cites W2963317585 @default.
- W3168489096 cites W2963341956 @default.
- W3168489096 cites W2963399829 @default.
- W3168489096 cites W2963509076 @default.
- W3168489096 cites W2963959597 @default.
- W3168489096 cites W2964121744 @default.
- W3168489096 cites W2964253222 @default.
- W3168489096 cites W2970317235 @default.
- W3168489096 cites W2970375336 @default.
- W3168489096 cites W2992308087 @default.
- W3168489096 cites W2995435108 @default.
- W3168489096 cites W2996012599 @default.
- W3168489096 cites W2996564870 @default.
- W3168489096 cites W3034363135 @default.
- W3168489096 cites W3034978746 @default.
- W3168489096 cites W3035524453 @default.
- W3168489096 cites W3035584989 @default.
- W3168489096 cites W3035743198 @default.
- W3168489096 cites W3037492894 @default.
- W3168489096 cites W3047389762 @default.
- W3168489096 cites W3093329015 @default.
- W3168489096 cites W3097217077 @default.
- W3168489096 cites W3100345210 @default.
- W3168489096 cites W3102631365 @default.
- W3168489096 cites W3103385169 @default.
- W3168489096 cites W3116489684 @default.
- W3168489096 cites W3118608800 @default.
- W3168489096 cites W3119786062 @default.
- W3168489096 cites W3121523901 @default.
- W3168489096 cites W3122542623 @default.
- W3168489096 cites W3126536942 @default.
- W3168489096 cites W3126721948 @default.
- W3168489096 cites W3138516171 @default.
- W3168489096 cites W3147387781 @default.
- W3168489096 cites W3154596443 @default.
- W3168489096 cites W3156109214 @default.
- W3168489096 cites W3158846111 @default.
- W3168489096 cites W3159481202 @default.
- W3168489096 cites W3163465952 @default.
- W3168489096 cites W3165088525 @default.
- W3168489096 cites W3175958943 @default.
- W3168489096 cites W3214042613 @default.
- W3168489096 cites W3034194698 @default.
- W3168489096 doi "https://doi.org/10.48550/arxiv.2106.01548" @default.
- W3168489096 hasPublicationYear "2021" @default.
- W3168489096 type Work @default.
- W3168489096 sameAs 3168489096 @default.
- W3168489096 citedByCount "13" @default.
- W3168489096 countsByYear W31684890962021 @default.
- W3168489096 countsByYear W31684890962022 @default.
- W3168489096 crossrefType "posted-content" @default.
- W3168489096 hasAuthorship W3168489096A5010841999 @default.
- W3168489096 hasAuthorship W3168489096A5017319429 @default.
- W3168489096 hasAuthorship W3168489096A5044019481 @default.
- W3168489096 hasBestOaLocation W31684890961 @default.
- W3168489096 hasConcept C104317684 @default.
- W3168489096 hasConcept C114466953 @default.
- W3168489096 hasConcept C119857082 @default.
- W3168489096 hasConcept C121332964 @default.
- W3168489096 hasConcept C153180895 @default.
- W3168489096 hasConcept C154945302 @default.
- W3168489096 hasConcept C165801399 @default.
- W3168489096 hasConcept C185592680 @default.
- W3168489096 hasConcept C199360897 @default.
- W3168489096 hasConcept C203616005 @default.
- W3168489096 hasConcept C2776214188 @default.
- W3168489096 hasConcept C28826006 @default.
- W3168489096 hasConcept C33923547 @default.
- W3168489096 hasConcept C34736171 @default.
- W3168489096 hasConcept C41008148 @default.
- W3168489096 hasConcept C55493867 @default.
- W3168489096 hasConcept C62520636 @default.
- W3168489096 hasConcept C63479239 @default.
- W3168489096 hasConcept C66322947 @default.
- W3168489096 hasConceptScore W3168489096C104317684 @default.