Matches in SemOpenAlex for { <https://semopenalex.org/work/W4315705623> ?p ?o ?g. }
- W4315705623 endingPage "1162" @default.
- W4315705623 startingPage "1141" @default.
- W4315705623 abstract "Vision transformers have shown great potential in various computer vision tasks owing to their strong capability to model long-range dependency using the self-attention mechanism. Nevertheless, they treat an image as a 1D sequence of visual tokens, lacking an intrinsic inductive bias (IB) in modeling local visual structures and dealing with scale variance, which is instead learned implicitly from large-scale training data with longer training schedules. In this paper, we leverage the two IBs and propose the ViTAE transformer, which utilizes a reduction cell for multi-scale feature and a normal cell for locality. The two kinds of cells are stacked in both isotropic and multi-stage manners to formulate two families of ViTAE models, i.e., the vanilla ViTAE and ViTAEv2. Experiments on the ImageNet dataset as well as downstream tasks on the MS COCO, ADE20K, and AP10K datasets validate the superiority of our models over the baseline and representative models. Besides, we scale up our ViTAE model to 644 M parameters and obtain the state-of-the-art classification performance, i.e., 88.5% Top-1 classification accuracy on ImageNet validation set and the best 91.2% Top-1 classification accuracy on ImageNet Real validation set, without using extra private data. It demonstrates that the introduced inductive bias still helps when the model size becomes large. The source code and pretrained models are publicly available." @default.
- W4315705623 created "2023-01-12" @default.
- W4315705623 creator A5001819736 @default.
- W4315705623 creator A5019450682 @default.
- W4315705623 creator A5032383728 @default.
- W4315705623 creator A5042154277 @default.
- W4315705623 date "2023-01-12" @default.
- W4315705623 modified "2023-10-18" @default.
- W4315705623 title "ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond" @default.
- W4315705623 cites W1677409904 @default.
- W4315705623 cites W1849277567 @default.
- W4315705623 cites W1861492603 @default.
- W4315705623 cites W1977295328 @default.
- W4315705623 cites W2016163169 @default.
- W4315705623 cites W2097117768 @default.
- W4315705623 cites W2108598243 @default.
- W4315705623 cites W2109255472 @default.
- W4315705623 cites W2109773745 @default.
- W4315705623 cites W2117228865 @default.
- W4315705623 cites W2138011018 @default.
- W4315705623 cites W2143238378 @default.
- W4315705623 cites W2183341477 @default.
- W4315705623 cites W2194775991 @default.
- W4315705623 cites W2507296351 @default.
- W4315705623 cites W2533598788 @default.
- W4315705623 cites W2560023338 @default.
- W4315705623 cites W2565639579 @default.
- W4315705623 cites W2607041014 @default.
- W4315705623 cites W2737258237 @default.
- W4315705623 cites W2884822772 @default.
- W4315705623 cites W2919115771 @default.
- W4315705623 cites W2962816068 @default.
- W4315705623 cites W2962850830 @default.
- W4315705623 cites W2962858109 @default.
- W4315705623 cites W2963150697 @default.
- W4315705623 cites W2963163009 @default.
- W4315705623 cites W2963402313 @default.
- W4315705623 cites W2963446712 @default.
- W4315705623 cites W2963563573 @default.
- W4315705623 cites W2964241181 @default.
- W4315705623 cites W2964350391 @default.
- W4315705623 cites W2989676862 @default.
- W4315705623 cites W3034429256 @default.
- W4315705623 cites W3097217077 @default.
- W4315705623 cites W3109241881 @default.
- W4315705623 cites W3121523901 @default.
- W4315705623 cites W3131500599 @default.
- W4315705623 cites W3136416617 @default.
- W4315705623 cites W3137278571 @default.
- W4315705623 cites W3138516171 @default.
- W4315705623 cites W3139633126 @default.
- W4315705623 cites W3145185940 @default.
- W4315705623 cites W3145450063 @default.
- W4315705623 cites W3151130473 @default.
- W4315705623 cites W3160694286 @default.
- W4315705623 cites W3170841864 @default.
- W4315705623 cites W3172942063 @default.
- W4315705623 cites W3175515048 @default.
- W4315705623 cites W3180562345 @default.
- W4315705623 cites W3190492058 @default.
- W4315705623 cites W4214493665 @default.
- W4315705623 cites W4214588794 @default.
- W4315705623 cites W4214614183 @default.
- W4315705623 cites W4214636423 @default.
- W4315705623 cites W4231697575 @default.
- W4315705623 cites W4312257978 @default.
- W4315705623 cites W4312312750 @default.
- W4315705623 cites W4312349930 @default.
- W4315705623 cites W4312804044 @default.
- W4315705623 cites W4312820606 @default.
- W4315705623 cites W4312960790 @default.
- W4315705623 cites W4312977443 @default.
- W4315705623 cites W4313156423 @default.
- W4315705623 doi "https://doi.org/10.1007/s11263-022-01739-w" @default.
- W4315705623 hasPublicationYear "2023" @default.
- W4315705623 type Work @default.
- W4315705623 citedByCount "26" @default.
- W4315705623 countsByYear W43157056232022 @default.
- W4315705623 countsByYear W43157056232023 @default.
- W4315705623 crossrefType "journal-article" @default.
- W4315705623 hasAuthorship W4315705623A5001819736 @default.
- W4315705623 hasAuthorship W4315705623A5019450682 @default.
- W4315705623 hasAuthorship W4315705623A5032383728 @default.
- W4315705623 hasAuthorship W4315705623A5042154277 @default.
- W4315705623 hasBestOaLocation W43157056232 @default.
- W4315705623 hasConcept C119857082 @default.
- W4315705623 hasConcept C121332964 @default.
- W4315705623 hasConcept C138885662 @default.
- W4315705623 hasConcept C153083717 @default.
- W4315705623 hasConcept C153180895 @default.
- W4315705623 hasConcept C154945302 @default.
- W4315705623 hasConcept C162324750 @default.
- W4315705623 hasConcept C165801399 @default.
- W4315705623 hasConcept C187736073 @default.
- W4315705623 hasConcept C197352929 @default.
- W4315705623 hasConcept C2779808786 @default.
- W4315705623 hasConcept C2780451532 @default.
- W4315705623 hasConcept C28006648 @default.