Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386027557> ?p ?o ?g. }
- W4386027557 abstract "Vision Transformer (ViT) has demonstrated promising performance in various computer vision tasks, and recently attracted a lot of research attention. Many recent works have focused on proposing new architectures to improve ViT and deploying it into real-world applications. However, little effort has been made to analyze and understand ViT's architecture design space and its implication of hardware-cost on different devices. In this work, by simply scaling ViT's depth, width, input size, and other basic configurations, we show that a scaled vanilla ViT model without bells and whistles can achieve a comparable or superior accuracy-efficiency trade-off than most of the latest ViT variants. Specifically, compared to DeiT-Tiny, our scaled model achieves a ↑1.9% higher ImageNet top-1 accuracy under the same FLOPs and a ↑3.7% better ImageNet top-1 accuracy under the same latency on an NVIDIA Edge GPU TX2. Motivated by this, we further investigate the extracted scaling strategies from the following two aspects: (1) "can these scaling strategies be transferred across different real hardware devices?"; and (2) "can these scaling strategies be transferred to different ViT variants and tasks?". For (1), our exploration, based on various devices with different resource budgets, indicates that the transferability effectiveness depends on the underlying device together with its corresponding deployment tool; for (2), we validate the effective transferability of the aforementioned scaling strategies obtained from a vanilla ViT model on top of an image classification task to the PiT model, a strong ViT variant targeting efficiency, as well as object detection and video classification tasks. In particular, when transferred to PiT, our scaling strategies lead to a boosted ImageNet top-1 accuracy from 74.6% to 76.7% (↑2.1%) under the same 0.7G FLOPs; and when transferred to the COCO object detection task, the average precision is boosted by ↑0.7% under a similar throughput on a V100 GPU." @default.
- W4386027557 created "2023-08-22" @default.
- W4386027557 creator A5019582323 @default.
- W4386027557 creator A5046098787 @default.
- W4386027557 creator A5048668303 @default.
- W4386027557 creator A5057613852 @default.
- W4386027557 creator A5065172226 @default.
- W4386027557 creator A5073512879 @default.
- W4386027557 creator A5074743331 @default.
- W4386027557 creator A5080137972 @default.
- W4386027557 date "2023-08-21" @default.
- W4386027557 modified "2023-09-27" @default.
- W4386027557 title "An Investigation on Hardware-Aware Vision Transformer Scaling" @default.
- W4386027557 cites W1861492603 @default.
- W4386027557 cites W2108598243 @default.
- W4386027557 cites W2148633389 @default.
- W4386027557 cites W2194775991 @default.
- W4386027557 cites W2752782242 @default.
- W4386027557 cites W2892220819 @default.
- W4386027557 cites W2922509574 @default.
- W4386027557 cites W2962843773 @default.
- W4386027557 cites W2963163009 @default.
- W4386027557 cites W2982083293 @default.
- W4386027557 cites W3034429256 @default.
- W4386027557 cites W3034572008 @default.
- W4386027557 cites W3096609285 @default.
- W4386027557 cites W3109946440 @default.
- W4386027557 cites W3121523901 @default.
- W4386027557 cites W3131500599 @default.
- W4386027557 cites W3138516171 @default.
- W4386027557 cites W3168643403 @default.
- W4386027557 cites W4214588794 @default.
- W4386027557 cites W4214614183 @default.
- W4386027557 cites W4214634256 @default.
- W4386027557 doi "https://doi.org/10.1145/3611387" @default.
- W4386027557 hasPublicationYear "2023" @default.
- W4386027557 type Work @default.
- W4386027557 citedByCount "0" @default.
- W4386027557 crossrefType "journal-article" @default.
- W4386027557 hasAuthorship W4386027557A5019582323 @default.
- W4386027557 hasAuthorship W4386027557A5046098787 @default.
- W4386027557 hasAuthorship W4386027557A5048668303 @default.
- W4386027557 hasAuthorship W4386027557A5057613852 @default.
- W4386027557 hasAuthorship W4386027557A5065172226 @default.
- W4386027557 hasAuthorship W4386027557A5073512879 @default.
- W4386027557 hasAuthorship W4386027557A5074743331 @default.
- W4386027557 hasAuthorship W4386027557A5080137972 @default.
- W4386027557 hasBestOaLocation W43860275571 @default.
- W4386027557 hasConcept C105339364 @default.
- W4386027557 hasConcept C111919701 @default.
- W4386027557 hasConcept C113775141 @default.
- W4386027557 hasConcept C118524514 @default.
- W4386027557 hasConcept C119857082 @default.
- W4386027557 hasConcept C121332964 @default.
- W4386027557 hasConcept C140331021 @default.
- W4386027557 hasConcept C154945302 @default.
- W4386027557 hasConcept C165801399 @default.
- W4386027557 hasConcept C173608175 @default.
- W4386027557 hasConcept C186967261 @default.
- W4386027557 hasConcept C2524010 @default.
- W4386027557 hasConcept C33923547 @default.
- W4386027557 hasConcept C3826847 @default.
- W4386027557 hasConcept C41008148 @default.
- W4386027557 hasConcept C42935608 @default.
- W4386027557 hasConcept C61272859 @default.
- W4386027557 hasConcept C62520636 @default.
- W4386027557 hasConcept C66322947 @default.
- W4386027557 hasConcept C76155785 @default.
- W4386027557 hasConcept C82876162 @default.
- W4386027557 hasConcept C9390403 @default.
- W4386027557 hasConcept C99844830 @default.
- W4386027557 hasConceptScore W4386027557C105339364 @default.
- W4386027557 hasConceptScore W4386027557C111919701 @default.
- W4386027557 hasConceptScore W4386027557C113775141 @default.
- W4386027557 hasConceptScore W4386027557C118524514 @default.
- W4386027557 hasConceptScore W4386027557C119857082 @default.
- W4386027557 hasConceptScore W4386027557C121332964 @default.
- W4386027557 hasConceptScore W4386027557C140331021 @default.
- W4386027557 hasConceptScore W4386027557C154945302 @default.
- W4386027557 hasConceptScore W4386027557C165801399 @default.
- W4386027557 hasConceptScore W4386027557C173608175 @default.
- W4386027557 hasConceptScore W4386027557C186967261 @default.
- W4386027557 hasConceptScore W4386027557C2524010 @default.
- W4386027557 hasConceptScore W4386027557C33923547 @default.
- W4386027557 hasConceptScore W4386027557C3826847 @default.
- W4386027557 hasConceptScore W4386027557C41008148 @default.
- W4386027557 hasConceptScore W4386027557C42935608 @default.
- W4386027557 hasConceptScore W4386027557C61272859 @default.
- W4386027557 hasConceptScore W4386027557C62520636 @default.
- W4386027557 hasConceptScore W4386027557C66322947 @default.
- W4386027557 hasConceptScore W4386027557C76155785 @default.
- W4386027557 hasConceptScore W4386027557C82876162 @default.
- W4386027557 hasConceptScore W4386027557C9390403 @default.
- W4386027557 hasConceptScore W4386027557C99844830 @default.
- W4386027557 hasLocation W43860275571 @default.
- W4386027557 hasOpenAccess W4386027557 @default.
- W4386027557 hasPrimaryLocation W43860275571 @default.
- W4386027557 hasRelatedWork W2060736133 @default.
- W4386027557 hasRelatedWork W2063534976 @default.
- W4386027557 hasRelatedWork W2625954420 @default.