Matches in SemOpenAlex for { <https://semopenalex.org/work/W4285483923> ?p ?o ?g. }
Showing items 1 to 82 of 82 with 100 items per page.
- W4285483923 abstract "Due to the complex attention mechanisms and model design, most existing vision Transformers (ViTs) can not perform as efficiently as convolutional neural networks (CNNs) in realistic industrial deployment scenarios, e.g. TensorRT and CoreML. This poses a distinct challenge: Can a visual neural network be designed to infer as fast as CNNs and perform as powerful as ViTs? Recent works have tried to design CNN-Transformer hybrid architectures to address this issue, yet the overall performance of these works is far away from satisfactory. To end these, we propose a next generation vision Transformer for efficient deployment in realistic industrial scenarios, namely Next-ViT, which dominates both CNNs and ViTs from the perspective of latency/accuracy trade-off. In this work, the Next Convolution Block (NCB) and Next Transformer Block (NTB) are respectively developed to capture local and global information with deployment-friendly mechanisms. Then, Next Hybrid Strategy (NHS) is designed to stack NCB and NTB in an efficient hybrid paradigm, which boosts performance in various downstream tasks. Extensive experiments show that Next-ViT significantly outperforms existing CNNs, ViTs and CNN-Transformer hybrid architectures with respect to the latency/accuracy trade-off across various vision tasks. On TensorRT, Next-ViT surpasses ResNet by 5.5 mAP (from 40.4 to 45.9) on COCO detection and 7.7% mIoU (from 38.8% to 46.5%) on ADE20K segmentation under similar latency. Meanwhile, it achieves comparable performance with CSWin, while the inference speed is accelerated by 3.6x. On CoreML, Next-ViT surpasses EfficientFormer by 4.6 mAP (from 42.6 to 47.2) on COCO detection and 3.5% mIoU (from 45.1% to 48.6%) on ADE20K segmentation under similar latency. Our code and models are made public at: https://github.com/bytedance/Next-ViT" @default.
- W4285483923 created "2022-07-15" @default.
- W4285483923 creator A5000432967 @default.
- W4285483923 creator A5014851555 @default.
- W4285483923 creator A5029851581 @default.
- W4285483923 creator A5034191292 @default.
- W4285483923 creator A5035873669 @default.
- W4285483923 creator A5069358349 @default.
- W4285483923 creator A5070812231 @default.
- W4285483923 creator A5083822275 @default.
- W4285483923 creator A5089480544 @default.
- W4285483923 date "2022-07-12" @default.
- W4285483923 modified "2023-09-28" @default.
- W4285483923 title "Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios" @default.
- W4285483923 doi "https://doi.org/10.48550/arxiv.2207.05501" @default.
- W4285483923 hasPublicationYear "2022" @default.
- W4285483923 type Work @default.
- W4285483923 citedByCount "1" @default.
- W4285483923 countsByYear W42854839232023 @default.
- W4285483923 crossrefType "posted-content" @default.
- W4285483923 hasAuthorship W4285483923A5000432967 @default.
- W4285483923 hasAuthorship W4285483923A5014851555 @default.
- W4285483923 hasAuthorship W4285483923A5029851581 @default.
- W4285483923 hasAuthorship W4285483923A5034191292 @default.
- W4285483923 hasAuthorship W4285483923A5035873669 @default.
- W4285483923 hasAuthorship W4285483923A5069358349 @default.
- W4285483923 hasAuthorship W4285483923A5070812231 @default.
- W4285483923 hasAuthorship W4285483923A5083822275 @default.
- W4285483923 hasAuthorship W4285483923A5089480544 @default.
- W4285483923 hasBestOaLocation W42854839231 @default.
- W4285483923 hasConcept C105339364 @default.
- W4285483923 hasConcept C108583219 @default.
- W4285483923 hasConcept C113775141 @default.
- W4285483923 hasConcept C115903868 @default.
- W4285483923 hasConcept C118524514 @default.
- W4285483923 hasConcept C119599485 @default.
- W4285483923 hasConcept C119857082 @default.
- W4285483923 hasConcept C127413603 @default.
- W4285483923 hasConcept C154945302 @default.
- W4285483923 hasConcept C165801399 @default.
- W4285483923 hasConcept C2776214188 @default.
- W4285483923 hasConcept C41008148 @default.
- W4285483923 hasConcept C66322947 @default.
- W4285483923 hasConcept C76155785 @default.
- W4285483923 hasConcept C79403827 @default.
- W4285483923 hasConcept C81363708 @default.
- W4285483923 hasConcept C82876162 @default.
- W4285483923 hasConcept C89600930 @default.
- W4285483923 hasConceptScore W4285483923C105339364 @default.
- W4285483923 hasConceptScore W4285483923C108583219 @default.
- W4285483923 hasConceptScore W4285483923C113775141 @default.
- W4285483923 hasConceptScore W4285483923C115903868 @default.
- W4285483923 hasConceptScore W4285483923C118524514 @default.
- W4285483923 hasConceptScore W4285483923C119599485 @default.
- W4285483923 hasConceptScore W4285483923C119857082 @default.
- W4285483923 hasConceptScore W4285483923C127413603 @default.
- W4285483923 hasConceptScore W4285483923C154945302 @default.
- W4285483923 hasConceptScore W4285483923C165801399 @default.
- W4285483923 hasConceptScore W4285483923C2776214188 @default.
- W4285483923 hasConceptScore W4285483923C41008148 @default.
- W4285483923 hasConceptScore W4285483923C66322947 @default.
- W4285483923 hasConceptScore W4285483923C76155785 @default.
- W4285483923 hasConceptScore W4285483923C79403827 @default.
- W4285483923 hasConceptScore W4285483923C81363708 @default.
- W4285483923 hasConceptScore W4285483923C82876162 @default.
- W4285483923 hasConceptScore W4285483923C89600930 @default.
- W4285483923 hasLocation W42854839231 @default.
- W4285483923 hasOpenAccess W4285483923 @default.
- W4285483923 hasPrimaryLocation W42854839231 @default.
- W4285483923 hasRelatedWork W2415731916 @default.
- W4285483923 hasRelatedWork W2731899572 @default.
- W4285483923 hasRelatedWork W2790662084 @default.
- W4285483923 hasRelatedWork W2795329967 @default.
- W4285483923 hasRelatedWork W2920938200 @default.
- W4285483923 hasRelatedWork W3102253946 @default.
- W4285483923 hasRelatedWork W3215138031 @default.
- W4285483923 hasRelatedWork W4206841102 @default.
- W4285483923 hasRelatedWork W4225300973 @default.
- W4285483923 hasRelatedWork W4226289457 @default.
- W4285483923 isParatext "false" @default.
- W4285483923 isRetracted "false" @default.
- W4285483923 workType "article" @default.
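
The rows above all follow the same shape: `- subject predicate object @graph.`, matching the `?p ?o ?g` pattern in the query. As a minimal sketch, assuming that layout holds for every row, the listing could be split into tuples as below. The names `ROW` and `parse_row` are hypothetical helpers for illustration and are not part of SemOpenAlex or any library.

```python
import re

# Assumed row layout (taken from the listing above, not from any spec):
#   "- <subject> <predicate> <object> @<graph>."
ROW = re.compile(r'^- (?P<s>\S+) (?P<p>\S+) (?P<o>.+) @(?P<g>\w+)\.$')


def parse_row(line):
    """Return (subject, predicate, object, graph, is_literal), or None if the line is not a row."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    obj = m.group("o")
    # Quoted objects are treated as literals; unquoted ones as resource identifiers.
    is_literal = len(obj) >= 2 and obj.startswith('"') and obj.endswith('"')
    if is_literal:
        obj = obj[1:-1]
    return (m.group("s"), m.group("p"), obj, m.group("g"), is_literal)


if __name__ == "__main__":
    sample = [
        '- W4285483923 created "2022-07-15" @default.',
        '- W4285483923 type Work @default.',
        '- W4285483923 hasRelatedWork W2415731916 @default.',
    ]
    for line in sample:
        print(parse_row(line))
```

Non-row lines, such as the "Showing items" header, yield `None`, so they can be filtered out before further processing.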