Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386995506> ?p ?o ?g. }
Showing items 1 to 85 of 85, with 100 items per page.
- W4386995506 endingPage "10633" @default.
- W4386995506 startingPage "10633" @default.
- W4386995506 abstract "In the domain of video recognition, video transformers have demonstrated remarkable performance, albeit at significant computational cost. This paper introduces TSNet, an approach that dynamically selects informative tokens from a given video sample. A lightweight prediction module assigns an importance score to each token in the video, and the top-scoring tokens are then used for self-attention computation. We apply the Gumbel-softmax technique to sample from the output of the prediction module, enabling end-to-end optimization of the module. We extend our method to hierarchical vision transformers rather than single-scale vision transformers. A simple linear module projects the pruned tokens, and the projected result is concatenated with the output of the self-attention network, maintaining the same number of tokens while capturing interactions with the selected tokens. Since feedforward networks (FFNs) account for substantial computation, we likewise apply a linear projection to the pruned tokens to accelerate the model, while the existing FFN layer processes the selected tokens. Finally, to keep the output structure unchanged, the two groups of tokens are reassembled according to their spatial positions in the original feature map. Our experiments focus primarily on the Kinetics-400 dataset using UniFormer, a hierarchical video transformer backbone that incorporates convolution in its self-attention block. Our model achieves results comparable to the original model while reducing computation by over 13%. Notably, by hierarchically pruning 70% of the input tokens, our approach reduces FLOPs by 55.5% while limiting the accuracy drop to 2%.
We also tested the wide applicability and adaptability of the method with other transformers, such as the Video Swin Transformer, and observed promising results on video recognition benchmarks. With our token sparsification framework, video vision transformers achieve a favorable balance between increased computational speed and a slight reduction in accuracy." @default.
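The abstract describes a pipeline of Gumbel-softmax-based top-k token selection, a cheap linear path for pruned tokens, and reassembly by original position. A minimal NumPy sketch of that idea follows; the function names, the single-sample shapes, and the identity stand-in for the attention+FFN path are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gumbel_topk(scores, k, tau=1.0, seed=None):
    """Select k tokens by Gumbel-perturbed importance scores.

    scores: (N,) importance logits from a lightweight prediction module.
    Returns sorted indices of kept and pruned tokens. (At training time
    the paper uses Gumbel-softmax for differentiability; this is the
    hard-selection inference view.)
    """
    rng = np.random.default_rng(seed)
    # Standard Gumbel(0, 1) noise added to the logits.
    gumbel = -np.log(-np.log(rng.uniform(1e-9, 1.0, scores.shape)))
    order = np.argsort(-(scores + gumbel) / tau)
    return np.sort(order[:k]), np.sort(order[k:])

def sparsify_block(tokens, scores, k, w_proj, seed=0):
    """One simplified sparsification step for a (N, D) token map.

    Kept tokens would pass through self-attention + FFN (identity here
    as a stand-in); pruned tokens take a cheap linear projection. Both
    groups are written back at their original positions, so the output
    keeps the input's token count and spatial layout.
    """
    keep, prune = gumbel_topk(scores, k, seed=seed)
    out = np.empty_like(tokens)
    out[keep] = tokens[keep]              # stand-in for the attention/FFN path
    out[prune] = tokens[prune] @ w_proj   # cheap linear path for pruned tokens
    return out
```

Reassembling by index (rather than concatenating kept-then-pruned) is what preserves the feature map's structure for the next hierarchical stage.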
- W4386995506 created "2023-09-25" @default.
- W4386995506 creator A5038766727 @default.
- W4386995506 creator A5056694985 @default.
- W4386995506 creator A5058899863 @default.
- W4386995506 date "2023-09-24" @default.
- W4386995506 modified "2023-09-27" @default.
- W4386995506 title "TSNet: Token Sparsification for Efficient Video Transformer" @default.
- W4386995506 cites W1522734439 @default.
- W4386995506 cites W2799176631 @default.
- W4386995506 cites W2918626955 @default.
- W4386995506 cites W2963155035 @default.
- W4386995506 cites W2963370182 @default.
- W4386995506 cites W2963524571 @default.
- W4386995506 cites W2963526497 @default.
- W4386995506 cites W2963820951 @default.
- W4386995506 cites W3034572008 @default.
- W4386995506 cites W3035303837 @default.
- W4386995506 cites W3138516171 @default.
- W4386995506 cites W3173621652 @default.
- W4386995506 cites W3210279979 @default.
- W4386995506 cites W4214612132 @default.
- W4386995506 cites W4214614183 @default.
- W4386995506 cites W4226407477 @default.
- W4386995506 cites W4312290555 @default.
- W4386995506 cites W4312340826 @default.
- W4386995506 cites W4312560592 @default.
- W4386995506 cites W4312849330 @default.
- W4386995506 cites W4312947882 @default.
- W4386995506 cites W4362500802 @default.
- W4386995506 doi "https://doi.org/10.3390/app131910633" @default.
- W4386995506 hasPublicationYear "2023" @default.
- W4386995506 type Work @default.
- W4386995506 citedByCount "0" @default.
- W4386995506 crossrefType "journal-article" @default.
- W4386995506 hasAuthorship W4386995506A5038766727 @default.
- W4386995506 hasAuthorship W4386995506A5056694985 @default.
- W4386995506 hasAuthorship W4386995506A5058899863 @default.
- W4386995506 hasBestOaLocation W43869955061 @default.
- W4386995506 hasConcept C11413529 @default.
- W4386995506 hasConcept C121332964 @default.
- W4386995506 hasConcept C153180895 @default.
- W4386995506 hasConcept C154945302 @default.
- W4386995506 hasConcept C165801399 @default.
- W4386995506 hasConcept C188441871 @default.
- W4386995506 hasConcept C31258907 @default.
- W4386995506 hasConcept C41008148 @default.
- W4386995506 hasConcept C45374587 @default.
- W4386995506 hasConcept C48145219 @default.
- W4386995506 hasConcept C50644808 @default.
- W4386995506 hasConcept C62520636 @default.
- W4386995506 hasConcept C66322947 @default.
- W4386995506 hasConceptScore W4386995506C11413529 @default.
- W4386995506 hasConceptScore W4386995506C121332964 @default.
- W4386995506 hasConceptScore W4386995506C153180895 @default.
- W4386995506 hasConceptScore W4386995506C154945302 @default.
- W4386995506 hasConceptScore W4386995506C165801399 @default.
- W4386995506 hasConceptScore W4386995506C188441871 @default.
- W4386995506 hasConceptScore W4386995506C31258907 @default.
- W4386995506 hasConceptScore W4386995506C41008148 @default.
- W4386995506 hasConceptScore W4386995506C45374587 @default.
- W4386995506 hasConceptScore W4386995506C48145219 @default.
- W4386995506 hasConceptScore W4386995506C50644808 @default.
- W4386995506 hasConceptScore W4386995506C62520636 @default.
- W4386995506 hasConceptScore W4386995506C66322947 @default.
- W4386995506 hasIssue "19" @default.
- W4386995506 hasLocation W43869955061 @default.
- W4386995506 hasOpenAccess W4386995506 @default.
- W4386995506 hasPrimaryLocation W43869955061 @default.
- W4386995506 hasRelatedWork W2610906757 @default.
- W4386995506 hasRelatedWork W2743258233 @default.
- W4386995506 hasRelatedWork W2888789309 @default.
- W4386995506 hasRelatedWork W2921182884 @default.
- W4386995506 hasRelatedWork W2938746851 @default.
- W4386995506 hasRelatedWork W2997969508 @default.
- W4386995506 hasRelatedWork W3208883981 @default.
- W4386995506 hasRelatedWork W4307834408 @default.
- W4386995506 hasRelatedWork W4320925816 @default.
- W4386995506 hasRelatedWork W4321091470 @default.
- W4386995506 hasVolume "13" @default.
- W4386995506 isParatext "false" @default.
- W4386995506 isRetracted "false" @default.
- W4386995506 workType "article" @default.