Matches in SemOpenAlex for { <https://semopenalex.org/work/W4225575578> ?p ?o ?g. }
Showing items 1 to 47 of 47, with 100 items per page.
- W4225575578 abstract "Video transformers have recently emerged as an effective alternative to convolutional networks for action classification. However, most prior video transformers adopt either global space-time attention or hand-defined strategies to compare patches within and across frames. These fixed attention schemes not only have high computational cost but, by comparing patches at predetermined locations, they neglect the motion dynamics in the video. In this paper, we introduce the Deformable Video Transformer (DVT), which dynamically predicts a small subset of video patches to attend for each query location based on motion information, thus allowing the model to decide where to look in the video based on correspondences across frames. Crucially, these motion-based correspondences are obtained at zero-cost from information stored in the compressed format of the video. Our deformable attention mechanism is optimised directly with respect to classification performance, thus eliminating the need for suboptimal hand-design of attention strategies. Experiments on four large-scale video benchmarks (Kinetics-400, Something-Something-V2, EPIC-KITCHENS and Diving-48) demonstrate that, compared to existing video transformers, our model achieves higher accuracy at the same or lower computational cost, and it attains state-of-the-art results on these four datasets." @default.
- W4225575578 created "2022-05-05" @default.
- W4225575578 creator A5076036593 @default.
- W4225575578 creator A5082736347 @default.
- W4225575578 date "2022-03-31" @default.
- W4225575578 modified "2023-10-17" @default.
- W4225575578 title "Deformable Video Transformer" @default.
- W4225575578 doi "https://doi.org/10.48550/arxiv.2203.16795" @default.
- W4225575578 hasPublicationYear "2022" @default.
- W4225575578 type Work @default.
- W4225575578 citedByCount "0" @default.
- W4225575578 crossrefType "posted-content" @default.
- W4225575578 hasAuthorship W4225575578A5076036593 @default.
- W4225575578 hasAuthorship W4225575578A5082736347 @default.
- W4225575578 hasBestOaLocation W42255755781 @default.
- W4225575578 hasConcept C121332964 @default.
- W4225575578 hasConcept C128840427 @default.
- W4225575578 hasConcept C154945302 @default.
- W4225575578 hasConcept C165801399 @default.
- W4225575578 hasConcept C31972630 @default.
- W4225575578 hasConcept C41008148 @default.
- W4225575578 hasConcept C62520636 @default.
- W4225575578 hasConcept C66322947 @default.
- W4225575578 hasConceptScore W4225575578C121332964 @default.
- W4225575578 hasConceptScore W4225575578C128840427 @default.
- W4225575578 hasConceptScore W4225575578C154945302 @default.
- W4225575578 hasConceptScore W4225575578C165801399 @default.
- W4225575578 hasConceptScore W4225575578C31972630 @default.
- W4225575578 hasConceptScore W4225575578C41008148 @default.
- W4225575578 hasConceptScore W4225575578C62520636 @default.
- W4225575578 hasConceptScore W4225575578C66322947 @default.
- W4225575578 hasLocation W42255755781 @default.
- W4225575578 hasOpenAccess W4225575578 @default.
- W4225575578 hasPrimaryLocation W42255755781 @default.
- W4225575578 hasRelatedWork W2046284878 @default.
- W4225575578 hasRelatedWork W2063377350 @default.
- W4225575578 hasRelatedWork W2131710332 @default.
- W4225575578 hasRelatedWork W2158825824 @default.
- W4225575578 hasRelatedWork W2311002319 @default.
- W4225575578 hasRelatedWork W2352862238 @default.
- W4225575578 hasRelatedWork W2363279238 @default.
- W4225575578 hasRelatedWork W4206673776 @default.
- W4225575578 hasRelatedWork W578691330 @default.
- W4225575578 hasRelatedWork W851866491 @default.
- W4225575578 isParatext "false" @default.
- W4225575578 isRetracted "false" @default.
- W4225575578 workType "article" @default.
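The listing above is the result of evaluating the graph pattern `{ <https://semopenalex.org/work/W4225575578> ?p ?o ?g. }` against SemOpenAlex. A minimal sketch of retrieving the same triples programmatically is below; it assumes the public SPARQL endpoint at `https://semopenalex.org/sparql` (the endpoint URL, and the `build_query`/`fetch_triples` helper names, are assumptions for illustration, not part of the listing itself):

```python
# Sketch: fetch the property/value pairs for a SemOpenAlex work via SPARQL.
# Assumption: SemOpenAlex exposes a public endpoint at the URL below.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://semopenalex.org/sparql"  # assumed public SPARQL endpoint


def build_query(work_uri: str) -> str:
    """Build a SELECT form of the graph pattern shown in the listing."""
    return (
        "SELECT ?p ?o WHERE { "
        f"<{work_uri}> ?p ?o . "
        "}"
    )


def fetch_triples(work_uri: str):
    """POST the query and return the JSON result bindings."""
    query = build_query(work_uri)
    data = urllib.parse.urlencode({"query": query}).encode()
    req = urllib.request.Request(
        ENDPOINT,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]


if __name__ == "__main__":
    # Each row corresponds to one bullet in the listing above.
    for row in fetch_triples("https://semopenalex.org/work/W4225575578"):
        print(row["p"]["value"], row["o"]["value"])
```

With standard `application/sparql-results+json` output, each binding carries the predicate (e.g. `dcterms:title`, `hasConcept`) and its object, mirroring the 47 items shown.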