Matches in SemOpenAlex for { <https://semopenalex.org/work/W4300455064> ?p ?o ?g. }
Showing items 1 to 61 of 61 with 100 items per page.
- W4300455064 abstract "Video recognition has been dominated by the end-to-end learning paradigm -- first initializing a video recognition model with weights of a pretrained image model and then conducting end-to-end training on videos. This enables the video network to benefit from the pretrained image model. However, this requires substantial computation and memory resources for finetuning on videos and the alternative of directly using pretrained image features without finetuning the image backbone leads to subpar results. Fortunately, recent advances in Contrastive Vision-Language Pre-training (CLIP) pave the way for a new route for visual recognition tasks. Pretrained on large open-vocabulary image-text pair data, these models learn powerful visual representations with rich semantics. In this paper, we present Efficient Video Learning (EVL) -- an efficient framework for directly training high-quality video recognition models with frozen CLIP features. Specifically, we employ a lightweight Transformer decoder and learn a query token to dynamically collect frame-level spatial features from the CLIP image encoder. Furthermore, we adopt a local temporal module in each decoder layer to discover temporal clues from adjacent frames and their attention maps. We show that despite being efficient to train with a frozen backbone, our models learn high quality video representations on a variety of video recognition datasets. Code is available at https://github.com/OpenGVLab/efficient-video-recognition." @default.
- W4300455064 created "2022-10-03" @default.
- W4300455064 creator A5026944066 @default.
- W4300455064 creator A5027948034 @default.
- W4300455064 creator A5065073978 @default.
- W4300455064 creator A5072793702 @default.
- W4300455064 creator A5072861783 @default.
- W4300455064 creator A5075057638 @default.
- W4300455064 creator A5079410647 @default.
- W4300455064 creator A5080973846 @default.
- W4300455064 creator A5085818578 @default.
- W4300455064 date "2022-08-06" @default.
- W4300455064 modified "2023-09-27" @default.
- W4300455064 title "Frozen CLIP Models are Efficient Video Learners" @default.
- W4300455064 doi "https://doi.org/10.48550/arxiv.2208.03550" @default.
- W4300455064 hasPublicationYear "2022" @default.
- W4300455064 type Work @default.
- W4300455064 citedByCount "0" @default.
- W4300455064 crossrefType "posted-content" @default.
- W4300455064 hasAuthorship W4300455064A5026944066 @default.
- W4300455064 hasAuthorship W4300455064A5027948034 @default.
- W4300455064 hasAuthorship W4300455064A5065073978 @default.
- W4300455064 hasAuthorship W4300455064A5072793702 @default.
- W4300455064 hasAuthorship W4300455064A5072861783 @default.
- W4300455064 hasAuthorship W4300455064A5075057638 @default.
- W4300455064 hasAuthorship W4300455064A5079410647 @default.
- W4300455064 hasAuthorship W4300455064A5080973846 @default.
- W4300455064 hasAuthorship W4300455064A5085818578 @default.
- W4300455064 hasBestOaLocation W43004550641 @default.
- W4300455064 hasConcept C111919701 @default.
- W4300455064 hasConcept C118505674 @default.
- W4300455064 hasConcept C138885662 @default.
- W4300455064 hasConcept C154945302 @default.
- W4300455064 hasConcept C2777601683 @default.
- W4300455064 hasConcept C31972630 @default.
- W4300455064 hasConcept C41008148 @default.
- W4300455064 hasConcept C41895202 @default.
- W4300455064 hasConceptScore W4300455064C111919701 @default.
- W4300455064 hasConceptScore W4300455064C118505674 @default.
- W4300455064 hasConceptScore W4300455064C138885662 @default.
- W4300455064 hasConceptScore W4300455064C154945302 @default.
- W4300455064 hasConceptScore W4300455064C2777601683 @default.
- W4300455064 hasConceptScore W4300455064C31972630 @default.
- W4300455064 hasConceptScore W4300455064C41008148 @default.
- W4300455064 hasConceptScore W4300455064C41895202 @default.
- W4300455064 hasLocation W43004550641 @default.
- W4300455064 hasOpenAccess W4300455064 @default.
- W4300455064 hasPrimaryLocation W43004550641 @default.
- W4300455064 hasRelatedWork W1891287906 @default.
- W4300455064 hasRelatedWork W1969923398 @default.
- W4300455064 hasRelatedWork W2021592657 @default.
- W4300455064 hasRelatedWork W2036807459 @default.
- W4300455064 hasRelatedWork W2229312674 @default.
- W4300455064 hasRelatedWork W2275988210 @default.
- W4300455064 hasRelatedWork W2356875448 @default.
- W4300455064 hasRelatedWork W2755342338 @default.
- W4300455064 hasRelatedWork W2772917594 @default.
- W4300455064 hasRelatedWork W3116076068 @default.
- W4300455064 isParatext "false" @default.
- W4300455064 isRetracted "false" @default.
- W4300455064 workType "article" @default.