Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386076661> ?p ?o ?g. }
- W4386076661 abstract "Mainstream Video-Language Pre-training (VLP) models [10, 26, 64] consist of three parts, a video encoder, a text encoder, and a video-text fusion Transformer. They pursue better performance via utilizing heavier unimodal encoders or multimodal fusion Transformers, resulting in increased parameters with lower efficiency in downstream tasks. In this work, we for the first time introduce an end-to-end VLP model, namely all-in-one Transformer, that embeds raw video and textual signals into joint representations using a unified backbone architecture. We argue that the unique temporal information of video data turns out to be a key barrier hindering the design of a modality-agnostic Transformer. To overcome the challenge, we introduce a novel and effective token rolling operation to encode temporal representations from video clips in a non-parametric manner. The careful design enables the representation learning of both video-text multimodal inputs and unimodal inputs using a unified model. Our pretrained all-in-one Transformer is transferred to various downstream video-text tasks after fine-tuning, including text-video retrieval, video-question answering, multiple choice and video captioning. State-of-the-art performances with the minimal model FLOPs on ten datasets demonstrate the superiority of our method compared to the competitive counterparts. The code and pretrained models are available at https://github.com/showlab/all-in-one." @default.
- W4386076661 created "2023-08-23" @default.
- W4386076661 creator A5001949272 @default.
- W4386076661 creator A5017578261 @default.
- W4386076661 creator A5035927942 @default.
- W4386076661 creator A5047802386 @default.
- W4386076661 creator A5054349864 @default.
- W4386076661 creator A5057256093 @default.
- W4386076661 creator A5062830848 @default.
- W4386076661 creator A5067244216 @default.
- W4386076661 creator A5068937750 @default.
- W4386076661 creator A5072608526 @default.
- W4386076661 creator A5083481253 @default.
- W4386076661 creator A5088475144 @default.
- W4386076661 date "2023-06-01" @default.
- W4386076661 modified "2023-10-06" @default.
- W4386076661 title "All in One: Exploring Unified Video-Language Pre-Training" @default.
- W4386076661 cites W2277195237 @default.
- W4386076661 cites W2425121537 @default.
- W4386076661 cites W2606982687 @default.
- W4386076661 cites W2765716052 @default.
- W4386076661 cites W2886641317 @default.
- W4386076661 cites W2963524571 @default.
- W4386076661 cites W2963916161 @default.
- W4386076661 cites W2969127500 @default.
- W4386076661 cites W2981851019 @default.
- W4386076661 cites W2984008963 @default.
- W4386076661 cites W2990152177 @default.
- W4386076661 cites W2997805943 @default.
- W4386076661 cites W3035265375 @default.
- W4386076661 cites W3035635319 @default.
- W4386076661 cites W3168640669 @default.
- W4386076661 cites W3176398504 @default.
- W4386076661 cites W3197457832 @default.
- W4386076661 cites W3204588463 @default.
- W4386076661 cites W3204670646 @default.
- W4386076661 cites W4312372834 @default.
- W4386076661 cites W4312864639 @default.
- W4386076661 doi "https://doi.org/10.1109/cvpr52729.2023.00638" @default.
- W4386076661 hasPublicationYear "2023" @default.
- W4386076661 type Work @default.
- W4386076661 citedByCount "2" @default.
- W4386076661 countsByYear W43860766612023 @default.
- W4386076661 crossrefType "proceedings-article" @default.
- W4386076661 hasAuthorship W4386076661A5001949272 @default.
- W4386076661 hasAuthorship W4386076661A5017578261 @default.
- W4386076661 hasAuthorship W4386076661A5035927942 @default.
- W4386076661 hasAuthorship W4386076661A5047802386 @default.
- W4386076661 hasAuthorship W4386076661A5054349864 @default.
- W4386076661 hasAuthorship W4386076661A5057256093 @default.
- W4386076661 hasAuthorship W4386076661A5062830848 @default.
- W4386076661 hasAuthorship W4386076661A5067244216 @default.
- W4386076661 hasAuthorship W4386076661A5068937750 @default.
- W4386076661 hasAuthorship W4386076661A5072608526 @default.
- W4386076661 hasAuthorship W4386076661A5083481253 @default.
- W4386076661 hasAuthorship W4386076661A5088475144 @default.
- W4386076661 hasConcept C104317684 @default.
- W4386076661 hasConcept C111919701 @default.
- W4386076661 hasConcept C115961682 @default.
- W4386076661 hasConcept C118505674 @default.
- W4386076661 hasConcept C121332964 @default.
- W4386076661 hasConcept C137293760 @default.
- W4386076661 hasConcept C154945302 @default.
- W4386076661 hasConcept C157657479 @default.
- W4386076661 hasConcept C165801399 @default.
- W4386076661 hasConcept C185592680 @default.
- W4386076661 hasConcept C28490314 @default.
- W4386076661 hasConcept C38652104 @default.
- W4386076661 hasConcept C41008148 @default.
- W4386076661 hasConcept C44291984 @default.
- W4386076661 hasConcept C48145219 @default.
- W4386076661 hasConcept C55493867 @default.
- W4386076661 hasConcept C62520636 @default.
- W4386076661 hasConcept C66322947 @default.
- W4386076661 hasConcept C66746571 @default.
- W4386076661 hasConceptScore W4386076661C104317684 @default.
- W4386076661 hasConceptScore W4386076661C111919701 @default.
- W4386076661 hasConceptScore W4386076661C115961682 @default.
- W4386076661 hasConceptScore W4386076661C118505674 @default.
- W4386076661 hasConceptScore W4386076661C121332964 @default.
- W4386076661 hasConceptScore W4386076661C137293760 @default.
- W4386076661 hasConceptScore W4386076661C154945302 @default.
- W4386076661 hasConceptScore W4386076661C157657479 @default.
- W4386076661 hasConceptScore W4386076661C165801399 @default.
- W4386076661 hasConceptScore W4386076661C185592680 @default.
- W4386076661 hasConceptScore W4386076661C28490314 @default.
- W4386076661 hasConceptScore W4386076661C38652104 @default.
- W4386076661 hasConceptScore W4386076661C41008148 @default.
- W4386076661 hasConceptScore W4386076661C44291984 @default.
- W4386076661 hasConceptScore W4386076661C48145219 @default.
- W4386076661 hasConceptScore W4386076661C55493867 @default.
- W4386076661 hasConceptScore W4386076661C62520636 @default.
- W4386076661 hasConceptScore W4386076661C66322947 @default.
- W4386076661 hasConceptScore W4386076661C66746571 @default.
- W4386076661 hasLocation W43860766611 @default.
- W4386076661 hasOpenAccess W4386076661 @default.
- W4386076661 hasPrimaryLocation W43860766611 @default.
- W4386076661 hasRelatedWork W2547835662 @default.
- W4386076661 hasRelatedWork W2975706270 @default.
- W4386076661 hasRelatedWork W3098382480 @default.