Matches in SemOpenAlex for { <https://semopenalex.org/work/W4313186260> ?p ?o ?g. }
Showing items 1 to 96 of 96, with 100 items per page.
- W4313186260 abstract "Video-and-language pre-training has shown promising improvements on various downstream tasks. Most previous methods capture cross-modal interactions with a standard transformer-based multimodal encoder, not fully addressing the misalignment between unimodal video and text features. Besides, learning fine-grained visual-language alignment usually requires off-the-shelf object detectors to provide object information, which is bottlenecked by the detector's limited vocabulary and expensive computation cost. In this paper, we propose Align and Prompt: a new video-and-language pre-training framework (AlPro), which operates on sparsely-sampled video frames and achieves more effective cross-modal alignment without explicit object detectors. First, we introduce a video-text contrastive (VTC) loss to align unimodal video-text features at the instance level, which eases the modeling of cross-modal interactions. Then, we propose a novel visually-grounded pre-training task, prompting entity modeling (PEM), which learns fine-grained alignment between visual region and text entity via an entity prompter module in a self-supervised way. Finally, we pretrain the video-and-language transformer models on large webly-source video-text pairs using the proposed VTC and PEM losses as well as two standard losses of masked language modeling (MLM) and video-text matching (VTM). The resulting pre-trained model achieves state-of-the-art performance on both text-video retrieval and videoQA, outperforming prior work by a substantial margin. Implementation and pre-trained models are available at https://github.com/salesforce/ALPRO." @default.
- W4313186260 created "2023-01-06" @default.
- W4313186260 creator A5004557807 @default.
- W4313186260 creator A5018518655 @default.
- W4313186260 creator A5043446520 @default.
- W4313186260 creator A5061507436 @default.
- W4313186260 creator A5074834854 @default.
- W4313186260 date "2022-06-01" @default.
- W4313186260 modified "2023-10-18" @default.
- W4313186260 title "Align and Prompt: Video-and-Language Pre-training with Entity Prompts" @default.
- W4313186260 cites W2098411764 @default.
- W4313186260 cites W2277195237 @default.
- W4313186260 cites W2293605478 @default.
- W4313186260 cites W2425121537 @default.
- W4313186260 cites W2562153041 @default.
- W4313186260 cites W2606982687 @default.
- W4313186260 cites W2765716052 @default.
- W4313186260 cites W2885775891 @default.
- W4313186260 cites W2886641317 @default.
- W4313186260 cites W2897439619 @default.
- W4313186260 cites W2954199749 @default.
- W4313186260 cites W2962949233 @default.
- W4313186260 cites W2963017553 @default.
- W4313186260 cites W2963541336 @default.
- W4313186260 cites W2981851019 @default.
- W4313186260 cites W2984008963 @default.
- W4313186260 cites W2997344006 @default.
- W4313186260 cites W2997805943 @default.
- W4313186260 cites W2998166190 @default.
- W4313186260 cites W3035265375 @default.
- W4313186260 cites W3035635319 @default.
- W4313186260 cites W3090449556 @default.
- W4313186260 cites W3105232955 @default.
- W4313186260 cites W3153005511 @default.
- W4313186260 cites W3172523222 @default.
- W4313186260 cites W3176398504 @default.
- W4313186260 cites W3197457832 @default.
- W4313186260 cites W3203711169 @default.
- W4313186260 cites W3204588463 @default.
- W4313186260 doi "https://doi.org/10.1109/cvpr52688.2022.00490" @default.
- W4313186260 hasPublicationYear "2022" @default.
- W4313186260 type Work @default.
- W4313186260 citedByCount "25" @default.
- W4313186260 countsByYear W43131862602023 @default.
- W4313186260 crossrefType "proceedings-article" @default.
- W4313186260 hasAuthorship W4313186260A5004557807 @default.
- W4313186260 hasAuthorship W4313186260A5018518655 @default.
- W4313186260 hasAuthorship W4313186260A5043446520 @default.
- W4313186260 hasAuthorship W4313186260A5061507436 @default.
- W4313186260 hasAuthorship W4313186260A5074834854 @default.
- W4313186260 hasBestOaLocation W43131862602 @default.
- W4313186260 hasConcept C111919701 @default.
- W4313186260 hasConcept C118505674 @default.
- W4313186260 hasConcept C121332964 @default.
- W4313186260 hasConcept C137293760 @default.
- W4313186260 hasConcept C138885662 @default.
- W4313186260 hasConcept C154945302 @default.
- W4313186260 hasConcept C165801399 @default.
- W4313186260 hasConcept C204321447 @default.
- W4313186260 hasConcept C2777601683 @default.
- W4313186260 hasConcept C28490314 @default.
- W4313186260 hasConcept C41008148 @default.
- W4313186260 hasConcept C41895202 @default.
- W4313186260 hasConcept C62520636 @default.
- W4313186260 hasConcept C66322947 @default.
- W4313186260 hasConceptScore W4313186260C111919701 @default.
- W4313186260 hasConceptScore W4313186260C118505674 @default.
- W4313186260 hasConceptScore W4313186260C121332964 @default.
- W4313186260 hasConceptScore W4313186260C137293760 @default.
- W4313186260 hasConceptScore W4313186260C138885662 @default.
- W4313186260 hasConceptScore W4313186260C154945302 @default.
- W4313186260 hasConceptScore W4313186260C165801399 @default.
- W4313186260 hasConceptScore W4313186260C204321447 @default.
- W4313186260 hasConceptScore W4313186260C2777601683 @default.
- W4313186260 hasConceptScore W4313186260C28490314 @default.
- W4313186260 hasConceptScore W4313186260C41008148 @default.
- W4313186260 hasConceptScore W4313186260C41895202 @default.
- W4313186260 hasConceptScore W4313186260C62520636 @default.
- W4313186260 hasConceptScore W4313186260C66322947 @default.
- W4313186260 hasLocation W43131862601 @default.
- W4313186260 hasLocation W43131862602 @default.
- W4313186260 hasOpenAccess W4313186260 @default.
- W4313186260 hasPrimaryLocation W43131862601 @default.
- W4313186260 hasRelatedWork W2547835662 @default.
- W4313186260 hasRelatedWork W2759980945 @default.
- W4313186260 hasRelatedWork W2892009249 @default.
- W4313186260 hasRelatedWork W2945824677 @default.
- W4313186260 hasRelatedWork W3097571385 @default.
- W4313186260 hasRelatedWork W3098382480 @default.
- W4313186260 hasRelatedWork W3104417388 @default.
- W4313186260 hasRelatedWork W3161911362 @default.
- W4313186260 hasRelatedWork W3196747313 @default.
- W4313186260 hasRelatedWork W4287598411 @default.
- W4313186260 isParatext "false" @default.
- W4313186260 isRetracted "false" @default.
- W4313186260 workType "article" @default.
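
Each item above follows the pattern `- SUBJECT PREDICATE OBJECT @GRAPH.`, where the object is either a quoted literal or a bare identifier. The sketch below shows one way such lines might be parsed and grouped by predicate. It is a minimal illustration under that assumed line shape, not a SemOpenAlex tool: the names `LINE_RE`, `parse_line`, and `group_by_predicate` are hypothetical, and the sample lines are copied from the listing.

```python
import re
from collections import defaultdict

# Assumed line shape: "- SUBJECT PREDICATE OBJECT @GRAPH."
# OBJECT is either a double-quoted literal or a bare token without spaces.
LINE_RE = re.compile(
    r'^-\s+(\S+)\s+(\S+)\s+(?:"(.*)"|(\S+))\s+@(\S+?)\.$',
    re.S,
)


def parse_line(line):
    """Return (subject, predicate, object, graph), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    subject, predicate, literal, bare, graph = m.groups()
    obj = literal if literal is not None else bare
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect object values per predicate, skipping lines that do not parse."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, predicate, obj, _ = parsed
            grouped[predicate].append(obj)
    return dict(grouped)


# Sample lines copied from the listing above.
sample = [
    '- W4313186260 created "2023-01-06" @default.',
    '- W4313186260 cites W2098411764 @default.',
    '- W4313186260 cites W2277195237 @default.',
    '- W4313186260 type Work @default.',
]

grouped = group_by_predicate(sample)
print(grouped["cites"])
```

Grouping by predicate makes it easy to count, for example, how many `cites` or `hasRelatedWork` entries the listing holds. Header lines such as the "Showing items" summary do not match the pattern and are skipped.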