Matches in SemOpenAlex for { <https://semopenalex.org/work/W3186513731> ?p ?o ?g. }
- W3186513731 endingPage "19" @default.
- W3186513731 startingPage "1" @default.
- W3186513731 abstract "Vision-and-language (V-L) tasks require the system to understand both vision content and natural language; thus, learning fine-grained joint representations of vision and language (a.k.a. V-L representations) is of paramount importance. Recently, various pre-trained V-L models have been proposed to learn V-L representations and have achieved improved results in many tasks. However, the mainstream models process both vision and language inputs with the same set of attention matrices. As a result, the generated V-L representations are entangled in one common latent space. To tackle this problem, we propose DiMBERT (short for Disentangled Multimodal-Attention BERT), a novel framework that applies separate attention spaces to vision and language, so that the representations of the two modalities can be disentangled explicitly. To enhance the correlation between vision and language in the disentangled spaces, we introduce into DiMBERT visual concepts, which represent visual information in textual form. In this manner, visual concepts help to bridge the gap between the two modalities. We pre-train DiMBERT on a large number of image–sentence pairs with two tasks: bidirectional language modeling and sequence-to-sequence language modeling. After pre-training, DiMBERT is further fine-tuned for the downstream tasks. Experiments show that DiMBERT sets new state-of-the-art performance on three tasks (over four datasets), including both generation tasks (image captioning and visual storytelling) and classification tasks (referring expressions). The proposed DiM (short for Disentangled Multimodal-Attention) module can be easily incorporated into existing pre-trained V-L models to boost their performance, by up to 5% on the representative task. Finally, we conduct a systematic analysis and demonstrate the effectiveness of our DiM module and the introduced visual concepts." @default.
- W3186513731 created "2021-08-02" @default.
- W3186513731 creator A5002795838 @default.
- W3186513731 creator A5006497264 @default.
- W3186513731 creator A5016393389 @default.
- W3186513731 creator A5018805166 @default.
- W3186513731 creator A5021769901 @default.
- W3186513731 creator A5038836690 @default.
- W3186513731 creator A5049239373 @default.
- W3186513731 date "2021-07-20" @default.
- W3186513731 modified "2023-10-16" @default.
- W3186513731 title "DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention" @default.
- W3186513731 cites W1905882502 @default.
- W3186513731 cites W1956340063 @default.
- W3186513731 cites W2064675550 @default.
- W3186513731 cites W2097117768 @default.
- W3186513731 cites W2101105183 @default.
- W3186513731 cites W2108598243 @default.
- W3186513731 cites W2110485445 @default.
- W3186513731 cites W2150824314 @default.
- W3186513731 cites W2185175083 @default.
- W3186513731 cites W2277195237 @default.
- W3186513731 cites W2552161745 @default.
- W3186513731 cites W2745461083 @default.
- W3186513731 cites W2885013662 @default.
- W3186513731 cites W2886641317 @default.
- W3186513731 cites W2887585070 @default.
- W3186513731 cites W2890531016 @default.
- W3186513731 cites W2953486038 @default.
- W3186513731 cites W2963033554 @default.
- W3186513731 cites W2963084599 @default.
- W3186513731 cites W2963101956 @default.
- W3186513731 cites W2964018924 @default.
- W3186513731 cites W2968101724 @default.
- W3186513731 cites W2970869018 @default.
- W3186513731 cites W2986670728 @default.
- W3186513731 cites W2997591391 @default.
- W3186513731 cites W2998356391 @default.
- W3186513731 cites W3035485997 @default.
- W3186513731 cites W3093374703 @default.
- W3186513731 cites W3098605233 @default.
- W3186513731 cites W3101703188 @default.
- W3186513731 cites W3104152799 @default.
- W3186513731 cites W3106784008 @default.
- W3186513731 cites W3116651605 @default.
- W3186513731 cites W3168640669 @default.
- W3186513731 doi "https://doi.org/10.1145/3447685" @default.
- W3186513731 hasPublicationYear "2021" @default.
- W3186513731 type Work @default.
- W3186513731 sameAs 3186513731 @default.
- W3186513731 citedByCount "1" @default.
- W3186513731 countsByYear W31865137312023 @default.
- W3186513731 crossrefType "journal-article" @default.
- W3186513731 hasAuthorship W3186513731A5002795838 @default.
- W3186513731 hasAuthorship W3186513731A5006497264 @default.
- W3186513731 hasAuthorship W3186513731A5016393389 @default.
- W3186513731 hasAuthorship W3186513731A5018805166 @default.
- W3186513731 hasAuthorship W3186513731A5021769901 @default.
- W3186513731 hasAuthorship W3186513731A5038836690 @default.
- W3186513731 hasAuthorship W3186513731A5049239373 @default.
- W3186513731 hasBestOaLocation W31865137312 @default.
- W3186513731 hasConcept C111919701 @default.
- W3186513731 hasConcept C115961682 @default.
- W3186513731 hasConcept C154945302 @default.
- W3186513731 hasConcept C157657479 @default.
- W3186513731 hasConcept C177264268 @default.
- W3186513731 hasConcept C195324797 @default.
- W3186513731 hasConcept C199360897 @default.
- W3186513731 hasConcept C204321447 @default.
- W3186513731 hasConcept C2777530160 @default.
- W3186513731 hasConcept C41008148 @default.
- W3186513731 hasConcept C98045186 @default.
- W3186513731 hasConceptScore W3186513731C111919701 @default.
- W3186513731 hasConceptScore W3186513731C115961682 @default.
- W3186513731 hasConceptScore W3186513731C154945302 @default.
- W3186513731 hasConceptScore W3186513731C157657479 @default.
- W3186513731 hasConceptScore W3186513731C177264268 @default.
- W3186513731 hasConceptScore W3186513731C195324797 @default.
- W3186513731 hasConceptScore W3186513731C199360897 @default.
- W3186513731 hasConceptScore W3186513731C204321447 @default.
- W3186513731 hasConceptScore W3186513731C2777530160 @default.
- W3186513731 hasConceptScore W3186513731C41008148 @default.
- W3186513731 hasConceptScore W3186513731C98045186 @default.
- W3186513731 hasIssue "1" @default.
- W3186513731 hasLocation W31865137311 @default.
- W3186513731 hasLocation W31865137312 @default.
- W3186513731 hasOpenAccess W3186513731 @default.
- W3186513731 hasPrimaryLocation W31865137311 @default.
- W3186513731 hasRelatedWork W159132833 @default.
- W3186513731 hasRelatedWork W2033261979 @default.
- W3186513731 hasRelatedWork W2293457016 @default.
- W3186513731 hasRelatedWork W2411652523 @default.
- W3186513731 hasRelatedWork W2502722637 @default.
- W3186513731 hasRelatedWork W2567044968 @default.
- W3186513731 hasRelatedWork W2977842567 @default.
- W3186513731 hasRelatedWork W4297803820 @default.
- W3186513731 hasRelatedWork W87581401 @default.
- W3186513731 hasRelatedWork W1872130062 @default.