Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386065765> ?p ?o ?g. }
Showing items 1 to 95 of 95 with 100 items per page.
- W4386065765 abstract "While generative modeling on multimodal image-text data has been actively developed with large-scale paired datasets, there have been limited attempts to generate both image and text data by a single model rather than a generation of one fixed modality conditioned on the other modality. In this paper, we explore a unified generative vision-and-language (VL) model that can produce both images and text sequences. Especially, we propose a generative VL transformer based on the non-autoregressive mask prediction, named MAGVLT, and compare it with an autoregressive generative VL transformer (ARGVLT). In comparison to ARGVLT, the proposed MAGVLT enables bidirectional context encoding, fast decoding by parallel token predictions in an iterative refinement, and extended editing capabilities such as image and text infilling. For rigorous training of our MAGVLT with image-text pairs from scratch, we combine the image-to-text, text-to-image, and joint image-and-text mask prediction tasks. Moreover, we devise two additional tasks based on the step-unrolled mask prediction and the selective prediction on the mixture of two image-text pairs. Experimental results on various downstream generation tasks of VL benchmarks show that our MAGVLT outperforms ARGVLT by a large margin even with significant inference speedup. Particularly, MAGVLT achieves competitive results on both zero-shot image-to-text and text-to-image generation tasks from MS-COCO by one moderate-sized model (fewer than 500M parameters) even without the use of monomodal data and networks." @default.
- W4386065765 created "2023-08-23" @default.
- W4386065765 creator A5003130642 @default.
- W4386065765 creator A5033558218 @default.
- W4386065765 creator A5065309573 @default.
- W4386065765 creator A5087294413 @default.
- W4386065765 date "2023-06-01" @default.
- W4386065765 modified "2023-09-27" @default.
- W4386065765 title "MAGVLT: Masked Generative Vision-and-Language Transformer" @default.
- W4386065765 cites W1773149199 @default.
- W4386065765 cites W2560730294 @default.
- W4386065765 cites W2886641317 @default.
- W4386065765 cites W2962784628 @default.
- W4386065765 cites W2970231061 @default.
- W4386065765 cites W2988975212 @default.
- W4386065765 cites W2992308087 @default.
- W4386065765 cites W3034999214 @default.
- W4386065765 cites W3104279398 @default.
- W4386065765 cites W3156892778 @default.
- W4386065765 cites W3173220247 @default.
- W4386065765 cites W3176641147 @default.
- W4386065765 cites W3176824248 @default.
- W4386065765 cites W3180355996 @default.
- W4386065765 cites W4226452284 @default.
- W4386065765 cites W4312933868 @default.
- W4386065765 cites W4312938727 @default.
- W4386065765 cites W4313021454 @default.
- W4386065765 cites W4313036289 @default.
- W4386065765 doi "https://doi.org/10.1109/cvpr52729.2023.02235" @default.
- W4386065765 hasPublicationYear "2023" @default.
- W4386065765 type Work @default.
- W4386065765 citedByCount "0" @default.
- W4386065765 crossrefType "proceedings-article" @default.
- W4386065765 hasAuthorship W4386065765A5003130642 @default.
- W4386065765 hasAuthorship W4386065765A5033558218 @default.
- W4386065765 hasAuthorship W4386065765A5065309573 @default.
- W4386065765 hasAuthorship W4386065765A5087294413 @default.
- W4386065765 hasConcept C115961682 @default.
- W4386065765 hasConcept C119857082 @default.
- W4386065765 hasConcept C121332964 @default.
- W4386065765 hasConcept C149782125 @default.
- W4386065765 hasConcept C153180895 @default.
- W4386065765 hasConcept C154945302 @default.
- W4386065765 hasConcept C159877910 @default.
- W4386065765 hasConcept C165801399 @default.
- W4386065765 hasConcept C167966045 @default.
- W4386065765 hasConcept C204321447 @default.
- W4386065765 hasConcept C2776214188 @default.
- W4386065765 hasConcept C28490314 @default.
- W4386065765 hasConcept C33923547 @default.
- W4386065765 hasConcept C38652104 @default.
- W4386065765 hasConcept C39890363 @default.
- W4386065765 hasConcept C41008148 @default.
- W4386065765 hasConcept C48145219 @default.
- W4386065765 hasConcept C62520636 @default.
- W4386065765 hasConcept C66322947 @default.
- W4386065765 hasConcept C774472 @default.
- W4386065765 hasConceptScore W4386065765C115961682 @default.
- W4386065765 hasConceptScore W4386065765C119857082 @default.
- W4386065765 hasConceptScore W4386065765C121332964 @default.
- W4386065765 hasConceptScore W4386065765C149782125 @default.
- W4386065765 hasConceptScore W4386065765C153180895 @default.
- W4386065765 hasConceptScore W4386065765C154945302 @default.
- W4386065765 hasConceptScore W4386065765C159877910 @default.
- W4386065765 hasConceptScore W4386065765C165801399 @default.
- W4386065765 hasConceptScore W4386065765C167966045 @default.
- W4386065765 hasConceptScore W4386065765C204321447 @default.
- W4386065765 hasConceptScore W4386065765C2776214188 @default.
- W4386065765 hasConceptScore W4386065765C28490314 @default.
- W4386065765 hasConceptScore W4386065765C33923547 @default.
- W4386065765 hasConceptScore W4386065765C38652104 @default.
- W4386065765 hasConceptScore W4386065765C39890363 @default.
- W4386065765 hasConceptScore W4386065765C41008148 @default.
- W4386065765 hasConceptScore W4386065765C48145219 @default.
- W4386065765 hasConceptScore W4386065765C62520636 @default.
- W4386065765 hasConceptScore W4386065765C66322947 @default.
- W4386065765 hasConceptScore W4386065765C774472 @default.
- W4386065765 hasFunder F4320321373 @default.
- W4386065765 hasFunder F4320328359 @default.
- W4386065765 hasLocation W43860657651 @default.
- W4386065765 hasOpenAccess W4386065765 @default.
- W4386065765 hasPrimaryLocation W43860657651 @default.
- W4386065765 hasRelatedWork W2104924585 @default.
- W4386065765 hasRelatedWork W2952148308 @default.
- W4386065765 hasRelatedWork W3011817866 @default.
- W4386065765 hasRelatedWork W3042228302 @default.
- W4386065765 hasRelatedWork W3165012362 @default.
- W4386065765 hasRelatedWork W3209239055 @default.
- W4386065765 hasRelatedWork W4287825816 @default.
- W4386065765 hasRelatedWork W4303874710 @default.
- W4386065765 hasRelatedWork W4319653417 @default.
- W4386065765 hasRelatedWork W4372347456 @default.
- W4386065765 isParatext "false" @default.
- W4386065765 isRetracted "false" @default.
- W4386065765 workType "article" @default.