Matches in SemOpenAlex for { <https://semopenalex.org/work/W4312206189> ?p ?o ?g. }
Showing items 1 to 67 of 67 with 100 items per page.
- W4312206189 abstract "The field of multimodal research focusing on the comprehension and creation of both images and text has witnessed significant strides. This progress is exemplified by the emergence of sophisticated models dedicated to image captioning at scale, such as the notable Flamingo model and text-to-image generative models, with DALL-E serving as a prominent example. An interesting question worth exploring in this domain is whether Flamingo and DALL-E understand each other. To study this question, we propose a reconstruction task where Flamingo generates a description for a given image and DALL-E uses this description as input to synthesize a new image. We argue that these models understand each other if the generated image is similar to the given image. Specifically, we study the relationship between the quality of the image reconstruction and that of the text generation. We find that an optimal description of an image is one that gives rise to a generated image similar to the original one. The finding motivates us to propose a unified framework to finetune the text-to-image and image-to-text models. Concretely, the reconstruction part forms a regularization loss to guide the tuning of the models. Extensive experiments on multiple datasets with different image captioning and image generation models validate our findings and demonstrate the effectiveness of our proposed unified framework. As DALL-E and Flamingo are not publicly available, we use Stable Diffusion and BLIP in the remaining work. Project website: https://dalleflamingo.github.io." @default.
- W4312206189 created "2023-01-04" @default.
- W4312206189 creator A5039442998 @default.
- W4312206189 creator A5047843370 @default.
- W4312206189 creator A5055994909 @default.
- W4312206189 creator A5074808403 @default.
- W4312206189 creator A5086275760 @default.
- W4312206189 date "2022-12-23" @default.
- W4312206189 modified "2023-09-27" @default.
- W4312206189 title "Do DALL-E and Flamingo Understand Each Other?" @default.
- W4312206189 doi "https://doi.org/10.48550/arxiv.2212.12249" @default.
- W4312206189 hasPublicationYear "2022" @default.
- W4312206189 type Work @default.
- W4312206189 citedByCount "0" @default.
- W4312206189 crossrefType "posted-content" @default.
- W4312206189 hasAuthorship W4312206189A5039442998 @default.
- W4312206189 hasAuthorship W4312206189A5047843370 @default.
- W4312206189 hasAuthorship W4312206189A5055994909 @default.
- W4312206189 hasAuthorship W4312206189A5074808403 @default.
- W4312206189 hasAuthorship W4312206189A5086275760 @default.
- W4312206189 hasBestOaLocation W43122061891 @default.
- W4312206189 hasConcept C115961682 @default.
- W4312206189 hasConcept C134306372 @default.
- W4312206189 hasConcept C154945302 @default.
- W4312206189 hasConcept C157657479 @default.
- W4312206189 hasConcept C167966045 @default.
- W4312206189 hasConcept C199360897 @default.
- W4312206189 hasConcept C202444582 @default.
- W4312206189 hasConcept C23123220 @default.
- W4312206189 hasConcept C2776135515 @default.
- W4312206189 hasConcept C33923547 @default.
- W4312206189 hasConcept C36503486 @default.
- W4312206189 hasConcept C39890363 @default.
- W4312206189 hasConcept C41008148 @default.
- W4312206189 hasConcept C511192102 @default.
- W4312206189 hasConcept C9652623 @default.
- W4312206189 hasConceptScore W4312206189C115961682 @default.
- W4312206189 hasConceptScore W4312206189C134306372 @default.
- W4312206189 hasConceptScore W4312206189C154945302 @default.
- W4312206189 hasConceptScore W4312206189C157657479 @default.
- W4312206189 hasConceptScore W4312206189C167966045 @default.
- W4312206189 hasConceptScore W4312206189C199360897 @default.
- W4312206189 hasConceptScore W4312206189C202444582 @default.
- W4312206189 hasConceptScore W4312206189C23123220 @default.
- W4312206189 hasConceptScore W4312206189C2776135515 @default.
- W4312206189 hasConceptScore W4312206189C33923547 @default.
- W4312206189 hasConceptScore W4312206189C36503486 @default.
- W4312206189 hasConceptScore W4312206189C39890363 @default.
- W4312206189 hasConceptScore W4312206189C41008148 @default.
- W4312206189 hasConceptScore W4312206189C511192102 @default.
- W4312206189 hasConceptScore W4312206189C9652623 @default.
- W4312206189 hasLocation W43122061891 @default.
- W4312206189 hasOpenAccess W4312206189 @default.
- W4312206189 hasPrimaryLocation W43122061891 @default.
- W4312206189 hasRelatedWork W2560207749 @default.
- W4312206189 hasRelatedWork W2767577934 @default.
- W4312206189 hasRelatedWork W2781244421 @default.
- W4312206189 hasRelatedWork W2982558732 @default.
- W4312206189 hasRelatedWork W3113062513 @default.
- W4312206189 hasRelatedWork W4285886406 @default.
- W4312206189 hasRelatedWork W4288087560 @default.
- W4312206189 hasRelatedWork W4297080010 @default.
- W4312206189 hasRelatedWork W4321854007 @default.
- W4312206189 hasRelatedWork W4375870165 @default.
- W4312206189 isParatext "false" @default.
- W4312206189 isRetracted "false" @default.
- W4312206189 workType "article" @default.
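
The items above all follow the shape "- SUBJECT PREDICATE OBJECT @GRAPH.", so they can be grouped by predicate with a few lines of Python. This is a minimal sketch under that assumption only: the names `LINE_RE` and `parse_listing` are hypothetical, it reads the plain-text listing rather than querying SemOpenAlex, and it is not an official client.

```python
import re
from collections import defaultdict

# Assumed line shape (inferred from the listing): "- SUBJECT PREDICATE OBJECT @GRAPH."
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+)\.$')


def parse_listing(lines):
    """Group object values by predicate; lines that do not match are skipped."""
    by_predicate = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # header or pagination line, not a listing item
        _subject, predicate, obj, _graph = match.groups()
        # Drop the surrounding double quotes of literal values, if present.
        by_predicate[predicate].append(obj.strip('"'))
    return dict(by_predicate)


# A few lines copied from the listing above.
sample = [
    'Matches in SemOpenAlex for { <https://semopenalex.org/work/W4312206189> ?p ?o ?g. }',
    '- W4312206189 title "Do DALL-E and Flamingo Understand Each Other?" @default.',
    '- W4312206189 creator A5039442998 @default.',
    '- W4312206189 creator A5047843370 @default.',
    '- W4312206189 hasPublicationYear "2022" @default.',
]

record = parse_listing(sample)
print(record["title"][0])
print(len(record["creator"]))
```

Repeated predicates such as creator or hasRelatedWork end up as lists in listing order, which is why every value is stored in a list even when a predicate occurs once.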