Matches in SemOpenAlex for { <https://semopenalex.org/work/W4295883711> ?p ?o ?g. }
Showing items 1 to 75 of 75 (100 items per page).
- W4295883711 abstract "Recent advances in text-to-image synthesis have led to large pretrained transformers with excellent capabilities to generate visualizations from a given text. However, these models are ill-suited for specialized tasks like story visualization, which requires an agent to produce a sequence of images given a corresponding sequence of captions, forming a narrative. Moreover, we find that the story visualization task fails to accommodate generalization to unseen plots and characters in new narratives. Hence, we first propose the task of story continuation, where the generated visual story is conditioned on a source image, allowing for better generalization to narratives with new characters. Then, we enhance or 'retro-fit' the pretrained text-to-image synthesis models with task-specific modules for (a) sequential image generation and (b) copying relevant elements from an initial frame. Then, we explore full-model finetuning, as well as prompt-based tuning for parameter-efficient adaptation, of the pre-trained model. We evaluate our approach StoryDALL-E on two existing datasets, PororoSV and FlintstonesSV, and introduce a new dataset DiDeMoSV collected from a video-captioning dataset. We also develop a model StoryGANc based on Generative Adversarial Networks (GAN) for story continuation, and compare it with the StoryDALL-E model to demonstrate the advantages of our approach. We show that our retro-fitting approach outperforms GAN-based models for story continuation and facilitates copying of visual elements from the source image, thereby improving continuity in the generated visual story. Finally, our analysis suggests that pretrained transformers struggle to comprehend narratives containing several characters. Overall, our work demonstrates that pretrained text-to-image synthesis models can be adapted for complex and low-resource tasks like story continuation." @default.
- W4295883711 created "2022-09-16" @default.
- W4295883711 creator A5001987532 @default.
- W4295883711 creator A5022020152 @default.
- W4295883711 creator A5048025332 @default.
- W4295883711 date "2022-09-13" @default.
- W4295883711 modified "2023-09-30" @default.
- W4295883711 title "StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation" @default.
- W4295883711 doi "https://doi.org/10.48550/arxiv.2209.06192" @default.
- W4295883711 hasPublicationYear "2022" @default.
- W4295883711 type Work @default.
- W4295883711 citedByCount "0" @default.
- W4295883711 crossrefType "posted-content" @default.
- W4295883711 hasAuthorship W4295883711A5001987532 @default.
- W4295883711 hasAuthorship W4295883711A5022020152 @default.
- W4295883711 hasAuthorship W4295883711A5048025332 @default.
- W4295883711 hasBestOaLocation W42958837111 @default.
- W4295883711 hasConcept C115961682 @default.
- W4295883711 hasConcept C121332964 @default.
- W4295883711 hasConcept C134306372 @default.
- W4295883711 hasConcept C138885662 @default.
- W4295883711 hasConcept C154945302 @default.
- W4295883711 hasConcept C157657479 @default.
- W4295883711 hasConcept C165801399 @default.
- W4295883711 hasConcept C177148314 @default.
- W4295883711 hasConcept C17744445 @default.
- W4295883711 hasConcept C199033989 @default.
- W4295883711 hasConcept C199360897 @default.
- W4295883711 hasConcept C199539241 @default.
- W4295883711 hasConcept C204321447 @default.
- W4295883711 hasConcept C2779151265 @default.
- W4295883711 hasConcept C33923547 @default.
- W4295883711 hasConcept C36464697 @default.
- W4295883711 hasConcept C41008148 @default.
- W4295883711 hasConcept C41895202 @default.
- W4295883711 hasConcept C62520636 @default.
- W4295883711 hasConcept C66322947 @default.
- W4295883711 hasConcept C88626702 @default.
- W4295883711 hasConceptScore W4295883711C115961682 @default.
- W4295883711 hasConceptScore W4295883711C121332964 @default.
- W4295883711 hasConceptScore W4295883711C134306372 @default.
- W4295883711 hasConceptScore W4295883711C138885662 @default.
- W4295883711 hasConceptScore W4295883711C154945302 @default.
- W4295883711 hasConceptScore W4295883711C157657479 @default.
- W4295883711 hasConceptScore W4295883711C165801399 @default.
- W4295883711 hasConceptScore W4295883711C177148314 @default.
- W4295883711 hasConceptScore W4295883711C17744445 @default.
- W4295883711 hasConceptScore W4295883711C199033989 @default.
- W4295883711 hasConceptScore W4295883711C199360897 @default.
- W4295883711 hasConceptScore W4295883711C199539241 @default.
- W4295883711 hasConceptScore W4295883711C204321447 @default.
- W4295883711 hasConceptScore W4295883711C2779151265 @default.
- W4295883711 hasConceptScore W4295883711C33923547 @default.
- W4295883711 hasConceptScore W4295883711C36464697 @default.
- W4295883711 hasConceptScore W4295883711C41008148 @default.
- W4295883711 hasConceptScore W4295883711C41895202 @default.
- W4295883711 hasConceptScore W4295883711C62520636 @default.
- W4295883711 hasConceptScore W4295883711C66322947 @default.
- W4295883711 hasConceptScore W4295883711C88626702 @default.
- W4295883711 hasLocation W42958837111 @default.
- W4295883711 hasOpenAccess W4295883711 @default.
- W4295883711 hasPrimaryLocation W42958837111 @default.
- W4295883711 hasRelatedWork W2172888184 @default.
- W4295883711 hasRelatedWork W2795359650 @default.
- W4295883711 hasRelatedWork W2923366293 @default.
- W4295883711 hasRelatedWork W3008515501 @default.
- W4295883711 hasRelatedWork W3102877762 @default.
- W4295883711 hasRelatedWork W3107474891 @default.
- W4295883711 hasRelatedWork W3125494348 @default.
- W4295883711 hasRelatedWork W3212300380 @default.
- W4295883711 hasRelatedWork W4287366279 @default.
- W4295883711 hasRelatedWork W4299801216 @default.
- W4295883711 isParatext "false" @default.
- W4295883711 isRetracted "false" @default.
- W4295883711 workType "article" @default.
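The listing above was produced by the triple pattern shown in the header, `{ <https://semopenalex.org/work/W4295883711> ?p ?o ?g. }`. A minimal Python sketch of how such a query could be constructed for SemOpenAlex's SPARQL interface is below; the endpoint URL and the use of a `GRAPH` clause to bind `?g` are assumptions, not taken from this listing.

```python
# Sketch: build a SPARQL query that lists every (predicate, object, graph)
# triple for a SemOpenAlex work, mirroring the pattern in the header above.
# The endpoint URL below is an assumption about SemOpenAlex's public service.
SEMOPENALEX_SPARQL = "https://semopenalex.org/sparql"  # assumed endpoint


def work_triples_query(work_id: str) -> str:
    """Return a SPARQL SELECT over ?p ?o ?g for the given work ID,
    e.g. 'W4295883711', wrapping the pattern in a GRAPH clause so the
    named graph is bound to ?g."""
    iri = f"https://semopenalex.org/work/{work_id}"
    return (
        "SELECT ?p ?o ?g WHERE { "
        f"GRAPH ?g {{ <{iri}> ?p ?o . }} "
        "}"
    )


query = work_triples_query("W4295883711")
print(query)
```

The query string could then be sent to the endpoint with any HTTP client (e.g. a GET request with a `query` parameter and an `Accept: application/sparql-results+json` header); that transport detail is likewise an assumption about the service, not something this listing states.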