Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387156645> ?p ?o ?g. }
Showing items 1 to 51 of 51.
- W4387156645 abstract "Although recent text-to-video (T2V) generation methods have seen significant advancements, most of these works focus on producing short video clips of a single event with a single background (i.e., single-scene videos). Meanwhile, recent large language models (LLMs) have demonstrated their capability in generating layouts and programs to control downstream visual modules such as image generation models. This raises an important question: can we leverage the knowledge embedded in these LLMs for temporally consistent long video generation? In this paper, we propose VideoDirectorGPT, a novel framework for consistent multi-scene video generation that uses the knowledge of LLMs for video content planning and grounded video generation. Specifically, given a single text prompt, we first ask our video planner LLM (GPT-4) to expand it into a 'video plan', which involves generating the scene descriptions, the entities with their respective layouts, the background for each scene, and consistency groupings of the entities and backgrounds. Next, guided by this output from the video planner, our video generator, Layout2Vid, has explicit control over spatial layouts and can maintain temporal consistency of entities/backgrounds across scenes, while being trained only with image-level annotations. Our experiments demonstrate that the VideoDirectorGPT framework substantially improves layout and movement control in both single- and multi-scene video generation and can generate multi-scene videos with visual consistency across scenes, while achieving performance competitive with state-of-the-art (SOTA) methods in open-domain single-scene T2V generation. We also demonstrate that our framework can dynamically control the strength of layout guidance and can also generate videos with user-provided images. We hope our framework can inspire future work on better integrating the planning ability of LLMs into consistent long video generation." @default.
- W4387156645 created "2023-09-30" @default.
- W4387156645 creator A5001987532 @default.
- W4387156645 creator A5022274239 @default.
- W4387156645 creator A5052864910 @default.
- W4387156645 creator A5054048046 @default.
- W4387156645 date "2023-09-26" @default.
- W4387156645 modified "2023-09-30" @default.
- W4387156645 title "VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning" @default.
- W4387156645 doi "https://doi.org/10.48550/arxiv.2309.15091" @default.
- W4387156645 hasPublicationYear "2023" @default.
- W4387156645 type Work @default.
- W4387156645 citedByCount "0" @default.
- W4387156645 crossrefType "posted-content" @default.
- W4387156645 hasAuthorship W4387156645A5001987532 @default.
- W4387156645 hasAuthorship W4387156645A5022274239 @default.
- W4387156645 hasAuthorship W4387156645A5052864910 @default.
- W4387156645 hasAuthorship W4387156645A5054048046 @default.
- W4387156645 hasBestOaLocation W43871566451 @default.
- W4387156645 hasConcept C153083717 @default.
- W4387156645 hasConcept C154945302 @default.
- W4387156645 hasConcept C202474056 @default.
- W4387156645 hasConcept C2776436953 @default.
- W4387156645 hasConcept C31972630 @default.
- W4387156645 hasConcept C41008148 @default.
- W4387156645 hasConcept C49774154 @default.
- W4387156645 hasConcept C65483669 @default.
- W4387156645 hasConceptScore W4387156645C153083717 @default.
- W4387156645 hasConceptScore W4387156645C154945302 @default.
- W4387156645 hasConceptScore W4387156645C202474056 @default.
- W4387156645 hasConceptScore W4387156645C2776436953 @default.
- W4387156645 hasConceptScore W4387156645C31972630 @default.
- W4387156645 hasConceptScore W4387156645C41008148 @default.
- W4387156645 hasConceptScore W4387156645C49774154 @default.
- W4387156645 hasConceptScore W4387156645C65483669 @default.
- W4387156645 hasLocation W43871566451 @default.
- W4387156645 hasOpenAccess W4387156645 @default.
- W4387156645 hasPrimaryLocation W43871566451 @default.
- W4387156645 hasRelatedWork W1966005655 @default.
- W4387156645 hasRelatedWork W2067511866 @default.
- W4387156645 hasRelatedWork W2136595788 @default.
- W4387156645 hasRelatedWork W2385949326 @default.
- W4387156645 hasRelatedWork W2740242884 @default.
- W4387156645 hasRelatedWork W2789220062 @default.
- W4387156645 hasRelatedWork W2811496562 @default.
- W4387156645 hasRelatedWork W3135795035 @default.
- W4387156645 hasRelatedWork W4315836309 @default.
- W4387156645 hasRelatedWork W2185534064 @default.
- W4387156645 isParatext "false" @default.
- W4387156645 isRetracted "false" @default.
- W4387156645 workType "article" @default.