Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387477625> ?p ?o ?g. }
Showing items 1 to 92 of 92 with 100 items per page.
- W4387477625 endingPage "102552" @default.
- W4387477625 startingPage "102552" @default.
- W4387477625 abstract "Audio-visual cross-modality generation refers to the generation of audio or visual content based on input from another modality. One of the key tasks in this field is the generation of realistic talking facial videos from audio and head pose information, which has significant applications in human-computer interaction, virtual reality, and video production. However, previous work has limitations such as the inability to generate natural head poses or interact with audio, which compromises the realism and expressive power of the generated videos. This paper aims to address these issues and improve the state-of-the-art in this field. To this end, we propose an autoregressive generation method called Flow2Flow and collect a large-scale in-the-wild solo-singing-themed audio-visual dataset called AVVS to investigate the rhythmic head movement patterns. The Flow2Flow model involves a multimodal transformer block with cross-attention, which can encode audio features and historical head poses to establish potential audio-visual motion entanglement and uses normalizing flows to generate future facial motion representation sequences. The generated motion representations are identity-independent, allowing the method to be transferred to any face identity. We model the motion of image content using warping flows generated from 3D keypoints based on the facial motion representation sequences, carefully manipulate animation generation, and estimate dense motion fields based on deformation flows using a neural rendering model to present photo-realistic talking facial videos. Experimental results show that our proposed method generates photo-realistic videos with natural head poses and lip-syncing, and we validate the effectiveness of our method compared to state-of-the-art methods on two public datasets." @default.
- W4387477625 created "2023-10-11" @default.
- W4387477625 creator A5029398570 @default.
- W4387477625 creator A5060715114 @default.
- W4387477625 creator A5061193568 @default.
- W4387477625 creator A5085405957 @default.
- W4387477625 date "2023-12-01" @default.
- W4387477625 modified "2023-10-18" @default.
- W4387477625 title "Flow2Flow: Audio-Visual Cross-Modality Generation for Talking Face Videos with Rhythmic Head" @default.
- W4387477625 cites W2015143272 @default.
- W4387477625 cites W2118607666 @default.
- W4387477625 cites W2324260108 @default.
- W4387477625 cites W2325939864 @default.
- W4387477625 cites W2533370895 @default.
- W4387477625 cites W2738406145 @default.
- W4387477625 cites W2794680924 @default.
- W4387477625 cites W2803193013 @default.
- W4387477625 cites W2808631503 @default.
- W4387477625 cites W2884460600 @default.
- W4387477625 cites W2894938704 @default.
- W4387477625 cites W2963324747 @default.
- W4387477625 cites W2963644257 @default.
- W4387477625 cites W2963800363 @default.
- W4387477625 cites W3002482304 @default.
- W4387477625 cites W3009042479 @default.
- W4387477625 cites W3022710784 @default.
- W4387477625 cites W3107914916 @default.
- W4387477625 cites W3172598908 @default.
- W4387477625 cites W3174992416 @default.
- W4387477625 cites W3177150198 @default.
- W4387477625 cites W3201904914 @default.
- W4387477625 cites W3204138343 @default.
- W4387477625 cites W3204221554 @default.
- W4387477625 cites W3206086363 @default.
- W4387477625 cites W4200526174 @default.
- W4387477625 cites W4225272741 @default.
- W4387477625 cites W4283073790 @default.
- W4387477625 cites W4283210480 @default.
- W4387477625 cites W4295213539 @default.
- W4387477625 cites W4312890925 @default.
- W4387477625 cites W4322730980 @default.
- W4387477625 cites W4360603643 @default.
- W4387477625 cites W4364361300 @default.
- W4387477625 cites W4372348803 @default.
- W4387477625 doi "https://doi.org/10.1016/j.displa.2023.102552" @default.
- W4387477625 hasPublicationYear "2023" @default.
- W4387477625 type Work @default.
- W4387477625 citedByCount "0" @default.
- W4387477625 crossrefType "journal-article" @default.
- W4387477625 hasAuthorship W4387477625A5029398570 @default.
- W4387477625 hasAuthorship W4387477625A5060715114 @default.
- W4387477625 hasAuthorship W4387477625A5061193568 @default.
- W4387477625 hasAuthorship W4387477625A5085405957 @default.
- W4387477625 hasConcept C121684516 @default.
- W4387477625 hasConcept C138591656 @default.
- W4387477625 hasConcept C154945302 @default.
- W4387477625 hasConcept C205711294 @default.
- W4387477625 hasConcept C2780226545 @default.
- W4387477625 hasConcept C28490314 @default.
- W4387477625 hasConcept C31972630 @default.
- W4387477625 hasConcept C41008148 @default.
- W4387477625 hasConcept C502989409 @default.
- W4387477625 hasConcept C69369342 @default.
- W4387477625 hasConceptScore W4387477625C121684516 @default.
- W4387477625 hasConceptScore W4387477625C138591656 @default.
- W4387477625 hasConceptScore W4387477625C154945302 @default.
- W4387477625 hasConceptScore W4387477625C205711294 @default.
- W4387477625 hasConceptScore W4387477625C2780226545 @default.
- W4387477625 hasConceptScore W4387477625C28490314 @default.
- W4387477625 hasConceptScore W4387477625C31972630 @default.
- W4387477625 hasConceptScore W4387477625C41008148 @default.
- W4387477625 hasConceptScore W4387477625C502989409 @default.
- W4387477625 hasConceptScore W4387477625C69369342 @default.
- W4387477625 hasLocation W43874776251 @default.
- W4387477625 hasOpenAccess W4387477625 @default.
- W4387477625 hasPrimaryLocation W43874776251 @default.
- W4387477625 hasRelatedWork W1544039745 @default.
- W4387477625 hasRelatedWork W1976926596 @default.
- W4387477625 hasRelatedWork W2121378366 @default.
- W4387477625 hasRelatedWork W2156310872 @default.
- W4387477625 hasRelatedWork W2356609371 @default.
- W4387477625 hasRelatedWork W2532377291 @default.
- W4387477625 hasRelatedWork W2535923857 @default.
- W4387477625 hasRelatedWork W2989004599 @default.
- W4387477625 hasRelatedWork W2999276620 @default.
- W4387477625 hasRelatedWork W3094080214 @default.
- W4387477625 hasVolume "80" @default.
- W4387477625 isParatext "false" @default.
- W4387477625 isRetracted "false" @default.
- W4387477625 workType "article" @default.
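
The entries above follow a regular shape (`- subject predicate object @graph.`), so they can be loaded into a simple lookup table. The following is a minimal sketch, assuming that shape holds for every entry; the names `TRIPLE_RE` and `parse_listing` are hypothetical helpers for illustration and are not part of SemOpenAlex or any library.

```python
import re
from collections import defaultdict

# Assumed line shape, inferred from the listing above:
#   - <subject> <predicate> <object> @<graph>.
TRIPLE_RE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.*?)\s+@(\w+)\.\s*$', re.DOTALL)


def parse_listing(lines):
    """Group objects by predicate for each parseable line; skip the rest."""
    by_predicate = defaultdict(list)
    for line in lines:
        match = TRIPLE_RE.match(line)
        if match is None:
            continue  # header or pagination text, not a triple
        _subject, predicate, obj, _graph = match.groups()
        # Literals are wrapped in double quotes; IDs such as W2015143272 are bare.
        if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
            obj = obj[1:-1]
        by_predicate[predicate].append(obj)
    return dict(by_predicate)


# A few lines copied from the listing, plus one non-triple header line.
sample = [
    'Showing items 1 to 92 of',
    '- W4387477625 hasVolume "80" @default.',
    '- W4387477625 cites W2015143272 @default.',
    '- W4387477625 cites W2118607666 @default.',
    '- W4387477625 title "Flow2Flow: Audio-Visual Cross-Modality Generation '
    'for Talking Face Videos with Rhythmic Head" @default.',
]
parsed = parse_listing(sample)
print(parsed["cites"])
```

Grouping by predicate keeps multi-valued properties such as `cites` and `hasRelatedWork` as lists, while single-valued ones such as `hasVolume` become one-element lists.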