Matches in SemOpenAlex for { <https://semopenalex.org/work/W4285019302> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W4285019302 abstract "Unconstrained lip-to-speech synthesis aims to generate corresponding speeches from silent videos of talking faces with no restriction on head poses or vocabulary. Current works mainly use sequence-to-sequence models to solve this problem, either in an autoregressive architecture or a flow-based non-autoregressive architecture. However, these models suffer from several drawbacks: 1) Instead of directly generating audios, they use a two-stage pipeline that first generates mel-spectrograms and then reconstructs audios from the spectrograms. This causes cumbersome deployment and degradation of speech quality due to error propagation; 2) The audio reconstruction algorithm used by these models limits the inference speed and audio quality, while neural vocoders are not available for these models since their output spectrograms are not accurate enough; 3) The autoregressive model suffers from high inference latency, while the flow-based model has high memory occupancy: neither of them is efficient enough in both time and memory usage. To tackle these problems, we propose FastLTS, a non-autoregressive end-to-end model which can directly synthesize high-quality speech audios from unconstrained talking videos with low latency, and has a relatively small model size. Besides, different from the widely used 3D-CNN visual frontend for lip movement encoding, we for the first time propose a transformer-based visual frontend for this task. Experiments show that our model achieves $19.76\times$ speedup for audio waveform generation compared with the current autoregressive model on input sequences of 3 seconds, and obtains superior audio quality." @default.
- W4285019302 created "2022-07-12" @default.
- W4285019302 creator A5068548095 @default.
- W4285019302 creator A5079260216 @default.
- W4285019302 date "2022-10-10" @default.
- W4285019302 modified "2023-09-30" @default.
- W4285019302 title "FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis" @default.
- W4285019302 cites W1522734439 @default.
- W4285019302 cites W1552314771 @default.
- W4285019302 cites W2015143272 @default.
- W4285019302 cites W2029199293 @default.
- W4285019302 cites W2120847449 @default.
- W4285019302 cites W2131774270 @default.
- W4285019302 cites W2133665775 @default.
- W4285019302 cites W2141998673 @default.
- W4285019302 cites W2516001803 @default.
- W4285019302 cites W2562637781 @default.
- W4285019302 cites W2585824449 @default.
- W4285019302 cites W2962969034 @default.
- W4285019302 cites W2963019222 @default.
- W4285019302 cites W2964199361 @default.
- W4285019302 cites W2964243274 @default.
- W4285019302 cites W2964352155 @default.
- W4285019302 cites W3136416617 @default.
- W4285019302 cites W3160305627 @default.
- W4285019302 cites W3210279979 @default.
- W4285019302 cites W4283809657 @default.
- W4285019302 doi "https://doi.org/10.1145/3503161.3548194" @default.
- W4285019302 hasPublicationYear "2022" @default.
- W4285019302 type Work @default.
- W4285019302 citedByCount "2" @default.
- W4285019302 countsByYear W42850193022023 @default.
- W4285019302 crossrefType "proceedings-article" @default.
- W4285019302 hasAuthorship W4285019302A5068548095 @default.
- W4285019302 hasAuthorship W4285019302A5079260216 @default.
- W4285019302 hasBestOaLocation W42850193022 @default.
- W4285019302 hasConcept C149782125 @default.
- W4285019302 hasConcept C154945302 @default.
- W4285019302 hasConcept C159877910 @default.
- W4285019302 hasConcept C162324750 @default.
- W4285019302 hasConcept C2776214188 @default.
- W4285019302 hasConcept C28490314 @default.
- W4285019302 hasConcept C41008148 @default.
- W4285019302 hasConcept C42536954 @default.
- W4285019302 hasConcept C45273575 @default.
- W4285019302 hasConcept C50644808 @default.
- W4285019302 hasConcept C76155785 @default.
- W4285019302 hasConcept C82876162 @default.
- W4285019302 hasConceptScore W4285019302C149782125 @default.
- W4285019302 hasConceptScore W4285019302C154945302 @default.
- W4285019302 hasConceptScore W4285019302C159877910 @default.
- W4285019302 hasConceptScore W4285019302C162324750 @default.
- W4285019302 hasConceptScore W4285019302C2776214188 @default.
- W4285019302 hasConceptScore W4285019302C28490314 @default.
- W4285019302 hasConceptScore W4285019302C41008148 @default.
- W4285019302 hasConceptScore W4285019302C42536954 @default.
- W4285019302 hasConceptScore W4285019302C45273575 @default.
- W4285019302 hasConceptScore W4285019302C50644808 @default.
- W4285019302 hasConceptScore W4285019302C76155785 @default.
- W4285019302 hasConceptScore W4285019302C82876162 @default.
- W4285019302 hasFunder F4320321001 @default.
- W4285019302 hasLocation W42850193021 @default.
- W4285019302 hasLocation W42850193022 @default.
- W4285019302 hasLocation W42850193023 @default.
- W4285019302 hasOpenAccess W4285019302 @default.
- W4285019302 hasPrimaryLocation W42850193021 @default.
- W4285019302 hasRelatedWork W2089574997 @default.
- W4285019302 hasRelatedWork W2177401844 @default.
- W4285019302 hasRelatedWork W2336868063 @default.
- W4285019302 hasRelatedWork W2902707689 @default.
- W4285019302 hasRelatedWork W2904357295 @default.
- W4285019302 hasRelatedWork W2946749708 @default.
- W4285019302 hasRelatedWork W2990899954 @default.
- W4285019302 hasRelatedWork W3125481789 @default.
- W4285019302 hasRelatedWork W4225851526 @default.
- W4285019302 hasRelatedWork W4327796184 @default.
- W4285019302 isParatext "false" @default.
- W4285019302 isRetracted "false" @default.
- W4285019302 workType "article" @default.
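
The items above all follow the shape `- SUBJECT PREDICATE OBJECT @GRAPH.`, with literal values in double quotes and entity identifiers left bare. As a minimal sketch, assuming that shape holds for every line, the listing could be split into tuples and grouped by predicate as follows. The names `parse_item` and `group_by_predicate` are hypothetical helpers introduced here for illustration; they are not part of SemOpenAlex or any library.

```python
import re

# Assumed line shape (inferred from the listing, not from a published spec):
#   - SUBJECT PREDICATE OBJECT @GRAPH.
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+)\.$', re.DOTALL)


def parse_item(line):
    """Split one listing line into (subject, predicate, object, graph)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized item line: {line!r}")
    subject, predicate, obj, graph = match.groups()
    # Literal values are wrapped in double quotes; entity IDs are bare.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect object values per predicate, preserving listing order."""
    grouped = {}
    for line in lines:
        _, predicate, obj, _ = parse_item(line)
        grouped.setdefault(predicate, []).append(obj)
    return grouped


if __name__ == "__main__":
    sample = [
        '- W4285019302 created "2022-07-12" @default.',
        '- W4285019302 creator A5068548095 @default.',
        '- W4285019302 creator A5079260216 @default.',
    ]
    print(group_by_predicate(sample))
```

Grouping by predicate makes multi-valued properties such as `cites` or `hasConcept` easy to count, while single-valued ones such as `doi` come back as one-element lists.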