Matches in SemOpenAlex for { <https://semopenalex.org/work/W3204661432> ?p ?o ?g. }
Showing items 1 to 96 of 96 with 100 items per page.
- W3204661432 abstract "Non-autoregressive text-to-speech (NAR-TTS) models such as FastSpeech 2 and Glow-TTS can synthesize high-quality speech from the given text in parallel. After analyzing two kinds of generative NAR-TTS models (VAE and normalizing flow), we find that: VAE is good at capturing the long-range semantics features (e.g., prosody) even with small model size but suffers from blurry and unnatural results; and normalizing flow is good at reconstructing the frequency bin-wise details but performs poorly when the number of model parameters is limited. Inspired by these observations, to generate diverse speech with natural details and rich prosody using a lightweight architecture, we propose PortaSpeech, a portable and high-quality generative text-to-speech model. Specifically, 1) to model both the prosody and mel-spectrogram details accurately, we adopt a lightweight VAE with an enhanced prior followed by a flow-based post-net with strong conditional inputs as the main architecture. 2) To further compress the model size and memory footprint, we introduce the grouped parameter sharing mechanism to the affine coupling layers in the post-net. 3) To improve the expressiveness of synthesized speech and reduce the dependency on accurate fine-grained alignment between text and speech, we propose a linguistic encoder with mixture alignment combining hard inter-word alignment and soft intra-word alignment, which explicitly extracts word-level semantic information. Experimental results show that PortaSpeech outperforms other TTS models in both voice quality and prosody modeling in terms of subjective and objective evaluation metrics, and shows only a slight performance degradation when reducing the model parameters to 6.7M (about 4x model size and 3x runtime memory compression ratio compared with FastSpeech 2). Our extensive ablation studies demonstrate that each design in PortaSpeech is effective." @default.
- W3204661432 created "2021-10-11" @default.
- W3204661432 creator A5065126806 @default.
- W3204661432 creator A5079260216 @default.
- W3204661432 creator A5088179161 @default.
- W3204661432 date "2021-09-30" @default.
- W3204661432 modified "2023-09-27" @default.
- W3204661432 title "PortaSpeech: Portable and High-Quality Generative Text-to-Speech" @default.
- W3204661432 cites W1583912456 @default.
- W3204661432 cites W2409550820 @default.
- W3204661432 cites W2517513811 @default.
- W3204661432 cites W2519091744 @default.
- W3204661432 cites W2591927543 @default.
- W3204661432 cites W2608207374 @default.
- W3204661432 cites W2766812927 @default.
- W3204661432 cites W2789541106 @default.
- W3204661432 cites W2903739847 @default.
- W3204661432 cites W2962738009 @default.
- W3204661432 cites W2963139417 @default.
- W3204661432 cites W2963145887 @default.
- W3204661432 cites W2963300588 @default.
- W3204661432 cites W2963403868 @default.
- W3204661432 cites W2964000524 @default.
- W3204661432 cites W2964243274 @default.
- W3204661432 cites W2970730223 @default.
- W3204661432 cites W299440670 @default.
- W3204661432 cites W3005556965 @default.
- W3204661432 cites W3025793647 @default.
- W3204661432 cites W3026874504 @default.
- W3204661432 cites W3032317609 @default.
- W3204661432 cites W3033913438 @default.
- W3204661432 cites W3034949308 @default.
- W3204661432 cites W3035083561 @default.
- W3204661432 cites W3035289074 @default.
- W3204661432 cites W3048173247 @default.
- W3204661432 cites W3098403858 @default.
- W3204661432 cites W3125481789 @default.
- W3204661432 cites W3127477255 @default.
- W3204661432 cites W3130016944 @default.
- W3204661432 hasPublicationYear "2021" @default.
- W3204661432 type Work @default.
- W3204661432 sameAs 3204661432 @default.
- W3204661432 citedByCount "0" @default.
- W3204661432 crossrefType "posted-content" @default.
- W3204661432 hasAuthorship W3204661432A5065126806 @default.
- W3204661432 hasAuthorship W3204661432A5079260216 @default.
- W3204661432 hasAuthorship W3204661432A5088179161 @default.
- W3204661432 hasConcept C138885662 @default.
- W3204661432 hasConcept C14999030 @default.
- W3204661432 hasConcept C154945302 @default.
- W3204661432 hasConcept C167966045 @default.
- W3204661432 hasConcept C204321447 @default.
- W3204661432 hasConcept C28490314 @default.
- W3204661432 hasConcept C39890363 @default.
- W3204661432 hasConcept C41008148 @default.
- W3204661432 hasConcept C41895202 @default.
- W3204661432 hasConcept C542774811 @default.
- W3204661432 hasConcept C90805587 @default.
- W3204661432 hasConceptScore W3204661432C138885662 @default.
- W3204661432 hasConceptScore W3204661432C14999030 @default.
- W3204661432 hasConceptScore W3204661432C154945302 @default.
- W3204661432 hasConceptScore W3204661432C167966045 @default.
- W3204661432 hasConceptScore W3204661432C204321447 @default.
- W3204661432 hasConceptScore W3204661432C28490314 @default.
- W3204661432 hasConceptScore W3204661432C39890363 @default.
- W3204661432 hasConceptScore W3204661432C41008148 @default.
- W3204661432 hasConceptScore W3204661432C41895202 @default.
- W3204661432 hasConceptScore W3204661432C542774811 @default.
- W3204661432 hasConceptScore W3204661432C90805587 @default.
- W3204661432 hasLocation W32046614321 @default.
- W3204661432 hasOpenAccess W3204661432 @default.
- W3204661432 hasPrimaryLocation W32046614321 @default.
- W3204661432 hasRelatedWork W1544896742 @default.
- W3204661432 hasRelatedWork W2185741472 @default.
- W3204661432 hasRelatedWork W2796010067 @default.
- W3204661432 hasRelatedWork W2945993525 @default.
- W3204661432 hasRelatedWork W2973158936 @default.
- W3204661432 hasRelatedWork W2975414524 @default.
- W3204661432 hasRelatedWork W3015440759 @default.
- W3204661432 hasRelatedWork W3025528898 @default.
- W3204661432 hasRelatedWork W3026874504 @default.
- W3204661432 hasRelatedWork W3112470437 @default.
- W3204661432 hasRelatedWork W3132220150 @default.
- W3204661432 hasRelatedWork W3133185996 @default.
- W3204661432 hasRelatedWork W3134835907 @default.
- W3204661432 hasRelatedWork W3163677684 @default.
- W3204661432 hasRelatedWork W3164064632 @default.
- W3204661432 hasRelatedWork W3169905056 @default.
- W3204661432 hasRelatedWork W3170320568 @default.
- W3204661432 hasRelatedWork W3196001064 @default.
- W3204661432 hasRelatedWork W3206353686 @default.
- W3204661432 hasRelatedWork W3211889635 @default.
- W3204661432 isParatext "false" @default.
- W3204661432 isRetracted "false" @default.
- W3204661432 magId "3204661432" @default.
- W3204661432 workType "article" @default.