Matches in SemOpenAlex for { <https://semopenalex.org/work/W4281743943> ?p ?o ?g. }
Showing items 1 to 67 of 67 with 100 items per page.
- W4281743943 abstract "In this paper, we propose a novel unsupervised text-to-speech (UTTS) framework which does not require text-audio pairs for the TTS acoustic modeling (AM). UTTS is a multi-speaker speech synthesizer that supports zero-shot voice cloning, it is developed from a perspective of disentangled speech representation learning. The framework offers a flexible choice of a speaker's duration model, timbre feature (identity) and content for TTS inference. We leverage recent advancements in self-supervised speech representation learning as well as speech synthesis front-end techniques for system development. Specifically, we employ our recently formulated Conditional Disentangled Sequential Variational Auto-encoder (C-DSVAE) as the backbone UTTS AM, which offers well-structured content representations given unsupervised alignment (UA) as condition during training. For UTTS inference, we utilize a lexicon to map input text to the phoneme sequence, which is expanded to the frame-level forced alignment (FA) with a speaker-dependent duration model. Then, we develop an alignment mapping module that converts FA to UA. Finally, the C-DSVAE, serving as the self-supervised TTS AM, takes the predicted UA and a target speaker embedding to generate the mel spectrogram, which is ultimately converted to waveform with a neural vocoder. We show how our method enables speech synthesis without using a paired TTS corpus. Experiments demonstrate that UTTS can synthesize speech of high naturalness and intelligibility measured by human and objective evaluations. Audio samples are available at our demo page https://neurtts.github.io/utts_demo." @default.
- W4281743943 created "2022-06-13" @default.
- W4281743943 creator A5008786434 @default.
- W4281743943 creator A5034476404 @default.
- W4281743943 creator A5068922218 @default.
- W4281743943 creator A5088059015 @default.
- W4281743943 date "2022-06-06" @default.
- W4281743943 modified "2023-09-30" @default.
- W4281743943 title "UTTS: Unsupervised TTS with Conditional Disentangled Sequential Variational Auto-encoder" @default.
- W4281743943 doi "https://doi.org/10.48550/arxiv.2206.02512" @default.
- W4281743943 hasPublicationYear "2022" @default.
- W4281743943 type Work @default.
- W4281743943 citedByCount "0" @default.
- W4281743943 crossrefType "posted-content" @default.
- W4281743943 hasAuthorship W4281743943A5008786434 @default.
- W4281743943 hasAuthorship W4281743943A5034476404 @default.
- W4281743943 hasAuthorship W4281743943A5068922218 @default.
- W4281743943 hasAuthorship W4281743943A5088059015 @default.
- W4281743943 hasBestOaLocation W42817439431 @default.
- W4281743943 hasConcept C101738243 @default.
- W4281743943 hasConcept C111472728 @default.
- W4281743943 hasConcept C111919701 @default.
- W4281743943 hasConcept C118505674 @default.
- W4281743943 hasConcept C121332964 @default.
- W4281743943 hasConcept C134537474 @default.
- W4281743943 hasConcept C138885662 @default.
- W4281743943 hasConcept C154945302 @default.
- W4281743943 hasConcept C2776214188 @default.
- W4281743943 hasConcept C28490314 @default.
- W4281743943 hasConcept C41008148 @default.
- W4281743943 hasConcept C45273575 @default.
- W4281743943 hasConcept C50644808 @default.
- W4281743943 hasConcept C59404180 @default.
- W4281743943 hasConcept C60048801 @default.
- W4281743943 hasConcept C62520636 @default.
- W4281743943 hasConceptScore W4281743943C101738243 @default.
- W4281743943 hasConceptScore W4281743943C111472728 @default.
- W4281743943 hasConceptScore W4281743943C111919701 @default.
- W4281743943 hasConceptScore W4281743943C118505674 @default.
- W4281743943 hasConceptScore W4281743943C121332964 @default.
- W4281743943 hasConceptScore W4281743943C134537474 @default.
- W4281743943 hasConceptScore W4281743943C138885662 @default.
- W4281743943 hasConceptScore W4281743943C154945302 @default.
- W4281743943 hasConceptScore W4281743943C2776214188 @default.
- W4281743943 hasConceptScore W4281743943C28490314 @default.
- W4281743943 hasConceptScore W4281743943C41008148 @default.
- W4281743943 hasConceptScore W4281743943C45273575 @default.
- W4281743943 hasConceptScore W4281743943C50644808 @default.
- W4281743943 hasConceptScore W4281743943C59404180 @default.
- W4281743943 hasConceptScore W4281743943C60048801 @default.
- W4281743943 hasConceptScore W4281743943C62520636 @default.
- W4281743943 hasLocation W42817439431 @default.
- W4281743943 hasOpenAccess W4281743943 @default.
- W4281743943 hasPrimaryLocation W42817439431 @default.
- W4281743943 hasRelatedWork W2928664166 @default.
- W4281743943 hasRelatedWork W2959758584 @default.
- W4281743943 hasRelatedWork W3136048210 @default.
- W4281743943 hasRelatedWork W4286588216 @default.
- W4281743943 hasRelatedWork W4288282363 @default.
- W4281743943 hasRelatedWork W4300631627 @default.
- W4281743943 hasRelatedWork W4308242712 @default.
- W4281743943 hasRelatedWork W4312095844 @default.
- W4281743943 hasRelatedWork W4323520692 @default.
- W4281743943 hasRelatedWork W4372348584 @default.
- W4281743943 isParatext "false" @default.
- W4281743943 isRetracted "false" @default.
- W4281743943 workType "article" @default.
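
The abstract above describes the first inference steps as a lexicon lookup that maps text to a phoneme sequence, followed by expansion to a frame-level alignment using a duration model. The sketch below illustrates only that front-end idea under loud assumptions: the lexicon entries, the per-phoneme frame counts, and the function names `text_to_phonemes` and `expand_to_frames` are hypothetical and are not taken from the paper. A fixed duration table stands in for the speaker-dependent duration model, and the later stages (FA-to-UA mapping, C-DSVAE, vocoder) are not modeled.

```python
from typing import Dict, List

# Hypothetical toy lexicon (word -> phonemes); illustrative values only.
LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Hypothetical frame counts per phoneme. This fixed table is an assumption
# standing in for a learned, speaker-dependent duration model.
DURATIONS: Dict[str, int] = {
    "HH": 2, "AH": 3, "L": 2, "OW": 4, "W": 2, "ER": 3, "D": 2,
}


def text_to_phonemes(text: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Map whitespace-separated words to phonemes by lexicon lookup."""
    phonemes: List[str] = []
    for word in text.lower().split():
        if word not in lexicon:
            raise KeyError(f"word not in lexicon: {word!r}")
        phonemes.extend(lexicon[word])
    return phonemes


def expand_to_frames(
    phonemes: List[str], durations: Dict[str, int], default: int = 1
) -> List[str]:
    """Repeat each phoneme for its duration to get a frame-level alignment."""
    frames: List[str] = []
    for phoneme in phonemes:
        frames.extend([phoneme] * durations.get(phoneme, default))
    return frames


phonemes = text_to_phonemes("hello", LEXICON)
frames = expand_to_frames(phonemes, DURATIONS)
print(phonemes)
print(len(frames), frames)
```

In a system like the one the abstract describes, the frame-level sequence would then be converted to the unsupervised alignment and passed, with a target speaker embedding, to the acoustic model; those components are outside the scope of this sketch.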