Matches in SemOpenAlex for { <https://semopenalex.org/work/W4294721141> ?p ?o ?g. }
Showing items 1 to 77 of 77, with 100 items per page.
- W4294721141 abstract "Expressive synthetic speech is essential for many human-computer interaction and audio broadcast scenarios, and thus synthesizing expressive speech has attracted much attention in recent years. Previous methods performed expressive speech synthesis either with explicit labels or with a fixed-length style embedding extracted from reference audio, both of which can only learn an average style and thus ignore the multi-scale nature of speech prosody. In this paper, we propose MsEmoTTS, a multi-scale emotional speech synthesis framework, to model emotion at different levels. Specifically, the proposed method is a typical attention-based sequence-to-sequence model with three proposed modules, namely a global-level emotion presenting module (GM), an utterance-level emotion presenting module (UM), and a local-level emotion presenting module (LM), which model the global emotion category, utterance-level emotion variation, and syllable-level emotion strength, respectively. In addition to modeling emotion at different levels, the proposed method also allows us to synthesize emotional speech in different ways, i.e., transferring the emotion from reference audio, predicting the emotion from input text, and controlling the emotion strength manually. Extensive experiments conducted on a Chinese emotional speech corpus demonstrate that the proposed method outperforms the compared reference-audio-based and text-based emotional speech synthesis methods on emotion-transfer speech synthesis and text-based emotion-prediction speech synthesis, respectively. Besides, the experiments also show that the proposed method can control emotion expression flexibly. Detailed analysis shows the effectiveness of each module and the sound design of the proposed method." @default.
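The abstract describes conditioning a sequence-to-sequence TTS model on emotion at three scales: a global emotion-category embedding (GM), an utterance-level variation embedding (UM), and per-syllable strength values (LM). A minimal sketch of how such multi-scale conditioning could combine with encoder outputs is shown below; all function names, shapes, and the additive/multiplicative combination are illustrative assumptions, not the paper's actual implementation.

```python
def condition_encoder(enc_out, global_emb, utt_emb, local_strength):
    """Combine multi-scale emotion signals with text-encoder outputs.

    enc_out:        T x d list of encoder frames (one per time step)
    global_emb:     length-d global emotion-category embedding (GM)
    utt_emb:        length-d utterance-level variation embedding (UM)
    local_strength: length-T per-step emotion strengths (LM, syllable-level)
    """
    # Broadcast the two fixed-length embeddings over time, then scale each
    # frame by its local emotion strength (a hypothetical combination rule).
    return [
        [(f + g + u) * s for f, g, u in zip(frame, global_emb, utt_emb)]
        for frame, s in zip(enc_out, local_strength)
    ]

enc = [[0.0, 0.0], [0.0, 0.0]]          # T=2 frames, d=2
out = condition_encoder(enc, [1.0, 1.0], [1.0, 1.0], [0.5, 2.0])
print(out)  # [[1.0, 1.0], [4.0, 4.0]]
```

Because the strengths are per time step, the second frame here expresses the same global/utterance emotion twice as strongly as the first, which is the kind of fine-grained control the abstract attributes to the LM module.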
- W4294721141 created "2022-09-06" @default.
- W4294721141 creator A5001699296 @default.
- W4294721141 creator A5029817092 @default.
- W4294721141 creator A5038835055 @default.
- W4294721141 creator A5072886327 @default.
- W4294721141 date "2022-01-17" @default.
- W4294721141 modified "2023-10-16" @default.
- W4294721141 title "MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis" @default.
- W4294721141 doi "https://doi.org/10.48550/arxiv.2201.06460" @default.
- W4294721141 hasPublicationYear "2022" @default.
- W4294721141 type Work @default.
- W4294721141 citedByCount "0" @default.
- W4294721141 crossrefType "posted-content" @default.
- W4294721141 hasAuthorship W4294721141A5001699296 @default.
- W4294721141 hasAuthorship W4294721141A5029817092 @default.
- W4294721141 hasAuthorship W4294721141A5038835055 @default.
- W4294721141 hasAuthorship W4294721141A5072886327 @default.
- W4294721141 hasBestOaLocation W42947211411 @default.
- W4294721141 hasConcept C109089402 @default.
- W4294721141 hasConcept C121332964 @default.
- W4294721141 hasConcept C14999030 @default.
- W4294721141 hasConcept C154945302 @default.
- W4294721141 hasConcept C166957645 @default.
- W4294721141 hasConcept C204321447 @default.
- W4294721141 hasConcept C206310091 @default.
- W4294721141 hasConcept C2775852435 @default.
- W4294721141 hasConcept C2776445246 @default.
- W4294721141 hasConcept C2778112365 @default.
- W4294721141 hasConcept C2778755073 @default.
- W4294721141 hasConcept C28490314 @default.
- W4294721141 hasConcept C41008148 @default.
- W4294721141 hasConcept C41608201 @default.
- W4294721141 hasConcept C542774811 @default.
- W4294721141 hasConcept C54355233 @default.
- W4294721141 hasConcept C62520636 @default.
- W4294721141 hasConcept C73411735 @default.
- W4294721141 hasConcept C86803240 @default.
- W4294721141 hasConcept C91863865 @default.
- W4294721141 hasConcept C95457728 @default.
- W4294721141 hasConceptScore W4294721141C109089402 @default.
- W4294721141 hasConceptScore W4294721141C121332964 @default.
- W4294721141 hasConceptScore W4294721141C14999030 @default.
- W4294721141 hasConceptScore W4294721141C154945302 @default.
- W4294721141 hasConceptScore W4294721141C166957645 @default.
- W4294721141 hasConceptScore W4294721141C204321447 @default.
- W4294721141 hasConceptScore W4294721141C206310091 @default.
- W4294721141 hasConceptScore W4294721141C2775852435 @default.
- W4294721141 hasConceptScore W4294721141C2776445246 @default.
- W4294721141 hasConceptScore W4294721141C2778112365 @default.
- W4294721141 hasConceptScore W4294721141C2778755073 @default.
- W4294721141 hasConceptScore W4294721141C28490314 @default.
- W4294721141 hasConceptScore W4294721141C41008148 @default.
- W4294721141 hasConceptScore W4294721141C41608201 @default.
- W4294721141 hasConceptScore W4294721141C542774811 @default.
- W4294721141 hasConceptScore W4294721141C54355233 @default.
- W4294721141 hasConceptScore W4294721141C62520636 @default.
- W4294721141 hasConceptScore W4294721141C73411735 @default.
- W4294721141 hasConceptScore W4294721141C86803240 @default.
- W4294721141 hasConceptScore W4294721141C91863865 @default.
- W4294721141 hasConceptScore W4294721141C95457728 @default.
- W4294721141 hasLocation W42947211411 @default.
- W4294721141 hasOpenAccess W4294721141 @default.
- W4294721141 hasPrimaryLocation W42947211411 @default.
- W4294721141 hasRelatedWork W1866214668 @default.
- W4294721141 hasRelatedWork W1964529244 @default.
- W4294721141 hasRelatedWork W2080748494 @default.
- W4294721141 hasRelatedWork W2181773877 @default.
- W4294721141 hasRelatedWork W2364136279 @default.
- W4294721141 hasRelatedWork W2405476719 @default.
- W4294721141 hasRelatedWork W3140955690 @default.
- W4294721141 hasRelatedWork W4210777104 @default.
- W4294721141 hasRelatedWork W4296068977 @default.
- W4294721141 hasRelatedWork W2465421051 @default.
- W4294721141 isParatext "false" @default.
- W4294721141 isRetracted "false" @default.
- W4294721141 workType "article" @default.
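The listing above is the result of the quad pattern shown in the header. A query like it can be reproduced programmatically; the sketch below builds the SPARQL query and an HTTP GET request for it using only the standard library. The endpoint URL `https://semopenalex.org/sparql` and the `format=json` parameter are assumptions about the SemOpenAlex service, not confirmed by this listing, and no request is actually sent.

```python
import urllib.parse

def build_query(work_iri: str) -> str:
    """Return a SPARQL query matching the header pattern { <work> ?p ?o ?g. }."""
    return (
        "SELECT ?p ?o ?g WHERE { "
        f"GRAPH ?g {{ <{work_iri}> ?p ?o . }} "
        "}"
    )

def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request per the SPARQL protocol."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{endpoint}?{params}"

query = build_query("https://semopenalex.org/work/W4294721141")
url = build_request_url("https://semopenalex.org/sparql", query)
print(url)
```

Sending the resulting URL (e.g. with `urllib.request.urlopen`) would return the same 77 predicate/object pairs shown above, in the endpoint's result format.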