Matches in SemOpenAlex for { <https://semopenalex.org/work/W4297841714> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W4297841714 abstract "Recent advances in neural text-to-speech research have been dominated by two-stage pipelines utilizing low-level intermediate speech representation such as mel-spectrograms. However, such predetermined features are fundamentally limited, because they do not allow to exploit the full potential of a data-driven approach through learning hidden representations. For this reason, several end-to-end methods have been proposed. However, such models are harder to train and require a large number of high-quality recordings with transcriptions. Here, we propose WavThruVec - a two-stage architecture that resolves the bottleneck by using high-dimensional Wav2Vec 2.0 embeddings as intermediate speech representation. Since these hidden activations provide high-level linguistic features, they are more robust to noise. That allows us to utilize annotated speech datasets of a lower quality to train the first-stage module. At the same time, the second-stage component can be trained on large-scale untranscribed audio corpora, as Wav2Vec 2.0 embeddings are already time-aligned. This results in an increased generalization capability to out-of-vocabulary words, as well as to a better generalization to unseen speakers. We show that the proposed model not only matches the quality of state-of-the-art neural models, but also presents useful properties enabling tasks like voice conversion or zero-shot synthesis." @default.
- W4297841714 created "2022-10-01" @default.
- W4297841714 creator A5021757437 @default.
- W4297841714 creator A5045500067 @default.
- W4297841714 creator A5075077837 @default.
- W4297841714 creator A5087162974 @default.
- W4297841714 date "2022-09-18" @default.
- W4297841714 modified "2023-09-28" @default.
- W4297841714 title "WavThruVec: Latent speech representation as intermediate features for neural speech synthesis" @default.
- W4297841714 doi "https://doi.org/10.21437/interspeech.2022-10797" @default.
- W4297841714 hasPublicationYear "2022" @default.
- W4297841714 type Work @default.
- W4297841714 citedByCount "4" @default.
- W4297841714 countsByYear W42978417142023 @default.
- W4297841714 crossrefType "proceedings-article" @default.
- W4297841714 hasAuthorship W4297841714A5021757437 @default.
- W4297841714 hasAuthorship W4297841714A5045500067 @default.
- W4297841714 hasAuthorship W4297841714A5075077837 @default.
- W4297841714 hasAuthorship W4297841714A5087162974 @default.
- W4297841714 hasBestOaLocation W42978417142 @default.
- W4297841714 hasConcept C115961682 @default.
- W4297841714 hasConcept C134306372 @default.
- W4297841714 hasConcept C138885662 @default.
- W4297841714 hasConcept C149635348 @default.
- W4297841714 hasConcept C14999030 @default.
- W4297841714 hasConcept C154945302 @default.
- W4297841714 hasConcept C177148314 @default.
- W4297841714 hasConcept C17744445 @default.
- W4297841714 hasConcept C199539241 @default.
- W4297841714 hasConcept C204201278 @default.
- W4297841714 hasConcept C2776359362 @default.
- W4297841714 hasConcept C2777601683 @default.
- W4297841714 hasConcept C2780513914 @default.
- W4297841714 hasConcept C28490314 @default.
- W4297841714 hasConcept C33923547 @default.
- W4297841714 hasConcept C41008148 @default.
- W4297841714 hasConcept C41895202 @default.
- W4297841714 hasConcept C45273575 @default.
- W4297841714 hasConcept C61328038 @default.
- W4297841714 hasConcept C94625758 @default.
- W4297841714 hasConcept C99498987 @default.
- W4297841714 hasConceptScore W4297841714C115961682 @default.
- W4297841714 hasConceptScore W4297841714C134306372 @default.
- W4297841714 hasConceptScore W4297841714C138885662 @default.
- W4297841714 hasConceptScore W4297841714C149635348 @default.
- W4297841714 hasConceptScore W4297841714C14999030 @default.
- W4297841714 hasConceptScore W4297841714C154945302 @default.
- W4297841714 hasConceptScore W4297841714C177148314 @default.
- W4297841714 hasConceptScore W4297841714C17744445 @default.
- W4297841714 hasConceptScore W4297841714C199539241 @default.
- W4297841714 hasConceptScore W4297841714C204201278 @default.
- W4297841714 hasConceptScore W4297841714C2776359362 @default.
- W4297841714 hasConceptScore W4297841714C2777601683 @default.
- W4297841714 hasConceptScore W4297841714C2780513914 @default.
- W4297841714 hasConceptScore W4297841714C28490314 @default.
- W4297841714 hasConceptScore W4297841714C33923547 @default.
- W4297841714 hasConceptScore W4297841714C41008148 @default.
- W4297841714 hasConceptScore W4297841714C41895202 @default.
- W4297841714 hasConceptScore W4297841714C45273575 @default.
- W4297841714 hasConceptScore W4297841714C61328038 @default.
- W4297841714 hasConceptScore W4297841714C94625758 @default.
- W4297841714 hasConceptScore W4297841714C99498987 @default.
- W4297841714 hasLocation W42978417141 @default.
- W4297841714 hasLocation W42978417142 @default.
- W4297841714 hasOpenAccess W4297841714 @default.
- W4297841714 hasPrimaryLocation W42978417141 @default.
- W4297841714 hasRelatedWork W1657825509 @default.
- W4297841714 hasRelatedWork W2485008119 @default.
- W4297841714 hasRelatedWork W2567608124 @default.
- W4297841714 hasRelatedWork W2736031499 @default.
- W4297841714 hasRelatedWork W4224263508 @default.
- W4297841714 hasRelatedWork W4225280403 @default.
- W4297841714 hasRelatedWork W4289829928 @default.
- W4297841714 hasRelatedWork W4310471687 @default.
- W4297841714 hasRelatedWork W4385474305 @default.
- W4297841714 hasRelatedWork W642007152 @default.
- W4297841714 isParatext "false" @default.
- W4297841714 isRetracted "false" @default.
- W4297841714 workType "article" @default.
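
The header pattern `{ <https://semopenalex.org/work/W4297841714> ?p ?o ?g. }` asks for every predicate and object attached to this work, and every row above sits in the default graph (`@default`). As a minimal sketch, the same lookup could be written as a plain SPARQL triple pattern over the default graph. The helper name `work_triples_query` and the `limit` default (chosen to mirror the 100 items per page shown above) are assumptions for illustration, not part of SemOpenAlex; the code only builds the query string and does not contact any endpoint.

```python
def work_triples_query(subject_iri: str, limit: int = 100) -> str:
    """Build a SPARQL SELECT query listing every predicate/object of one subject.

    Hypothetical helper: it only formats text and performs no network access.
    """
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    forbidden = set('<>"{}|\\^` ')
    if any(ch in forbidden for ch in subject_iri):
        raise ValueError(f"not a valid IRI for a SPARQL query: {subject_iri!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT ?p ?o WHERE {\n"
        f"  <{subject_iri}> ?p ?o .\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = work_triples_query("https://semopenalex.org/work/W4297841714")
print(query)
```

The resulting string can be sent to whichever SPARQL endpoint serves the dataset; the endpoint address is deliberately left out here because it is not given in the listing.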