Matches in SemOpenAlex for { <https://semopenalex.org/work/W4225272861> ?p ?o ?g. }
- W4225272861 endingPage "65" @default.
- W4225272861 startingPage "53" @default.
- W4225272861 abstract "Understanding and controlling latent representations in deep generative models is a challenging yet important problem for analyzing, transforming and generating various types of data. In speech processing, inspired by the anatomical mechanisms of phonation, the source-filter model considers that speech signals are produced from a few independent and physically meaningful continuous latent factors, among which the fundamental frequency $f_0$ and the formants are of primary importance. In this work, we start from a variational autoencoder (VAE) trained in an unsupervised manner on a large dataset of unlabeled natural speech signals, and we show that the source-filter model of speech production naturally arises as orthogonal subspaces of the VAE latent space. Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we propose a method to identify the latent subspaces encoding $f_0$ and the first three formant frequencies, we show that these subspaces are orthogonal, and based on this orthogonality, we develop a method to accurately and independently control the source-filter speech factors within the latent subspaces. Without requiring additional information such as text or human-labeled data, this results in a deep generative model of speech spectrograms that is conditioned on $f_0$ and the formant frequencies, and which is applied to the transformation of speech signals. Finally, we also propose a robust $f_0$ estimation method that exploits the projection of a speech signal onto the learned latent subspace associated with $f_0$." @default.
- W4225272861 created "2022-05-04" @default.
- W4225272861 creator A5020392160 @default.
- W4225272861 creator A5021724743 @default.
- W4225272861 creator A5040347439 @default.
- W4225272861 creator A5045593473 @default.
- W4225272861 creator A5066621495 @default.
- W4225272861 date "2023-03-01" @default.
- W4225272861 modified "2023-09-26" @default.
- W4225272861 title "Learning and controlling the source-filter representation of speech with a variational autoencoder" @default.
- W4225272861 cites W1966264494 @default.
- W4225272861 cites W1973756445 @default.
- W4225272861 cites W1975079546 @default.
- W4225272861 cites W2001050541 @default.
- W4225272861 cites W2070696251 @default.
- W4225272861 cites W2077446647 @default.
- W4225272861 cites W2088432713 @default.
- W4225272861 cites W2088632109 @default.
- W4225272861 cites W2109114466 @default.
- W4225272861 cites W2116428736 @default.
- W4225272861 cites W2118774185 @default.
- W4225272861 cites W2150980750 @default.
- W4225272861 cites W2152205330 @default.
- W4225272861 cites W2163922914 @default.
- W4225272861 cites W2164764235 @default.
- W4225272861 cites W2191779130 @default.
- W4225272861 cites W2294798173 @default.
- W4225272861 cites W2403891086 @default.
- W4225272861 cites W2428180336 @default.
- W4225272861 cites W2471520273 @default.
- W4225272861 cites W2515020857 @default.
- W4225272861 cites W2532494225 @default.
- W4225272861 cites W2766672686 @default.
- W4225272861 cites W2883879995 @default.
- W4225272861 cites W2884225676 @default.
- W4225272861 cites W2901552243 @default.
- W4225272861 cites W2911579794 @default.
- W4225272861 cites W2922004249 @default.
- W4225272861 cites W2929299742 @default.
- W4225272861 cites W2962691331 @default.
- W4225272861 cites W2962850167 @default.
- W4225272861 cites W2962866891 @default.
- W4225272861 cites W2963091184 @default.
- W4225272861 cites W2963300588 @default.
- W4225272861 cites W2963375116 @default.
- W4225272861 cites W2972516210 @default.
- W4225272861 cites W2973046048 @default.
- W4225272861 cites W2979850772 @default.
- W4225272861 cites W2990440871 @default.
- W4225272861 cites W3003162010 @default.
- W4225272861 cites W3005621653 @default.
- W4225272861 cites W3094843121 @default.
- W4225272861 cites W3097206152 @default.
- W4225272861 cites W3097549261 @default.
- W4225272861 cites W3100968126 @default.
- W4225272861 cites W3130335839 @default.
- W4225272861 cites W3131332223 @default.
- W4225272861 cites W3174264304 @default.
- W4225272861 cites W3217536461 @default.
- W4225272861 cites W4235716345 @default.
- W4225272861 doi "https://doi.org/10.1016/j.specom.2023.02.005" @default.
- W4225272861 hasPublicationYear "2023" @default.
- W4225272861 type Work @default.
- W4225272861 citedByCount "1" @default.
- W4225272861 countsByYear W42252728612022 @default.
- W4225272861 crossrefType "journal-article" @default.
- W4225272861 hasAuthorship W4225272861A5020392160 @default.
- W4225272861 hasAuthorship W4225272861A5021724743 @default.
- W4225272861 hasAuthorship W4225272861A5040347439 @default.
- W4225272861 hasAuthorship W4225272861A5045593473 @default.
- W4225272861 hasAuthorship W4225272861A5066621495 @default.
- W4225272861 hasBestOaLocation W42252728613 @default.
- W4225272861 hasConcept C101738243 @default.
- W4225272861 hasConcept C106131492 @default.
- W4225272861 hasConcept C12362212 @default.
- W4225272861 hasConcept C14999030 @default.
- W4225272861 hasConcept C153180895 @default.
- W4225272861 hasConcept C154945302 @default.
- W4225272861 hasConcept C158215666 @default.
- W4225272861 hasConcept C167966045 @default.
- W4225272861 hasConcept C17137986 @default.
- W4225272861 hasConcept C2524010 @default.
- W4225272861 hasConcept C2779581591 @default.
- W4225272861 hasConcept C28490314 @default.
- W4225272861 hasConcept C31972630 @default.
- W4225272861 hasConcept C32834561 @default.
- W4225272861 hasConcept C33923547 @default.
- W4225272861 hasConcept C39890363 @default.
- W4225272861 hasConcept C41008148 @default.
- W4225272861 hasConcept C45273575 @default.
- W4225272861 hasConcept C50644808 @default.
- W4225272861 hasConceptScore W4225272861C101738243 @default.
- W4225272861 hasConceptScore W4225272861C106131492 @default.
- W4225272861 hasConceptScore W4225272861C12362212 @default.
- W4225272861 hasConceptScore W4225272861C14999030 @default.
- W4225272861 hasConceptScore W4225272861C153180895 @default.
- W4225272861 hasConceptScore W4225272861C154945302 @default.
- W4225272861 hasConceptScore W4225272861C158215666 @default.