Matches in SemOpenAlex for { <https://semopenalex.org/work/W2145629417> ?p ?o ?g. }
- W2145629417 abstract "I. Background Articulatory modeling is used to incorporate speech production information into automatic speech recognition (ASR) systems. It is believed that solutions to the problems of co-articulation, pronunciation variation, and other speaking-style-related phenomena rest in how accurately we capture the production process. II. Objective In this work we present a novel approach to speech recognition that incorporates knowledge of the speech production process. We discuss our contribution in moving from a purely statistical speech recognizer to one that is motivated by the physical generative process of speech. III. Methods We follow an analysis-by-synthesis approach. Firstly, we attribute a physical meaning to the inner states of the recognition system, pertaining to the configurations the human vocal tract takes over time. We utilize a geometric model of the vocal tract, adapt it to our speakers, and derive realistic vocal tract shapes from electromagnetic articulograph (EMA) measurements in the MOCHA database. Secondly, we synthesize speech from the vocal tract configurations using a physiologically motivated articulatory synthesis model of speech generation. Thirdly, the observation probability of the Hidden Markov Model (HMM), which is used for phone classification, is a function of the distortion between the speech synthesized from the vocal tract configurations and the real speech. The output of each state in the HMM is based on a mixture of density functions, where each density models the distribution of the distortion at the output of one vocal tract configuration. During training, we initialize the model parameters using ground-truth articulatory knowledge. During testing, only the acoustic data is used. IV. Results and conclusion We present phone classification results using our novel dynamic articulatory model and following our adaptation procedure. The table below shows phone error rates (PER) for a female and a male speaker.
We use a three-state HMM with different observation densities and initialization techniques, and we combine the probabilities of the baseline topology with the new ones. Our novel framework provides a 10.9% relative reduction in phone error rate over our baseline, which uses MFCC features. This is achieved using the distortion features with linear discriminant analysis (LDA) and cepstral mean normalization (CMN). We conclude that incorporating articulatory knowledge in the combined statistical framework we devised contributes to lowering the error rates in speech recognition.

Features (dimension) | Topology | Observation Prob. / Initialization | Female PER | Male PER | Both PER | Improvement
Baseline: MFCC + CMN (13) | 3S-128M-HMM | Gaussian / VQ | 61.6% | 55.9% | 58.8% | —
Distortion Features (1024), prob. combination with MFCC, α = 0.2 | 3S-1024M-HMM | Exponential / Flat, sparsity = 21% | 57.6% | 53.7% | 55.7% | 5.3%
Distortion Features (1024), prob. combination with MFCC, α = 0.2 | 3S-1024M-HMM | Exponential / EMA, sparsity = 51% | 58.3% | 53.9% | 56.1% | 4.6%
Adapted Distortion Features (1024), prob. combination with MFCC, α = 0.25 | 3S-1024M-HMM | Exponential / EMA, sparsity = 51% | 58.4% | 53.1% | 55.7% | 5.3%
Distortion Features + LDA + CMN (20), prob. combination with MFCC, α = 0.6 | 3S-128M-HMM | Gaussian / VQ, sparsity = 0% | 54.9% | 49.8% | 52.4% | 10.9%" @default.
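The α-weighted combination of baseline and distortion-model observation probabilities described in the abstract can be sketched as follows. This is a minimal illustration only: the exact combination rule (assumed here to be log-linear interpolation in the log-probability domain) and the function name are assumptions, not details taken from the paper.

```python
def combine_log_probs(logp_mfcc: float, logp_distortion: float, alpha: float = 0.2) -> float:
    """Combine two HMM observation log-probabilities for one frame.

    logp_mfcc       -- log-probability from the baseline MFCC observation model
    logp_distortion -- log-probability from the distortion-based observation model
    alpha           -- interpolation weight on the distortion stream (the paper
                       reports values such as 0.2, 0.25, and 0.6)

    Assumption: log-linear interpolation; the paper's precise rule may differ.
    """
    return alpha * logp_distortion + (1.0 - alpha) * logp_mfcc


# Example: equal weighting averages the two log-scores.
combined = combine_log_probs(-1.0, -3.0, alpha=0.5)  # -2.0
```

During decoding, such a combined score would replace the single-stream observation log-probability at each HMM state before the Viterbi search.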
- W2145629417 created "2016-06-24" @default.
- W2145629417 creator A5029290738 @default.
- W2145629417 date "2012-01-01" @default.
- W2145629417 modified "2023-09-23" @default.
- W2145629417 title "An analysis-by-synthesis approach to vocal tract modeling for robust speech recognition" @default.
- W2145629417 cites W123135133 @default.
- W2145629417 cites W126529980 @default.
- W2145629417 cites W128469914 @default.
- W2145629417 cites W1486632395 @default.
- W2145629417 cites W1509905243 @default.
- W2145629417 cites W1521950135 @default.
- W2145629417 cites W1536990986 @default.
- W2145629417 cites W1560013842 @default.
- W2145629417 cites W1561506132 @default.
- W2145629417 cites W1561709555 @default.
- W2145629417 cites W1583757088 @default.
- W2145629417 cites W182923466 @default.
- W2145629417 cites W190138384 @default.
- W2145629417 cites W1957067028 @default.
- W2145629417 cites W1977253038 @default.
- W2145629417 cites W1977591085 @default.
- W2145629417 cites W1987538184 @default.
- W2145629417 cites W2003890417 @default.
- W2145629417 cites W2020677212 @default.
- W2145629417 cites W2034829178 @default.
- W2145629417 cites W2038299421 @default.
- W2145629417 cites W2039763773 @default.
- W2145629417 cites W2049633694 @default.
- W2145629417 cites W2052382192 @default.
- W2145629417 cites W2065925624 @default.
- W2145629417 cites W2068447135 @default.
- W2145629417 cites W2069618035 @default.
- W2145629417 cites W2089329752 @default.
- W2145629417 cites W2091650508 @default.
- W2145629417 cites W2105613081 @default.
- W2145629417 cites W2110164509 @default.
- W2145629417 cites W2137644731 @default.
- W2145629417 cites W2147462851 @default.
- W2145629417 cites W2148154194 @default.
- W2145629417 cites W2150423550 @default.
- W2145629417 cites W2152553986 @default.
- W2145629417 cites W2165712214 @default.
- W2145629417 cites W2166469361 @default.
- W2145629417 cites W2395052932 @default.
- W2145629417 cites W2406232293 @default.
- W2145629417 cites W2536935545 @default.
- W2145629417 cites W78819235 @default.
- W2145629417 doi "https://doi.org/10.5339/qfarf.2012.aesnp6" @default.
- W2145629417 hasPublicationYear "2012" @default.
- W2145629417 type Work @default.
- W2145629417 sameAs 2145629417 @default.
- W2145629417 citedByCount "1" @default.
- W2145629417 countsByYear W21456294172016 @default.
- W2145629417 crossrefType "proceedings-article" @default.
- W2145629417 hasAuthorship W2145629417A5029290738 @default.
- W2145629417 hasBestOaLocation W21456294172 @default.
- W2145629417 hasConcept C138885662 @default.
- W2145629417 hasConcept C14999030 @default.
- W2145629417 hasConcept C154945302 @default.
- W2145629417 hasConcept C155635449 @default.
- W2145629417 hasConcept C23224414 @default.
- W2145629417 hasConcept C2778707766 @default.
- W2145629417 hasConcept C2780844864 @default.
- W2145629417 hasConcept C28490314 @default.
- W2145629417 hasConcept C41008148 @default.
- W2145629417 hasConcept C41895202 @default.
- W2145629417 hasConcept C43617652 @default.
- W2145629417 hasConcept C47401133 @default.
- W2145629417 hasConcept C61328038 @default.
- W2145629417 hasConceptScore W2145629417C138885662 @default.
- W2145629417 hasConceptScore W2145629417C14999030 @default.
- W2145629417 hasConceptScore W2145629417C154945302 @default.
- W2145629417 hasConceptScore W2145629417C155635449 @default.
- W2145629417 hasConceptScore W2145629417C23224414 @default.
- W2145629417 hasConceptScore W2145629417C2778707766 @default.
- W2145629417 hasConceptScore W2145629417C2780844864 @default.
- W2145629417 hasConceptScore W2145629417C28490314 @default.
- W2145629417 hasConceptScore W2145629417C41008148 @default.
- W2145629417 hasConceptScore W2145629417C41895202 @default.
- W2145629417 hasConceptScore W2145629417C43617652 @default.
- W2145629417 hasConceptScore W2145629417C47401133 @default.
- W2145629417 hasConceptScore W2145629417C61328038 @default.
- W2145629417 hasLocation W21456294171 @default.
- W2145629417 hasLocation W21456294172 @default.
- W2145629417 hasOpenAccess W2145629417 @default.
- W2145629417 hasPrimaryLocation W21456294171 @default.
- W2145629417 hasRelatedWork W1543281296 @default.
- W2145629417 hasRelatedWork W1597000082 @default.
- W2145629417 hasRelatedWork W1647932338 @default.
- W2145629417 hasRelatedWork W192752013 @default.
- W2145629417 hasRelatedWork W1957067028 @default.
- W2145629417 hasRelatedWork W1971795139 @default.
- W2145629417 hasRelatedWork W1993752538 @default.
- W2145629417 hasRelatedWork W2085457662 @default.
- W2145629417 hasRelatedWork W2096734621 @default.
- W2145629417 hasRelatedWork W2108828451 @default.
- W2145629417 hasRelatedWork W2121652828 @default.
- W2145629417 hasRelatedWork W2124455255 @default.
- W2145629417 hasRelatedWork W2147284655 @default.