Matches in SemOpenAlex for { <https://semopenalex.org/work/W2474555457> ?p ?o ?g. }
- W2474555457 abstract "Human speech has a dual nature: the goal of speech is to convey discrete linguistic symbols corresponding to the intended message while the actual speech signal is produced by the continuous and smooth movement of the articulators with rich temporal structures. Such a dual nature has been amazingly utilized by humans in a beneficial way but has presented a big challenge for both speech science and speech technology. This thesis starts with the observation that the continuous or dynamic aspect of human speech is inadequately modeled in current speech technology, especially in state-of-the-art speech recognition systems, while much could be learned from recent advances in speech science. This motivates a study of articulatory dynamics, based on a recently available large scale speech production database that provides simultaneous acoustic and articulatory measurements. Indeed many insights and valuable experiences have been gained from such a study and, as a result, a hidden dynamic model (HDM) that gracefully integrates the discrete and continuous nature of speech is proposed. But it also turns out that articulatory dynamics is highly complicated and cannot be captured by simple models, thus the dynamics are very difficult to put into an efficient computational framework for use in speech technology. As a continuing effort to seek internal dynamics of human speech that can reflect the continuous shape change of the vocal tract and benefit the current speech technology, the second part of the thesis turns to a study of vocal-tract-resonance (VTR) dynamics, built upon the insights and experiences gained from studying articulatory dynamics. It verifies that VTR dynamics can be captured by simple dynamic equations, and a highly accurate and efficient piecewise linear mapping from VTR dynamics to the acoustic space is also carefully designed. 
Two novel VTR tracking methods are developed in this part: one is based on mimicking manual tracking of VTR dynamics by human experts and uses advanced image processing methods (active contours), the other is the natural outcome of formulating an HDM for VTR dynamics and recovering the hidden dynamics by Kalman smoothing. The residual feature resulting from VTR tracking by HDM has also been used as an appended acoustic feature to improve a hidden Markov model (HMM) based phone recognizer on the TIMIT database. The final part of the thesis is dedicated to arguably the most difficult and comprehensive speech processing application: automatic speech recognition (ASR). It first casts the HDM formulated for speech application under the general framework of probabilistic graphical models in machine learning. However, it also becomes clear that exact inference and parameter learning for such a model are NP-hard. In order to use HDM for speech recognition, this final part concentrates on developing novel and powerful variational EM algorithms. The effectiveness of the new algorithms invented has been demonstrated by extensive simulation experiments, and special concerns for speech recognition are also discussed." @default.
- W2474555457 created "2016-07-22" @default.
- W2474555457 creator A5018348437 @default.
- W2474555457 date "2004-08-01" @default.
- W2474555457 modified "2023-09-27" @default.
- W2474555457 title "Hidden Dynamic Models for Speech Processing Applications" @default.
- W2474555457 cites W1269046860 @default.
- W2474555457 cites W133632983 @default.
- W2474555457 cites W1480366595 @default.
- W2474555457 cites W1493163583 @default.
- W2474555457 cites W1493547606 @default.
- W2474555457 cites W1508165687 @default.
- W2474555457 cites W1514443597 @default.
- W2474555457 cites W1516111018 @default.
- W2474555457 cites W1519434885 @default.
- W2474555457 cites W152459128 @default.
- W2474555457 cites W1528496016 @default.
- W2474555457 cites W1534477342 @default.
- W2474555457 cites W1534771531 @default.
- W2474555457 cites W1553004968 @default.
- W2474555457 cites W1554663460 @default.
- W2474555457 cites W1557726990 @default.
- W2474555457 cites W1574548864 @default.
- W2474555457 cites W1575431606 @default.
- W2474555457 cites W1579838312 @default.
- W2474555457 cites W1582484699 @default.
- W2474555457 cites W1594223010 @default.
- W2474555457 cites W1603639024 @default.
- W2474555457 cites W167920206 @default.
- W2474555457 cites W1696644198 @default.
- W2474555457 cites W1704572586 @default.
- W2474555457 cites W1770825568 @default.
- W2474555457 cites W181056519 @default.
- W2474555457 cites W1821294298 @default.
- W2474555457 cites W182332445 @default.
- W2474555457 cites W1825077972 @default.
- W2474555457 cites W186915233 @default.
- W2474555457 cites W1877570817 @default.
- W2474555457 cites W1966648662 @default.
- W2474555457 cites W1969483458 @default.
- W2474555457 cites W1970996882 @default.
- W2474555457 cites W1971735090 @default.
- W2474555457 cites W1973499212 @default.
- W2474555457 cites W1974688490 @default.
- W2474555457 cites W1976589379 @default.
- W2474555457 cites W1977690962 @default.
- W2474555457 cites W1978809497 @default.
- W2474555457 cites W1980148047 @default.
- W2474555457 cites W1983628629 @default.
- W2474555457 cites W1988172096 @default.
- W2474555457 cites W1988550205 @default.
- W2474555457 cites W1988790447 @default.
- W2474555457 cites W1989439365 @default.
- W2474555457 cites W1989705153 @default.
- W2474555457 cites W1990369770 @default.
- W2474555457 cites W1990399922 @default.
- W2474555457 cites W1991530362 @default.
- W2474555457 cites W1992406271 @default.
- W2474555457 cites W1994582185 @default.
- W2474555457 cites W1999885698 @default.
- W2474555457 cites W2003123121 @default.
- W2474555457 cites W2007321142 @default.
- W2474555457 cites W2011024334 @default.
- W2474555457 cites W2020493940 @default.
- W2474555457 cites W2020999234 @default.
- W2474555457 cites W2022194925 @default.
- W2474555457 cites W2022314986 @default.
- W2474555457 cites W2022446405 @default.
- W2474555457 cites W2023204136 @default.
- W2474555457 cites W2024060531 @default.
- W2474555457 cites W2024514957 @default.
- W2474555457 cites W2026129786 @default.
- W2474555457 cites W2033565080 @default.
- W2474555457 cites W2043199434 @default.
- W2474555457 cites W2046463977 @default.
- W2474555457 cites W2046575129 @default.
- W2474555457 cites W2048449762 @default.
- W2474555457 cites W2049633694 @default.
- W2474555457 cites W2051347452 @default.
- W2474555457 cites W2051783213 @default.
- W2474555457 cites W2051812123 @default.
- W2474555457 cites W2052378746 @default.
- W2474555457 cites W2053280194 @default.
- W2474555457 cites W2054159638 @default.
- W2474555457 cites W2056133372 @default.
- W2474555457 cites W2057833190 @default.
- W2474555457 cites W2062888779 @default.
- W2474555457 cites W2068447135 @default.
- W2474555457 cites W2068484625 @default.
- W2474555457 cites W2075570558 @default.
- W2474555457 cites W2077023886 @default.
- W2474555457 cites W2077574412 @default.
- W2474555457 cites W2077804127 @default.
- W2474555457 cites W2082094527 @default.
- W2474555457 cites W2082206048 @default.
- W2474555457 cites W2083277003 @default.
- W2474555457 cites W2083393647 @default.
- W2474555457 cites W2085848504 @default.
- W2474555457 cites W2086699924 @default.
- W2474555457 cites W2087070363 @default.