Abstract
Articulatory information is not readily available in typical speaker-listener situations. Speech inversion addresses this problem by estimating such information from the acoustic signal. This paper studies distal supervised learning (DSL), one of the machine learning strategies for speech inversion. Eight tract variables serve as the articulatory information used to model speech dynamics, and the experimental background and theoretical foundation of distal supervised learning are analyzed. In addition, a global optimization approach is proposed, and the results obtained when the speech signal is parameterized as acoustic parameters (APs) are compared with those obtained with mel-frequency cepstral coefficients (MFCCs). The results show that distal supervised learning estimates the tract variables well.
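As a rough illustration of the general distal supervised learning idea, the sketch below first fits a forward model (articulatory to acoustic) and then trains an inverse model (acoustic to articulatory) using only the acoustic error propagated back through the fixed forward model. This is not the authors' implementation: the linear maps, the synthetic data, the dimensions, and the name `train_dsl` are all assumptions made for illustration, and the abstract does not describe the models actually used in the paper.

```python
import numpy as np

def train_dsl(n_steps=200, seed=0):
    """Toy distal supervised learning with linear models (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_art, n_ac, n = 3, 5, 200  # hypothetical dimensions and sample count

    # Synthetic "vocal tract": a fixed linear map from articulatory to acoustic space.
    A_true = rng.normal(size=(n_ac, n_art))
    X = rng.normal(size=(n, n_art))  # articulatory data (stand-in for tract variables)
    Y = X @ A_true.T                 # acoustic data (stand-in for APs or MFCCs)

    # Stage 1: fit the forward model F so that Y is approximately X @ F.
    F, *_ = np.linalg.lstsq(X, Y, rcond=None)

    # Stage 2: train the inverse model G (X_hat = Y @ G) using only the distal
    # (acoustic) error, backpropagated through the fixed forward model F.
    G = np.zeros((n_ac, n_art))
    # Step size 1/L, where L bounds the curvature of the quadratic loss in G.
    scale = 2.0 / (n * n_ac)
    lr = 1.0 / (scale * np.linalg.norm(Y, 2) ** 2 * np.linalg.norm(F, 2) ** 2)

    errors = []
    for _ in range(n_steps):
        X_hat = Y @ G            # estimated articulatory parameters
        resid = X_hat @ F - Y    # acoustic error after passing through F
        errors.append(float(np.mean(resid ** 2)))
        grad_G = scale * (Y.T @ resid @ F.T)  # gradient of the mean squared error
        G -= lr * grad_G
    return errors, X, Y @ G

errors, X, X_hat = train_dsl()
print(f"distal error: {errors[0]:.4f} -> {errors[-1]:.4f}")
```

The inverse model never sees the true articulatory targets during the second stage; it is guided only by how well the forward model reproduces the acoustics from its estimates, which is what makes the supervision "distal".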
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, Y., Zhang, S. (2012). Research on the Distal Supervised Learning Model of Speech Inversion. In: Liu, B., Ma, M., Chang, J. (eds) Information Computing and Applications. ICICA 2012. Lecture Notes in Computer Science, vol 7473. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34062-8_17
DOI: https://doi.org/10.1007/978-3-642-34062-8_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34061-1
Online ISBN: 978-3-642-34062-8
eBook Packages: Computer Science, Computer Science (R0)