Research on the Distal Supervised Learning Model of Speech Inversion

Chen, Ying; Zhang, Shaobai

doi:10.1007/978-3-642-34062-8_17

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7473)

Abstract

Articulatory information is not readily available in typical speaker-listener situations. Speech inversion addresses this problem by estimating such information from the acoustic signal. This paper studies distal supervised learning (DSL) as one machine learning strategy for speech inversion. Eight tract variables are used as the articulatory information that models speech dynamics, and the experimental background and theoretical foundation of DSL are analyzed. In addition, a global optimization approach is proposed, and the results obtained when the speech signal is parameterized as acoustic parameters (APs) are compared with those obtained when it is parameterized as mel-frequency cepstral coefficients (MFCCs). The results show that DSL achieves good estimation performance for the tract variables.
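
As background for readers unfamiliar with DSL, the toy sketch below illustrates the general two-stage idea: first fit a forward model from articulatory to acoustic variables, then train an inverse model by propagating the acoustic (distal) error back through the fixed forward model. It is not the authors' implementation. The linear one-dimensional map `true_forward`, the function names, the learning rates, and the data are all hypothetical assumptions, chosen so that the example runs with the Python standard library alone.

```python
# Toy illustration of distal supervised learning (DSL) on a 1-D problem.
# Everything here (the linear "vocal tract", learning rates, names) is a
# hypothetical stand-in, not the models or data used in the paper.

def true_forward(x):
    """Hypothetical articulatory-to-acoustic map, unknown to the learner."""
    return 2.0 * x + 1.0


def fit_forward_model(xs, ys, lr=0.05, epochs=2000):
    """Stage 1: learn y ~ a*x + b from paired (articulatory, acoustic) data."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = sum(2.0 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2.0 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def fit_inverse_model(ys, a, b, lr=0.01, epochs=2000):
    """Stage 2: learn x_hat = c*y + d using only the acoustic (distal) error.

    The forward model (a, b) stays fixed; the error between the predicted
    and the target acoustics is pushed back through it via the chain rule.
    """
    c, d = 0.0, 0.0
    n = len(ys)
    for _ in range(epochs):
        grad_c = 0.0
        grad_d = 0.0
        for y in ys:
            x_hat = c * y + d              # inverse model output
            err = (a * x_hat + b) - y      # distal error in acoustic space
            grad_c += 2.0 * err * a * y / n
            grad_d += 2.0 * err * a / n
        c -= lr * grad_c
        d -= lr * grad_d
    return c, d


# Paired data for the forward model; acoustics alone for the inverse model.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [true_forward(x) for x in xs]
a, b = fit_forward_model(xs, ys)
c, d = fit_inverse_model(ys, a, b)

# Estimate the articulatory value behind an unseen acoustic observation.
x_estimate = c * true_forward(0.3) + d
print(f"forward: a={a:.3f}, b={b:.3f}; inverse: c={c:.3f}, d={d:.3f}")
print(f"estimated x for y=true_forward(0.3): {x_estimate:.3f}")
```

In this toy setting the inverse model converges to roughly c = 0.5 and d = -0.5, the exact inverse of the hypothetical forward map. Real speech inversion is nonlinear and can be non-unique, so a practical system would use nonlinear models over tract variables and acoustic features rather than scalars.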

Author information

Authors and Affiliations

Computer Department, Nanjing University of Posts and Telecommunications, 210003, Nanjing, Jiangsu, China
Ying Chen & Shaobai Zhang

Editor information

Editors and Affiliations

College of Science, Hebei United University, 063000, Tangshan, Hebei, China
Baoxiang Liu
Nanyang Technological University, Singapore
Maode Ma
College of Science, Hebei United University, 063009, Tangshan, Hebei, China
Jincai Chang

About this paper

Cite this paper

Chen, Y., Zhang, S. (2012). Research on the Distal Supervised Learning Model of Speech Inversion. In: Liu, B., Ma, M., Chang, J. (eds) Information Computing and Applications. ICICA 2012. Lecture Notes in Computer Science, vol 7473. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34062-8_17

DOI: https://doi.org/10.1007/978-3-642-34062-8_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34061-1
Online ISBN: 978-3-642-34062-8
eBook Packages: Computer Science, Computer Science (R0)
