Research on the Distal Supervised Learning Model of Speech Inversion

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7473)

Abstract

Articulatory information is not readily available in typical speaker-listener situations. Speech inversion was proposed to address this problem by estimating such information from the acoustic signal. This paper studies distal supervised learning (DSL) as one machine learning strategy for speech inversion. Eight tract variables are used as the articulatory information to model speech dynamics, and the experimental background and theoretical foundation of DSL are analyzed. In addition, a global optimization approach is proposed, and the results obtained when the speech signal is parameterized as acoustic parameters (APs) are compared with those obtained when it is parameterized as mel-frequency cepstral coefficients (MFCCs). The results show that DSL estimates the tract variables well.
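The general idea of distal supervised learning can be illustrated with a minimal sketch. It is not the model or the global optimization approach from the paper: it assumes a toy linear, injective "vocal tract", synthetic data, 13 acoustic coefficients per frame, and plain gradient descent, and every name in it (`train_dsl_toy`, `plant`, `forward`, `inverse`) is hypothetical. A forward model (tract variables to acoustics) is fitted first and then frozen; the inverse model (acoustics to tract variables) is trained only on the acoustic ("distal") error propagated back through the forward model.

```python
import numpy as np


def train_dsl_toy(n_frames=500, n_tract=8, n_acoustic=13,
                  epochs=300, lr=0.5, seed=0):
    """Toy distal supervised learning with linear models (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Assumed toy "vocal tract": a linear map with orthonormal rows, so it is
    # injective and the inversion is unique (real speech inversion is not).
    q, _ = np.linalg.qr(rng.normal(size=(n_acoustic, n_tract)))
    plant = q.T                                   # (n_tract, n_acoustic)

    tract = rng.normal(size=(n_frames, n_tract))  # synthetic tract variables
    acoustic = tract @ plant                      # synthetic acoustic frames

    # Step 1: fit the forward model (tract -> acoustic) by least squares.
    forward, *_ = np.linalg.lstsq(tract, acoustic, rcond=None)

    # Step 2: train the inverse model (acoustic -> tract) on the distal error.
    # The tract targets are never used here; the forward model stays fixed.
    inverse = np.zeros((n_acoustic, n_tract))
    distal_mse_initial = float(np.mean(acoustic ** 2))
    for _ in range(epochs):
        tract_hat = acoustic @ inverse
        acoustic_hat = tract_hat @ forward
        distal_err = acoustic_hat - acoustic
        # Gradient of 0.5 * mean squared distal error w.r.t. the inverse
        # weights, obtained by passing the error back through `forward`.
        grad = acoustic.T @ (distal_err @ forward.T) / n_frames
        inverse -= lr * grad

    tract_hat = acoustic @ inverse
    acoustic_hat = tract_hat @ forward
    return {
        "distal_mse_initial": distal_mse_initial,
        "distal_mse_final": float(np.mean((acoustic_hat - acoustic) ** 2)),
        "tract_mse": float(np.mean((tract_hat - tract) ** 2)),
    }


if __name__ == "__main__":
    for name, value in train_dsl_toy().items():
        print(f"{name}: {value:.3e}")
```

Because the toy map is injective, driving the distal error to zero also recovers the tract variables. With a many-to-one acoustic mapping, the same procedure would settle on one of several valid inverses.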

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, Y., Zhang, S. (2012). Research on the Distal Supervised Learning Model of Speech Inversion. In: Liu, B., Ma, M., Chang, J. (eds) Information Computing and Applications. ICICA 2012. Lecture Notes in Computer Science, vol 7473. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34062-8_17

  • DOI: https://doi.org/10.1007/978-3-642-34062-8_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34061-1

  • Online ISBN: 978-3-642-34062-8

  • eBook Packages: Computer Science, Computer Science (R0)
