
An Audiovisual Talking Head for Augmented Speech Generation: Models and Animations Based on a Real Speaker’s Articulatory Data

  • Conference paper

Articulated Motion and Deformable Objects (AMDO 2008)

Part of the book series: Lecture Notes in Computer Science (volume 5098)

Abstract

We present a methodology for deriving three-dimensional models of speech articulators from volume MRI and multiple-view video images acquired from a single speaker. Linear component analysis is used to model these highly deformable articulators as the weighted sum of a small number of basic shapes corresponding to the articulators’ degrees of freedom for speech. These models are assembled into an audiovisual talking head that can produce augmented audiovisual speech, i.e. can display usually non-visible articulators such as the tongue or the velum. The talking head is then animated by recovering its control parameters by inversion from the coordinates of a small number of articulator points of the same speaker, tracked by Electro-Magnetic Articulography. The augmented speech produced points the way to promising applications in speech therapy for speech-retarded children, perception and production rehabilitation of hearing-impaired children, and pronunciation training for second-language learners.
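The two core operations described in the abstract — modeling each articulator as a mean shape plus a weighted sum of a few basic shapes, and inverting sparse tracked-point coordinates back to control parameters — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the corpus size, number of components, and tracked vertex indices are all hypothetical stand-ins, and random data replaces the MRI-derived meshes.

```python
import numpy as np

# Illustrative sketch of the paper's two core operations, using numpy only.
# Each articulator mesh is flattened into a vector of 3D vertex coordinates.
rng = np.random.default_rng(0)
n_frames, n_vertices = 200, 500                    # hypothetical corpus size
X = rng.normal(size=(n_frames, 3 * n_vertices))    # stand-in for MRI-derived meshes

# 1) Linear component analysis: model each shape as the mean plus a
#    weighted sum of a few basic shapes (here, principal components via SVD).
mean = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
n_dof = 6                                          # degrees of freedom kept (assumption)
components = Vt[:n_dof]                            # basic shapes, (n_dof, 3*n_vertices)

def synthesize(weights):
    """Reconstruct a full articulator shape from its control parameters."""
    return mean + weights @ components

# 2) Inversion from sparse fleshpoints: recover the control parameters from
#    the 3D coordinates of a few tracked points (EMA coils) by least squares.
coil_vertices = [10, 120, 250]                     # hypothetical tracked vertices
idx = np.concatenate([[3 * v, 3 * v + 1, 3 * v + 2] for v in coil_vertices])

def invert(observed_coords):
    """Least-squares estimate of the weights from tracked-point coords only."""
    A = components[:, idx].T                       # (3*n_coils, n_dof)
    b = observed_coords - mean[idx]
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

# Round trip: synthesize a shape from known weights, then invert from the
# coil subset alone; with 3 coils (9 coordinates) the 6 weights are recoverable.
w_true = rng.normal(size=n_dof)
shape = synthesize(w_true)
w_est = invert(shape[idx])
```

Restricting the component matrix to the tracked coordinates before solving mirrors the paper's setup: the full mesh model is driven by a handful of parameters, so a few well-placed EMA coils suffice to animate the entire talking head.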



Editor information

Francisco J. Perales, Robert B. Fisher


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Badin, P., Elisei, F., Bailly, G., Tarabalka, Y. (2008). An Audiovisual Talking Head for Augmented Speech Generation: Models and Animations Based on a Real Speaker’s Articulatory Data. In: Perales, F.J., Fisher, R.B. (eds) Articulated Motion and Deformable Objects. AMDO 2008. Lecture Notes in Computer Science, vol 5098. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70517-8_14

  • DOI: https://doi.org/10.1007/978-3-540-70517-8_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70516-1

  • Online ISBN: 978-3-540-70517-8

  • eBook Packages: Computer Science (R0)
