An Audiovisual Talking Head for Augmented Speech Generation: Models and Animations Based on a Real Speaker’s Articulatory Data

  • Pierre Badin
  • Frédéric Elisei
  • Gérard Bailly
  • Yuliya Tarabalka
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5098)


We present a methodology for deriving three-dimensional models of speech articulators from volume MRI and multiple-view video images acquired from a single speaker. Linear component analysis is used to model these highly deformable articulators as the weighted sum of a small number of basic shapes corresponding to the articulators’ degrees of freedom for speech. These models are assembled into an audiovisual talking head that can produce augmented audiovisual speech, i.e. it can display articulators that are usually not visible, such as the tongue or velum. The talking head is then animated by recovering its control parameters, by inversion, from the coordinates of a small number of points on the same speaker’s articulators, tracked by electromagnetic articulography. The augmented speech produced points the way to promising applications in speech therapy for children with delayed speech, perception and production rehabilitation for hearing-impaired children, and pronunciation training for second-language learners.
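The two ideas in the abstract, a shape expressed as a weighted sum of a few basic shapes and the recovery of the weights from a handful of tracked points, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy six-coordinate "mesh", the two basis shapes, and the use of ordinary least squares for the inversion are hypothetical and are not taken from the paper.

```python
# Toy illustration only: all names, sizes and numbers are hypothetical
# assumptions, not values or methods from the paper.

def synthesize(mean, basis, params):
    """Return coordinates as mean + sum_k params[k] * basis[k]."""
    return [m + sum(p * b[i] for p, b in zip(params, basis))
            for i, m in enumerate(mean)]

def solve(a, b):
    """Solve the small square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def invert(mean, basis, observed_idx, observed):
    """Least-squares estimate of the control parameters from a few observed coordinates."""
    k = len(basis)
    # Restrict the linear model to the observed coordinates only.
    rows = [[basis[j][i] for j in range(k)] for i in observed_idx]
    resid = [o - mean[i] for i, o in zip(observed_idx, observed)]
    # Normal equations: (A^T A) p = A^T r
    ata = [[sum(row[p] * row[q] for row in rows) for q in range(k)]
           for p in range(k)]
    atr = [sum(row[p] * r for row, r in zip(rows, resid)) for p in range(k)]
    return solve(ata, atr)

# Hypothetical model: six coordinates, two basic shapes.
mean = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
basis = [[1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0, 2.0, 0.0, 3.0]]
true_params = [0.5, -1.0]
full_shape = synthesize(mean, basis, true_params)

# Pretend only three coordinates were tracked (stand-in for sensor positions).
observed_idx = [0, 3, 4]
observed = [full_shape[i] for i in observed_idx]

recovered = invert(mean, basis, observed_idx, observed)
reconstructed = synthesize(mean, basis, recovered)
```

In this noise-free toy case the recovered parameters match the ones used to generate the shape, so the full shape, including the coordinates that were never observed, is reconstructed from three tracked values. Real data would be noisy and would likely call for regularisation or constraints on the parameters, which this sketch does not attempt.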





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Pierre Badin (1)
  • Frédéric Elisei (1)
  • Gérard Bailly (1)
  • Yuliya Tarabalka (1)

  1. GIPSA-lab / DPC, UMR 5216 CNRS – INPG – UJF – Université Stendhal, Grenoble, Saint Martin d’Hères Cedex, France
