The Visual Computer, Volume 20, Issue 2–3, pp 86–105

Animating visible speech and facial expressions

  • Jiyong Ma
  • Ronald Cole
original article


We present four techniques for modeling and animating faces from a set of morph targets. The first technique obtains parameters that control individual facial components and uses machine learning to learn the mapping from one type of parameter to another. The second technique fuses visible speech and facial expressions in the lower part of the face. The third technique combines coarticulation rules with kernel smoothing. Finally, we present a new 3D tongue model with flexible and intuitive skeleton controls. Results obtained with eight animated character models demonstrate that these techniques are powerful and effective.
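The abstract names morph targets and kernel smoothing but gives no formulas. As a rough illustration only, the sketch below assumes the common linear (delta) blend-shape model and a Gaussian Nadaraya-Watson smoother applied to per-frame morph-target weight tracks. The function names, the Gaussian kernel, the bandwidth and the toy data are all hypothetical; this is not the authors' method, and their coarticulation rules are not modeled.

```python
import numpy as np


def blend_morph_targets(base, targets, weights):
    """Linear (delta) blend: base + sum_k weights[k] * (targets[k] - base).

    base:    (V, 3) neutral-face vertex positions
    targets: (K, V, 3) morph-target vertex positions
    weights: (K,) blend weights, one per morph target
    """
    deltas = targets - base[np.newaxis, :, :]  # (K, V, 3) offsets from the neutral face
    return base + np.tensordot(weights, deltas, axes=1)  # contract over K -> (V, 3)


def kernel_smooth(key_times, key_weights, frame_times, bandwidth):
    """Gaussian Nadaraya-Watson smoothing of morph-target weight tracks.

    key_times:   (N,) times of the key weight vectors, in seconds
    key_weights: (N, K) morph-target weights at each key time
    frame_times: (M,) times at which to evaluate the smoothed weights
    bandwidth:   Gaussian kernel width in seconds (hypothetical tuning parameter)
    returns:     (M, K) smoothed weights, one row per frame
    """
    u = (frame_times[:, None] - key_times[None, :]) / bandwidth  # (M, N)
    kernel = np.exp(-0.5 * u ** 2)
    kernel /= kernel.sum(axis=1, keepdims=True)  # each row becomes a convex combination
    return kernel @ key_weights


if __name__ == "__main__":
    # Hypothetical toy data: a 4-vertex "face" with two morph targets.
    base = np.zeros((4, 3))
    targets = np.stack([np.ones((4, 3)), 2.0 * np.ones((4, 3))])
    print(blend_morph_targets(base, targets, np.array([0.5, 0.25])))  # every entry is 1.0

    # Three hypothetical keys alternating between the two targets, sampled at 5 frames.
    key_times = np.array([0.0, 0.1, 0.2])
    key_weights = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
    frames = np.linspace(0.0, 0.2, 5)
    print(kernel_smooth(key_times, key_weights, frames, bandwidth=0.05))
```

Normalizing each kernel row keeps every smoothed weight vector a convex combination of the key vectors, so transitions between targets are softened without overshooting the key values; the bandwidth trades responsiveness against smoothness.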


Keywords: Face animation; Visible speech; Visual speech; 3D tongue model; Animated speech




  1. 1.
    Albrecht I, Haber J, Seidel H-P (2002) Speech synchronization for physics-based facial animation. In: Proceedings of the international conference in Central Europe on computer graphics, Czech Republic,4 February 2002. Vis Comput Vision 10:9–16 Google Scholar
  2. 2.
    Badin P, Bailly G, Raybaudi M, Segebarth C (1998) A three-dimensional linear articulatory model based on MRI data. In: Mannell RH, Robert-Ribes J (eds) Proceedings of the 5th international conference on spoken language processing, Sydney, Australia, 4 December 1998, 2:417–420 Google Scholar
  3. 3.
    Barr AH (1981) Superquadrics and angle-preserving transformations. IEEE Comput Graph Appl 1(1):11–23 Google Scholar
  4. 4.
    Bavelas JB (1994) Gestures as part of speech: methodological implications. Res Lang Soc Interact 27:201–221 Google Scholar
  5. 5.
    Brand ME (1999) Voice puppetry. In: Proceedings of ACM SIGGRAPH, Los Angeles, 13 August 1999, pp 21–28 Google Scholar
  6. 6.
    Breen AP, Bowers E, Welsh W (1996) An investigation into the generation of mouth shapes for a talking head. In: Proceedings of the international conference on spoken language processing (ICSLP), Philadelphia, 3–6 October 1996, pp 108–111 Google Scholar
  7. 7.
    Bregler C, Covell M, Slaney M (1997) Video rewrite: driving visual speech with audio. In: Proceedings of ACM SIGGRAPH, Los Angeles, 3–8 August 1997, pp 353–360 Google Scholar
  8. 8.
    Cassell J, Vilhjalmsson H, Bickmore T (2001) BEAT: the Behavior Expression Animation Toolkit. In: Proceedings of ACM SIGGRAPH Los Angeles, 12–17 August 2001, pp 477–486 Google Scholar
  9. 9.
    Celniker G, Gossard D (1991) Deformable curve and surface finite-elements for freeform shape design. In: Proceedings of ACM SIGGRAPH, Las Vegas, NV, 28 July–2 August 1991, pp 257–265 Google Scholar
  10. 10.
    Cohen MM, Massaro DW (1993) Modeling coarticulation in synthetic visual speech. In: Thalman NM, Thalman D (eds) Models and techniques in computer animation. Springer, Berlin Heidelberg New York, pp 139–156 Google Scholar
  11. 11.
    Cohn JF, Zlochower A, Lien J, Wu YT, Kanade T (1997) Automated face coding: a computer-vision based method of facial expression analysis. In: Proceedings of the 7th European conference on facial expression, measurement, and meaning, Salzburg, Austria.,16–22 July 1997, pp 329–333 Google Scholar
  12. 12.
    Cole R, Massaro DW, de Villiers J, Rundle B, Shobaki K, Wouters J, Cohen M, Beskow J, Stone P, Connors P, Tarachow A, Solcher D (1999) New tools for interactive speech and language training: using animated conversational agents in the classrooms of profoundly deaf children. In: Proceedings of the ESCA/SOCRRATES workshop on method and tool innovations for speech science education, University College, London, 16–17 April 1999, pp 45–52 Google Scholar
  13. 13.
    Ekman P, Friesen W (1978) Facial action coding system. Consulting Psychologists Press, Palo Alto, CA Google Scholar
  14. 14.
    Engwall O (2000) A 3D tongue model based on MRI data. In: Proceedings of ICSLP, III, Beijing, 16 October 2000, pp 901–904 Google Scholar
  15. 15.
    Eubank RL (1999) Nonparametric regression and spline smoothing. Marcel Dekker, New York Google Scholar
  16. 16.
    Ezzat T, Geiger G, Poggio T (2002) Trainable video realistic speech animation. In: Proceedings of ACM SIGGRAPH 2002, San Antonio, TX, 23–26 July 2002, pp 388–398 Google Scholar
  17. 17.
    Farin G (2002) Curves and surfaces for CAGD, 5th edn. Academic, San Diego, pp 155–175 Google Scholar
  18. 18.
    Guenter B, Grimm C, Wood D, Malvar H, Pighin F (1998) Making faces. In: Proceedings of ACM SIGGRAPH, Orlando, FL, 19–24 July 1998, pp 55–66 Google Scholar
  19. 19.
    Jeffers J, Barley M (1971) Speechreading. Thomas, Springfield, IL Google Scholar
  20. 20.
    Kent RD, Minifie FD (1977) Coarticulation in recent speech production models. J Phonet 5:115–135 Google Scholar
  21. 21.
    Kent RD (1997) The speech sciences. Singular, San Diego Google Scholar
  22. 22.
    King SA, Parent RE (2001) A 3D parametric tongue model for animated speech. J Vis Comput Animat 12(3):107–115 Google Scholar
  23. 23.
    Kleiser J (1989) A fast, efficient, accurate way to represent the human face: state of the art in facial animation.In: Proceedings of ACM SIGGRAPH, Tutorials, Boston, 31 July–4 August 1989, 22:20–33 Google Scholar
  24. 24.
    Koch RM, Gross MH, Carls FR, von Büren DF, Fankhauser G, Parish YIH (1996) Simulating facial surgery using finite element models. In: Proceedings of ACM SIGGRAPH, New Orleans, 4–9 August 1996, pp 421–428 Google Scholar
  25. 25.
    Kouadio C, Poulin P, Lachapelle P (1998) Real time facial animation based upon a bank of 3D facial expressions. In: Proceedings of Computer Animation ’98, Philadelphia, June 1998, pp 128–136 Google Scholar
  26. 26.
    Kshirsagar S, Magnenat-Thalmann N (2000) Lip synchronization using linear predictive analysis. In Proceedings of the IEEE international conference on multimedia and expo (II), New York, 30 July–2 August 2000, pp 1077–1080 Google Scholar
  27. 27.
    Kshirsagar S, Molet T, Magnenat-Thalmann N (2001) Principal components of expressive speech animation. In: Proceedings of Computer Graphics International, Hong Kong, 3 June–6 July 2001, pp 38–44 Google Scholar
  28. 28.
    Lee Y, Terzopoulos D, Waters K (1995) Realistic modeling for facial animation. In: Proceedings of ACM SIGGRAPH’95, Los Angeles, August 1995, pp 55–62 Google Scholar
  29. 29.
    Löfqvist, A (1990) Speech as audible gestures. In: Hardcastle WJ, Marchal A (eds) Speech production and speech modelling. Kluwer, Dordrecht, pp 289–322 Google Scholar
  30. 30.
    Maestri G (1996) Digital character animation.New Riders, Indianapolis Google Scholar
  31. 31.
    Moccozet L, Magnenat Thalmann N (1997) Dirichlet free-form deformations and their application to hand simulation. In: Proceedings of the IEEE international conference on computer animation, Geneva, 5–6 June 1997, pp 93–102 Google Scholar
  32. 32.
    Magnenat Thalmann N, Primeau E, Thalmann D (1988) Abstract muscle action procedures for human face animation. Vis Comput 3(5):290–297 Google Scholar
  33. 33.
    Massaro DW (1996) Perceiving talking faces: from speech perception to a behavioral principle. MIT Press, Cambridge, MA Google Scholar
  34. 34.
    Ma JY, Yan J, Cole R (2002) CU animate: tools for enabling conversions with animated characters. In: Proceedings of the international conference on spoken language processing (ICSLP), Denver, CO, 16–20 September 2002, 1:197–200 Google Scholar
  35. 35.
    McNeill D (1992) Hand and mind: what gestures reveal about thought. University of Chicago Press, Chicago Google Scholar
  36. 36.
    Noh JY, Neumann U (2001) Expression cloning. In: Proceedings of ACM SIGGRAPH, Los Angeles, August 2001, pp 277–288 Google Scholar
  37. 37.
    Öhman SEG (1966) Coarticulation in VCV utterances: spectrographic measurements. J Acoust Soc Am 39:151–168 Google Scholar
  38. 38.
    Pandzic IS, Forchheimer R (2002) MPEG-4 facial animation: the standard, implementation and applications. Wiley, New York Google Scholar
  39. 39.
    Parke F (1972) Computer generated animation of face. In: Proceedings of the ACM national conference, Boston, 1 August 1972, pp 451–457 Google Scholar
  40. 40.
    Pighin F, Szeliski R, Salesin D (2002) Modeling and animating realistic faces from images. Int J Comput Vision 50(2):143–1698 CrossRefzbMATHGoogle Scholar
  41. 41.
    Pelachaud C, Badler N, Steedman M (1991) Linguistic issues in facial animation. In: Magnenat-Thalmann N, Thalmann D (eds) Proceedings of Computer Animation, Springer, Berlin Heidelberg New York, 1 June 1991, pp 15–30 Google Scholar
  42. 42.
    Pellom B, Hacioglu K (2003) Recent improvements in the SONIC ASR system for noisy speech: the SPINE task. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP), Hong Kong, 6–10 April 2003, 1:4–7 Google Scholar
  43. 43.
    Platt SM, Badler NI (1981) Animating facial expressions. ACM Comput Graph 15(3):245–252 Google Scholar
  44. 44.
    Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1988) Numerical recipes in C. Cambridge University Press, Cambridge, UK Google Scholar
  45. 45.
    Sanguineti V, Laboissiere R, Payan Y (1997) A control model of human tongue movements in speech. Biol Cybern 77:11–22 CrossRefzbMATHGoogle Scholar
  46. 46.
    Sclaroff S, Pentland A (1995) Modal matching for corrispondence and recognition. IEEE Trans Patt Anal Mach Intell 17(6):545–561 CrossRefGoogle Scholar
  47. 47.
    Small LH (1999) Fundamentals of phonetics: a practical guide for students. Allyn & Bacon, Boston Google Scholar
  48. 48.
    Stone M, Lundberg A (1996) Three-dimensional tongue surface shapes of English consonants and vowels. J Acoust Soc Am 99(6):3728–3737 Google Scholar
  49. 49.
    Terzopoulos D, Waters K (1990) Physically-based facial modeling, analysis, and animation. J Vis Comput Animat 1(4):73–80 Google Scholar
  50. 50.
    Vetter T, Poggio T (1995) Linear object classes and image synthesis from a single example image. IEEE Trans Patt Anal Mach Intell 19(7):733–742 CrossRefGoogle Scholar
  51. 51.
    Walther EF (1982) Lipreading. Nelson-Hall, Chicago Google Scholar
  52. 52.
    Hardcastle WJ, Hewlett N (1999) Coarticulation: theory, data and techniques. Cambridge University Press, Cambridge, UKGoogle Scholar

Copyright information

© Springer-Verlag 2004

Authors and Affiliations

  1. Center for Spoken Language Research, University of Colorado at Boulder, Boulder, USA
