
Animating visible speech and facial expressions


Abstract

We present four techniques for modeling and animating faces starting from a set of morph targets. The first obtains parameters that control individual facial components and uses machine learning to map one type of parameter onto another. The second fuses visible speech and facial expressions in the lower part of the face. The third combines coarticulation rules with kernel smoothing. Finally, we present a new 3D tongue model with flexible, intuitive skeleton controls. Results on eight animated character models demonstrate that the techniques are effective.
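
As a rough illustration of the building blocks the abstract names, the two Python sketches below show (i) standard morph-target blending and (ii) Gaussian kernel smoothing of a viseme weight track. They are minimal sketches under assumed data layouts, not the paper's implementation; every function and variable name is illustrative.

First, the morph-target (blend-shape) formulation that all four techniques start from: a neutral mesh is displaced by a weighted sum of per-target vertex offsets.

    import numpy as np

    def blend_morph_targets(neutral, targets, weights):
        # Illustrative sketch, not the paper's code.
        # neutral: (V, 3) base vertex positions
        # targets: (K, V, 3) morph-target vertex positions
        # weights: (K,) blend weights, typically in [0, 1]
        offsets = targets - neutral[None, :, :]              # per-target displacements
        return neutral + np.tensordot(weights, offsets, axes=1)

Second, one plausible form of the kernel smoothing that the third technique combines with coarticulation rules: each phoneme contributes a target viseme weight at its key time, and a Nadaraya-Watson kernel estimate lets neighboring phonemes influence the mouth shape at every frame, which is the coarticulation effect being approximated. The Gaussian kernel and the bandwidth value are assumptions, not taken from the paper.

    def smooth_viseme_track(key_times, key_weights, query_times, bandwidth=0.06):
        # key_times: per-phoneme key times in seconds
        # key_weights: target viseme weight at each key time
        # query_times: frame times at which to evaluate the smoothed track
        key_times = np.asarray(key_times, dtype=float)
        key_weights = np.asarray(key_weights, dtype=float)
        query_times = np.asarray(query_times, dtype=float)
        smoothed = np.empty(query_times.shape)
        for i, t in enumerate(query_times):
            k = np.exp(-0.5 * ((t - key_times) / bandwidth) ** 2)  # Gaussian kernel
            smoothed[i] = np.dot(k, key_weights) / k.sum()
        return smoothed

A per-frame track produced by smooth_viseme_track (one track per morph target) can then drive blend_morph_targets, giving smooth transitions between neighboring visemes instead of abrupt per-phoneme switches.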



Author information

Correspondence to Jiyong Ma.


Cite this article

Ma, J., Cole, R. Animating visible speech and facial expressions. Visual Comp 20, 86–105 (2004). https://doi.org/10.1007/s00371-003-0234-y
