
Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar

  • Chapter
Modeling Machine Emotions for Realizing Intelligence

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 1)

Abstract

Facial expression is one of the most expressive ways for human beings to convey emotion, intention, and other nonverbal messages in face-to-face communication. In this chapter, a layered parametric framework is proposed to synthesize emotional facial expressions for an MPEG-4 compliant talking avatar based on the three-dimensional PAD model, whose dimensions are pleasure-displeasure, arousal-nonarousal, and dominance-submissiveness. The PAD dimensions capture the high-level emotional state of a talking avatar with a specific facial expression. A set of partial expression parameters (PEPs) is designed to depict expressive facial motion patterns in local face areas and to reduce the complexity of directly manipulating the low-level MPEG-4 facial animation parameters (FAPs). The relationships among the emotion (PAD), expression (PEP), and animation (FAP) parameters are analyzed on a virtual facial expression database. Two levels of parameter mapping are implemented: the emotion-expression mapping from PAD to PEP, and linear interpolation from PEP to FAP. The synthesized emotional facial expression is combined with the talking avatar's speech animation in a text-to-audio-visual speech system. Perceptual evaluation shows that this approach can generate appropriate facial expressions for subtle and complex emotions defined by PAD, and thus enhances the emotional expressivity of the talking avatar.
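
The Python fragment below is a rough, hypothetical sketch of the two-level parameter mapping the abstract describes (PAD → PEP → FAP), not the chapter's actual implementation. The PEP count, the weight matrix W, and the keyframe values are invented placeholders; only the count of 68 facial animation parameters comes from the MPEG-4 standard, and the chapter itself derives the PAD-to-PEP relationship from a facial expression database rather than from random weights.

import numpy as np

# Hypothetical sketch of the two-level mapping in the abstract:
# emotion (PAD) -> expression (PEP) -> animation (FAP).
# The PEP count, weight matrix, and keyframe values are invented
# placeholders; only the 68-FAP count comes from MPEG-4.

N_PEP = 8    # assumed number of partial expression parameters (local face areas)
N_FAP = 68   # the MPEG-4 standard defines 68 facial animation parameters

# Level 1: emotion-expression mapping from PAD to PEP. A simple linear
# map stands in for the relationship the chapter fits on its database.
rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(N_PEP, 3))  # placeholder weights

def pad_to_pep(pad):
    """Map a PAD point (each axis in [-1, 1]) to PEP intensities in [0, 1]."""
    return np.clip(W @ np.asarray(pad, dtype=float), 0.0, 1.0)

# Level 2: linear interpolation from PEP to FAP. Each PEP blends the FAPs
# of its local face area between a neutral keyframe (intensity 0) and a
# full-expression keyframe (intensity 1); contributions are summed.
FAP_NEUTRAL = np.zeros((N_PEP, N_FAP))
FAP_FULL = rng.integers(-400, 400, size=(N_PEP, N_FAP)).astype(float)

def pep_to_fap(pep):
    """Linearly interpolate per-area FAP keyframes and sum the contributions."""
    p = np.asarray(pep, dtype=float)[:, None]           # shape (N_PEP, 1)
    return ((1.0 - p) * FAP_NEUTRAL + p * FAP_FULL).sum(axis=0)

# Example: a pleasant, moderately aroused, slightly dominant emotion.
fap = pep_to_fap(pad_to_pep([0.8, 0.4, 0.3]))
print(fap.shape)  # (68,) -> one value per FAP, ready to drive the avatar

Summing per-area contributions is itself a simplification; the point of the sketch is only that a low-dimensional emotional state can drive a high-dimensional animation parameter set through two small, composable mappings.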




Copyright information

© 2010 Springer Berlin Heidelberg

About this chapter

Cite this chapter

Zhang, S., Wu, Z., Meng, H.M., Cai, L. (2010). Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar. In: Nishida, T., Jain, L.C., Faucher, C. (eds) Modeling Machine Emotions for Realizing Intelligence. Smart Innovation, Systems and Technologies, vol 1. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12604-8_6


  • DOI: https://doi.org/10.1007/978-3-642-12604-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12603-1

  • Online ISBN: 978-3-642-12604-8

  • eBook Packages: Engineering
