Abstract
Facial expression is one of the most expressive ways for human beings to convey emotion, intention, and other nonverbal messages in face-to-face communication. In this chapter, a layered parametric framework is proposed to synthesize emotional facial expressions for an MPEG-4 compliant talking avatar based on the three-dimensional PAD model, whose dimensions are pleasure-displeasure, arousal-nonarousal, and dominance-submissiveness. The PAD dimensions are used to capture the high-level emotional state of a talking avatar with a specific facial expression. A set of partial expression parameters (PEPs) is designed to depict expressive facial motion patterns in local face areas and to reduce the complexity of directly manipulating the low-level MPEG-4 facial animation parameters (FAPs). The relationships among the emotion (PAD), expression (PEP), and animation (FAP) parameters are analyzed on a virtual facial expression database. Two levels of parameter mapping are implemented: an emotion-expression mapping from PAD to PEP, and linear interpolation from PEP to FAP. The synthetic emotional facial expression is combined with the talking avatar's speech animation in a text-to-audiovisual-speech system. Perceptual evaluation shows that our approach can generate appropriate facial expressions for subtle and complex emotions defined by PAD, and thus enhance the emotional expressivity of the talking avatar.
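The second mapping level described above, from PEP to FAP, is stated to be linear interpolation. A minimal sketch of that idea is shown below; the function name, the choice of a single scalar intensity per PEP, and all numeric values are illustrative assumptions, not parameters taken from the chapter.

```python
# Hypothetical sketch of the PEP -> FAP stage: each MPEG-4 FAP is linearly
# interpolated between its neutral (rest) value and the value it takes when
# the local expression pattern is fully active. The PEP intensity acts as
# the interpolation weight. All names and numbers are illustrative.

def pep_to_fap(pep_intensity, fap_neutral, fap_extreme):
    """Linearly interpolate each FAP value between rest and full activation.

    pep_intensity: scalar in [0, 1] for one partial expression parameter.
    fap_neutral:   FAP displacements when the face is at rest.
    fap_extreme:   FAP displacements at full activation of this PEP.
    """
    return [n + pep_intensity * (e - n)
            for n, e in zip(fap_neutral, fap_extreme)]

# Example: a hypothetical "brow raise" PEP driving two brow-related FAPs.
neutral = [0.0, 0.0]    # displacements at rest
extreme = [80.0, 60.0]  # displacements at full activation
print(pep_to_fap(0.5, neutral, extreme))  # -> [40.0, 30.0]
```

In the chapter's framework, the PAD-to-PEP stage would first produce such intensities from the emotion dimensions; the sketch covers only the simpler, explicitly linear second stage.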
© 2010 Springer Berlin Heidelberg
Cite this chapter
Zhang, S., Wu, Z., Meng, H.M., Cai, L. (2010). Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar. In: Nishida, T., Jain, L.C., Faucher, C. (eds) Modeling Machine Emotions for Realizing Intelligence. Smart Innovation, Systems and Technologies, vol 1. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12604-8_6
DOI: https://doi.org/10.1007/978-3-642-12604-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12603-1
Online ISBN: 978-3-642-12604-8