Abstract
Facial expression is one of the most expressive ways for human beings to convey emotion, intention, and other nonverbal messages in face-to-face communication. In this chapter, a layered parametric framework is proposed to synthesize emotional facial expressions for an MPEG-4 compliant talking avatar based on the three-dimensional PAD model, whose dimensions are pleasure-displeasure, arousal-nonarousal, and dominance-submissiveness. The PAD dimensions are used to capture the high-level emotional state of a talking avatar with a specific facial expression. A set of partial expression parameters (PEPs) is designed to depict expressive facial motion patterns in local face areas and to reduce the complexity of directly manipulating the low-level MPEG-4 facial animation parameters (FAPs). The relationships among the emotion (PAD), expression (PEP), and animation (FAP) parameters are analyzed on a virtual facial expression database. Two levels of parameter mapping are implemented: an emotion-expression mapping from PAD to PEP, and linear interpolation from PEP to FAP. The synthetic emotional facial expression is combined with the talking avatar's speech animation in a text-to-audiovisual-speech system. Perceptual evaluation shows that our approach can generate appropriate facial expressions for subtle and complex emotions defined by PAD, and thus enhance the emotional expressivity of the talking avatar.
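The second mapping level described above, from PEP to FAP, is stated to be linear interpolation. A minimal sketch of that idea is shown below; the function name, the choice of a single scalar intensity per PEP, and all numeric values are illustrative assumptions, not parameters taken from the chapter.

```python
# Hypothetical sketch of the PEP -> FAP stage: each MPEG-4 FAP is linearly
# interpolated between its neutral (rest) value and the value it takes when
# the local expression pattern is fully active. The PEP intensity acts as
# the interpolation weight. All names and numbers are illustrative.

def pep_to_fap(pep_intensity, fap_neutral, fap_extreme):
    """Linearly interpolate each FAP value between rest and full activation.

    pep_intensity: scalar in [0, 1] for one partial expression parameter.
    fap_neutral:   FAP displacements when the face is at rest.
    fap_extreme:   FAP displacements at full activation of this PEP.
    """
    return [n + pep_intensity * (e - n)
            for n, e in zip(fap_neutral, fap_extreme)]

# Example: a hypothetical "brow raise" PEP driving two brow-related FAPs.
neutral = [0.0, 0.0]    # displacements at rest
extreme = [80.0, 60.0]  # displacements at full activation
print(pep_to_fap(0.5, neutral, extreme))  # -> [40.0, 30.0]
```

In the chapter's framework, the PAD-to-PEP stage would first produce such intensities from the emotion dimensions; the sketch covers only the simpler, explicitly linear second stage.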
© 2010 Springer Berlin Heidelberg
Cite this chapter
Zhang, S., Wu, Z., Meng, H.M., Cai, L. (2010). Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar. In: Nishida, T., Jain, L.C., Faucher, C. (eds) Modeling Machine Emotions for Realizing Intelligence. Smart Innovation, Systems and Technologies, vol 1. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12604-8_6
DOI: https://doi.org/10.1007/978-3-642-12604-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12603-1
Online ISBN: 978-3-642-12604-8