Abstract
Emotional intelligence is an indispensable facet of human intelligence and one of the most important factors for a successful social life. Endowing machines with this kind of intelligence, in pursuit of affective human–machine interaction, is not an easy task, however. It is made more complex by the fact that human beings use several modalities jointly to interpret affective states, since emotion affects almost all modes of expression: audio-visual (facial expression, voice, gesture, posture, etc.), physiological (respiration, skin temperature, etc.), and contextual (goal, preference, environment, social situation, etc.). Compared to common unimodal approaches, multimodal emotion recognition raises many specific problems, especially concerning the architecture used to fuse the multimodal information. In this chapter, we first briefly review these problems and then present research results for various multimodal architectures based on the combined analysis of facial expressions, speech, and physiological signals. Finally, we introduce the design of an adaptive neural-network classifier that can decide whether adaptation is necessary in response to environmental changes.
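The fusion problem the abstract refers to can be made concrete with a small sketch. The Python fragment below is a minimal, hypothetical illustration of the two standard fusion strategies for multimodal emotion recognition: feature-level (early) fusion, which concatenates per-modality feature vectors for a single classifier, and decision-level (late) fusion, which combines per-modality classifier posteriors. It also shows a simple confidence-based test for whether classifier adaptation is warranted. All names (`fuse_features`, `fuse_decisions`, `needs_adaptation`), the weights, and the threshold are illustrative assumptions, not the chapter's actual architecture.

```python
import numpy as np

def fuse_features(face, speech, physio):
    """Feature-level (early) fusion: concatenate per-modality feature
    vectors into one input vector for a single classifier."""
    return np.concatenate([face, speech, physio])

def fuse_decisions(posteriors, weights):
    """Decision-level (late) fusion: weighted average of per-modality
    class posteriors; weights reflect assumed modality reliability."""
    fused = sum(w * p for w, p in zip(weights, posteriors))
    return fused / fused.sum()

def needs_adaptation(fused_posterior, threshold=0.6):
    """Hypothetical adaptation trigger: flag the classifier for
    retraining when the fused decision is no longer confident,
    e.g. after a change of user or recording environment."""
    return float(fused_posterior.max()) < threshold

# Toy per-modality posteriors over four emotion classes
# (e.g. quadrants of a valence/activation space).
face_p   = np.array([0.70, 0.10, 0.10, 0.10])
speech_p = np.array([0.40, 0.30, 0.20, 0.10])
physio_p = np.array([0.30, 0.30, 0.20, 0.20])

fused = fuse_decisions([face_p, speech_p, physio_p],
                       weights=[0.5, 0.3, 0.2])
print(fused, needs_adaptation(fused))
```

A real system would replace the fixed threshold with the retraining criterion developed in the chapter; the sketch only shows where such a decision sits in a multimodal pipeline.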
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Pantic, M., Caridakis, G., André, E., Kim, J., Karpouzis, K., Kollias, S. (2011). Multimodal Emotion Recognition from Low-Level Cues. In: Cowie, R., Pelachaud, C., Petta, P. (eds) Emotion-Oriented Systems. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15184-2_8
DOI: https://doi.org/10.1007/978-3-642-15184-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15183-5
Online ISBN: 978-3-642-15184-2
eBook Packages: Computer Science (R0)