Multimodal Emotion Recognition from Low-Level Cues

Part of the Cognitive Technologies book series (COGTECH)


Emotional intelligence is an indispensable facet of human intelligence and one of the most important factors for a successful social life. Endowing machines with this kind of intelligence, in pursuit of affective human–machine interaction, is not an easy task. The task is further complicated by the fact that human beings use several modalities jointly to interpret affective states, since emotion affects almost all modes – audio-visual (facial expression, voice, gesture, posture, etc.), physiological (respiration, skin temperature, etc.), and contextual (goal, preference, environment, social situation, etc.). Compared to common unimodal approaches, multimodal emotion recognition raises many specific problems, especially concerning the fusion architecture for the multimodal information. In this chapter, we first briefly review these problems and then present research results for various multimodal architectures based on the combined analysis of facial expression, speech, and physiological signals. Finally, we introduce the design of an adaptive neural network classifier that is capable of deciding whether adaptation is necessary in response to environmental changes.
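The two ideas the abstract combines – decision-level fusion of per-modality classifier outputs and a classifier that decides for itself when adaptation is needed – can be illustrated with a minimal sketch. This is a hypothetical toy, not the chapter's actual architecture: the emotion classes, the confidence-weighted averaging, the modality weights, and the low-confidence adaptation trigger are all illustrative assumptions.

```python
# Hypothetical decision-level fusion sketch: each modality-specific
# classifier is assumed to output a probability distribution over a
# fixed set of emotion classes; the fused estimate is a weighted average.

EMOTIONS = ["anger", "joy", "sadness", "neutral"]

def fuse(predictions, weights):
    """Weighted average of per-modality class probabilities."""
    total = sum(weights.values())
    fused = [0.0] * len(EMOTIONS)
    for modality, probs in predictions.items():
        w = weights[modality] / total
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused

def needs_adaptation(fused, threshold=0.5):
    """Toy adaptation trigger: flag low-confidence fused outputs as
    candidates for retraining/adapting the classifier."""
    return max(fused) < threshold

# Illustrative per-modality outputs (made-up numbers).
preds = {
    "face":       [0.10, 0.70, 0.10, 0.10],
    "speech":     [0.20, 0.50, 0.10, 0.20],
    "physiology": [0.25, 0.40, 0.15, 0.20],
}
weights = {"face": 0.5, "speech": 0.3, "physiology": 0.2}

fused = fuse(preds, weights)
print(EMOTIONS[fused.index(max(fused))])  # fused class label
print(needs_adaptation(fused))            # adapt only when confidence is low
```

Real systems replace the fixed weights with learned or context-dependent ones, and the threshold test with a principled drift or error criterion, but the control flow – fuse, then decide whether to adapt – is the same.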


Keywords: Facial Expression, Affective State, Emotion Recognition, Emotional Intelligence, Network Weight



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. Department of Computing, Imperial College London, London, UK
  2. Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Enschede, The Netherlands
  3. Image, Video and Multimedia Systems Lab, National Technical University of Athens, Athens, Greece
  4. University of Augsburg, Augsburg, Germany
  5. Image, Video and Multimedia Systems Lab, Institute of Communications and Computer Systems, National Technical University of Athens, Athens, Greece
