
Multimodal Emotion Recognition from Low-Level Cues

Part of the Cognitive Technologies book series (COGTECH)

Abstract

Emotional intelligence is an indispensable facet of human intelligence and one of the most important factors for a successful social life. Endowing machines with this kind of intelligence, in pursuit of affective human–machine interaction, is not an easy task. The problem is further complicated by the fact that human beings use several modalities jointly to interpret affective states, since emotion affects almost all modes: audio-visual (facial expression, voice, gesture, posture, etc.), physiological (respiration, skin temperature, etc.), and contextual (goal, preference, environment, social situation, etc.). Compared with common unimodal approaches, multimodal emotion recognition raises many specific problems, especially concerning the fusion architecture for the multimodal information. In this chapter, we first give a short review of these problems and then present research results for various multimodal architectures based on the combined analysis of facial expressions, speech, and physiological signals. Finally, we introduce the design of an adaptive neural network classifier that can decide whether adaptation is necessary in response to environmental changes.
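The fusion problem the abstract describes can be illustrated with a minimal decision-level (late) fusion sketch: each modality classifier emits class probabilities, which are combined by a weighted sum, and a low fused confidence flags the sample as a candidate for classifier adaptation. Everything here is illustrative — the emotion labels, modality weights, and the confidence-based adaptation trigger are assumptions for the sketch, not the chapter's actual method.

```python
# Decision-level (late) fusion sketch: combine per-modality class
# probability vectors with a weighted sum. Weights and thresholds
# are illustrative placeholders, not values from the chapter.

EMOTIONS = ["anger", "joy", "neutral", "sadness"]

def fuse(scores_by_modality, weights):
    """Weighted-sum fusion of per-modality probability vectors."""
    fused = [0.0] * len(EMOTIONS)
    total_w = sum(weights[m] for m in scores_by_modality)
    for modality, probs in scores_by_modality.items():
        w = weights[modality] / total_w  # normalise over present modalities
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused

def needs_adaptation(fused_probs, threshold=0.5):
    """Crude stand-in for the adaptation decision: if the winning class
    is not confident enough, flag the sample for retraining on the
    (possibly changed) environment."""
    return max(fused_probs) < threshold

# Example: face and speech classifiers lean towards "joy";
# the physiological channel is uninformative (uniform).
scores = {
    "face":       [0.10, 0.60, 0.20, 0.10],
    "speech":     [0.20, 0.40, 0.30, 0.10],
    "physiology": [0.25, 0.25, 0.25, 0.25],
}
weights = {"face": 0.5, "speech": 0.3, "physiology": 0.2}

fused = fuse(scores, weights)
label = EMOTIONS[fused.index(max(fused))]
print(label, needs_adaptation(fused))  # -> joy True (0.47 < threshold)
```

A feature-level alternative would instead concatenate the raw per-modality feature vectors before a single classifier; the chapter's point is that the choice between such fusion architectures is itself a central design problem.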

Keywords

  • Facial Expression
  • Affective State
  • Emotion Recognition
  • Emotional Intelligence
  • Network Weight





Corresponding author

Correspondence to Maja Pantic.


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Pantic, M., Caridakis, G., André, E., Kim, J., Karpouzis, K., Kollias, S. (2011). Multimodal Emotion Recognition from Low-Level Cues. In: Cowie, R., Pelachaud, C., Petta, P. (eds) Emotion-Oriented Systems. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15184-2_8


  • DOI: https://doi.org/10.1007/978-3-642-15184-2_8


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15183-5

  • Online ISBN: 978-3-642-15184-2

  • eBook Packages: Computer Science (R0)