Multimodal emotion recognition from expressive faces, body gestures and speech

  • George Caridakis
  • Ginevra Castellano
  • Loic Kessous
  • Amaryllis Raouzaiou
  • Lori Malatesta
  • Stelios Asteriadis
  • Kostas Karpouzis
Part of the IFIP The International Federation for Information Processing book series (IFIPAICT, volume 247)


In this paper we present a multimodal approach for the recognition of eight emotions that integrates information from facial expressions, body movement and gestures and speech. We trained and tested a model with a Bayesian classifier, using a multimodal corpus with eight emotions and ten subjects. First individual classifiers were trained for each modality. Then data were fused at the feature level and the decision level. Fusing multimodal data increased very much the recognition rates in comparison with the unimodal systems: the multimodal approach gave an improvement of more than 10% with respect to the most successful unimodal system. Further, the fusion performed at the feature level showed better results than the one performed at the decision level.


Affective body language Affective speech Emotion recognition Multimodal fusion 


  1. 1.
    Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J.G.: Emotion recognition in human-computer interaction, IEEE Signal Processing Magazine, January 2001.Google Scholar
  2. 2.
    Picard, R.: Affective computing, Boston, MA: MIT Press (1997).Google Scholar
  3. 3.
    Sebe, N., Cohen, I., Huang, T.S.: Multimodal Emotion Recognition, Handbook of Pattern Recognition and Computer Vision, World Scientific, ISBN 981-256-105-6, January 2005.Google Scholar
  4. 4.
    Pantic, M., Sebe, N., Cohn, J., Huang, T.S.: Affective Multimodal Human-Computer Interaction, ACM Multimedia, pp. 669–676, Singapore, November 2005.Google Scholar
  5. 5.
    Scherer, K. R., Wallbott, H. G.: Analysis of Nonverbal Behavior. HANDBOOK OF DISCOURSE: ANALYSIS, Vol. 2, Cap. 11, Academic Press London (1985).Google Scholar
  6. 6.
    Banse, R., Scherer, K.R.: Acoustic Profiles in Vocal Emotion Expression. Journal of Personality and Social Psychology. 614–636 (1996).Google Scholar
  7. 7.
    Vogt, T., André, E.: Comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition. IEEE International Conference on Multimedia & Expo (ICME 2005).Google Scholar
  8. 8.
    Gunes H., Piccardi M.: A Bimodal Face and Body Gesture Database for Automatic Analysis of Human Nonverbal Affective Behavior, Proc. of ICPR 2006 the 18th International Conference on Pattern Recognition, 20–24 Aug. 2006, Hong Kong, China.Google Scholar
  9. 9.
    Banziger, T., Pirker, H., Scherer, K.: Gemep — geneva multimodal emotion portrayals: a corpus for the study of multimodal emotional expressions. In L. Deviller et al. (Ed.), Proceedings of LREC’06 Workshop on Corpora for Research on Emotion and Affect (pp. 15–019). Genoa. Italy (2006).Google Scholar
  10. 10.
    Douglas-Cowie, E., Campbell, N., Cowie, R., Roach, P.: Emotional speech: towards a new generation of databases. Speech Communication, 40, 33–60 (2003).zbMATHCrossRefGoogle Scholar
  11. 11.
    Pantic, M., Rothkrantz, L.J.M.: Automatic analysis of facial expressions: The state of the art. IEEE Trans, on Pattern Analysis and Machine Intelligence, 22(12):1424–1445 (2000).CrossRefGoogle Scholar
  12. 12.
    Ioannou, S., Raouzaiou, A., Tzouvaras, V., Mailis, T., Karpouzis, K., Kollias, S.: Emotion recognition through facial expression analysis based on a neurofuzzy network, Neural Networks, Elsevier, Vol. 18, Issue 4, May 2005, pp. 423–435.Google Scholar
  13. 13.
    Cowie, R., Douglas-Cowie, E.: Automatic statistical analysis of the signal and prosodic signs of emotion in speech. In Proc. International Conf. on Spoken Language Processing, pp. 1989–1992 (1996).Google Scholar
  14. 14.
    K.R. Scherer: Adding the affective dimension: A new look in speech analysis and synthesis, In Proc. International Conf. on Spoken Language Processing, pp. 1808–1811, (1996).Google Scholar
  15. 15.
    Camurri, A., Lagerlof, I, Volpe, G.: Recognizing Emotion from Dance Movement: Comparison of Spectator Recognition and Automated Techniques, International Journal of Human-Computer Studies, 59(1∓), pp. 213–225, Elsevier Science, July 2003.CrossRefGoogle Scholar
  16. 16.
    Bianchi-Berthouze, N., Kleinsmith, A. A categorical approach to affective gesture recognition, Connection Science, V. 15, N. 4, pp. 259–269. (2003).Google Scholar
  17. 17.
    Picard, R.W., Vyzas, E., Healey, J.: Toward machine emotional intelligence: Analysis of affective physiological state, IEEE Trans, on Pattern Analysis and Machine Intelligence, 23(10):1175–1191 (2001).CrossRefGoogle Scholar
  18. 18.
    Pantic M., Rothkrantz, L.J.M.: Towards an Affect-sensitive Multimodal Human-Computer Interaction, Proceedings of the IEEE, vol. 91, no. 9, pp. 1370–1390 (2003).CrossRefGoogle Scholar
  19. 19.
    Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C.M., Kazemzaeh, A., Lee, S., Neumann, U., Narayanan, S.: “Analysis of Emotion Recognition using Facial Expressions, Speech and Multimodal information,” Proc. of ACM 6th int’l Conf. on Multimodal Interfaces (ICMI 2004), State College, PA, Oct. 2004. pp205–211.Google Scholar
  20. 20.
    Kim, J., André, E., Rehm, M., Vogt, T., Wagner, J.: Integrating information from speech and physiological signals to achieve emotional sensitivity. Proc. of the 9th European Conference on Speech Communication and Technology (2005).Google Scholar
  21. 21.
    Gunes H, Piccardi M.: Bi-modal emotion recognition from expressive face and body gestures. Journal of Network and Computer Applications (2006), doi:10.1016/j.jnca.2006.09.007Google Scholar
  22. 22.
    el Kaliouby, R., Robinson, P.: Generalization of a Vision-Based Computational Model of Mind-Reading. In Proceedings of First International Conference on Affective Computing and Intelligent Interfaces, pp 582–589 (2005).Google Scholar
  23. 23.
    Engelbrecht, A.P., Fletcher, L., Cloete, I.: Variance analysis of sensitivity information for pruning multilayer feedforward neural networks, Neural Networks, 1999. IJCNN’ 99. International Joint Conference on, Vol.3, Iss., 1999, pp:1829–1833 vol.3.Google Scholar
  24. 24.
    Young, J. W.: Head and Face Anthropometry of Adult U.S. Civilians, FAA Civil Aeromedical Institute, 1963–1993 (final report 1993)Google Scholar
  25. 25.
    Raouzaiou, A., Tsapatsoulis, N., Karpouzis, K., Kollias, S.: Parameterized facial expression synthesis based on MPEG-4, EURASIP Journal on Applied Signal Processing, Vol. 2002, No 10, 2002, pp. 1021–1038.zbMATHCrossRefGoogle Scholar
  26. 26.
    Camurri, A., Coletta, P., Massari, A., Mazzarino, B., Peri, M., Ricchetti, M., Ricci, A. and Volpe, G.:Toward real-time multimodal processing: EyesWeb 4.0, in Proc. AISB 2004 Convention: Motion, Emotion and Cognition, Leeds, UK, March 2004.Google Scholar
  27. 27.
    Camurri, A., Mazzarino, B., and Volpe, G.: Analysis of Expressive Gesture: The Eyesweb Expressive Gesture Processing Library, in A. Camurri, G. Volpe (Eds.), Gesture-based Communication in Human-Computer Interaction, LNAI 2915, Springer Verlag (2004).Google Scholar
  28. 28.
    Castellano, G., Camurri, A., Mazzarino, B., Volpe, G.: A mathematical model to analyse the dynamics of gesture expressivity, in Proc. of AISB 2007 Convention: Artificial and Ambient Intelligence, Newcastle upon Tyne, UK, April 2007.Google Scholar
  29. 29.
    Kessous, L., Amir, N.: Comparison of feature extraction approaches based on the Bark time/frequency representation for classification of expressive speechpaper submitted to Interspeech 2007.Google Scholar
  30. 30.
    Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd Edition, Morgan Kaufmann, San Francisco (2005).zbMATHGoogle Scholar
  31. 31.
    Kononenko, I.: On Biases in Estimating Multi-Valued Attributes. In: 14th International Joint Conference on Articial Intelligence, 1034–1040 (1995).Google Scholar

Copyright information

© International Federation for Information Processing 2007

Authors and Affiliations

  • George Caridakis
    • 1
  • Ginevra Castellano
    • 2
  • Loic Kessous
    • 3
  • Amaryllis Raouzaiou
    • 1
  • Lori Malatesta
    • 1
  • Stelios Asteriadis
    • 1
  • Kostas Karpouzis
    • 1
  1. 1.Image, Video and Multimedia Systems LaboratoryNational Technical University of AthensAthensGreece
  2. 2.InfoMus LabDIST - University of GenovaGenovaItaly
  3. 3.Department of Speech, Language and HearingUniversity of Tel AvivTel AvivIsrael

Personalised recommendations