Evidence Theory-Based Multimodal Emotion Recognition

  • Marco Paleari
  • Rachid Benmokhtar
  • Benoit Huet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5371)


Automatic recognition of human affective states is still a largely unexplored and challenging topic. Even more issues arise when dealing with variable quality of the inputs or aiming for real-time, unconstrained, and person-independent scenarios. In this paper, we explore audio-visual multimodal emotion recognition. We present SAMMI, a framework designed to extract real-time emotion appraisals from non-prototypical, person-independent facial expressions and vocal prosody. Different probabilistic methods for fusion are compared and evaluated against a novel fusion technique called NNET. Results show that NNET improves the recognition score (CR+) by about 19% and the mean average precision by about 30% with respect to the best unimodal system.
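The fusion approach is grounded in Dempster-Shafer evidence theory, in which each modality assigns belief mass to sets of hypotheses (here, emotions) and the masses are combined across modalities. As background only — not the authors' NNET formulation, and with hypothetical face/voice mass values — a minimal sketch of Dempster's rule of combination:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts: frozenset of hypotheses -> mass)
    using Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Mass of the intersection accumulates the product of masses
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            # Disjoint focal elements contribute to the conflict mass K
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("Total conflict: sources are incompatible")
    # Normalize by the non-conflicting mass (1 - K)
    return {h: m / (1.0 - conflict) for h, m in combined.items()}

# Hypothetical per-modality beliefs over a two-emotion frame of discernment
ANGER, JOY = frozenset({"anger"}), frozenset({"joy"})
BOTH = ANGER | JOY  # ignorance: mass on the whole frame
face = {ANGER: 0.6, JOY: 0.1, BOTH: 0.3}   # facial-expression classifier
voice = {ANGER: 0.5, JOY: 0.2, BOTH: 0.3}  # vocal-prosody classifier

fused = dempster_combine(face, voice)
# Agreement between modalities sharpens the belief in "anger"
```

When both modalities lean toward the same emotion, the combined mass concentrates on it and the residual ignorance shrinks, which is the behavior that motivates evidence-theoretic fusion of noisy, partially reliable classifiers.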


Keywords: Facial Expression · Emotion Recognition · Radial Basis Function Neural Network · Fusion Technique · Mean Average Precision




References

  1. Picard, R.: Affective Computing. MIT Press, Cambridge (1997)
  2. Benmokhtar, R., Huet, B.: Neural network combining classifier based on Dempster-Shafer theory for semantic indexing in video content. In: Cham, T.-J., Cai, J., Dorai, C., Rajan, D., Chua, T.-S., Chia, L.-T. (eds.) MMM 2007. LNCS, vol. 4351, pp. 196–205. Springer, Heidelberg (2006)
  3. Lisetti, C., Nasoz, F.: Using noninvasive wearable computers to recognize human emotions from physiological signals. EURASIP Journal on ASP 11, 1672–1687 (2004)
  4. Villon, O., Lisetti, C.L.: Toward Building Adaptive User's Psycho-Physiological Maps of Emotions using Bio-Sensors. In: Proceedings of KI (2006)
  5. Mase, K.: Recognition of facial expression from optical flow. IEICE Transactions E74, 3474–3483 (1991)
  6. Essa, I.A., Pentland, A.P.: Coding, Analysis, Interpretation, and Recognition of Facial Expressions. IEEE Transactions PAMI 19(7), 757–763 (1997)
  7. Cohen, I., Sebe, N., Garg, A., Lew, S., Huang, T.: Facial expression recognition from video sequences. In: Proceedings of ICME, pp. 121–124 (2002)
  8. Pantic, M., Rothkrantz, L.: Toward an Affect-Sensitive Multimodal Human-Computer Interaction. Proceedings of IEEE 91, 1370–1390 (2003)
  9. Gunes, H., Piccardi, M.: Bi-modal emotion recognition from expressive face and body gestures. Journal NCA 30(4), 1334–1345 (2007)
  10. Noble, J.: Spoken Emotion Recognition with Support Vector Machines. PhD Thesis (2003)
  11. Zeng, Z., Hu, Y., Liu, M., Fu, Y., Huang, T.S.: Training combination strategy of multi-stream fused hidden Markov model for audio-visual affect recognition. In: ACM MM, pp. 65–68 (2006)
  12. Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C., Kazemzadeh, A., Lee, S., Neumann, U., Narayanan, S.: Analysis of emotion recognition using facial expressions, speech and multimodal information. In: Proceedings of ICMI, pp. 205–211 (2004)
  13. Audio-Visual Affect Recognition through Multi-Stream Fused HMM for HCI. In: Proceedings of CVPR, vol. 2 (2005)
  14. Paleari, M., Huet, B., Duffy, B.: SAMMI, Semantic Affect-enhanced MultiMedia Indexing. In: SAMT (2007)
  15. Paleari, M., Huet, B.: Toward Emotion Indexing of Multimedia Excerpts. In: CBMI (2008)
  16. Martin, O., Kotsia, I., Macq, B., Pitas, I.: The eNTERFACE'05 Audio-Visual Emotion Database. In: Proceedings of ICDEW (2006)
  17. Galmar, E., Huet, B.: Analysis of Vector Space Model and Spatiotemporal Segmentation for Video Indexing and Retrieval. In: ACM CIVR (2007)
  18. Benmokhtar, R., Huet, B.: Multi-level Fusion for Semantic Video Content Indexing and Retrieval. In: Proceedings of AMR (2007)
  19. Intel Corporation: Open Source Computer Vision Library: Reference Manual (November 2006)
  20. Vukadinovic, D., Pantic, M.: Fully automatic facial feature point detection using Gabor feature based boosted classifiers. In: Proceedings of IEEE ICSMC, pp. 1692–1698 (2005)
  21. Sohail, A.S.M., Bhattacharya, P.: Detection of Facial Feature Points Using Anthropometric Face Model. In: Proceedings of SPIEMP, vol. 31, pp. 189–200 (2006)
  22. Boersma, P., Weenink, D.: Praat: doing phonetics by computer (January 2008)
  23. Benmokhtar, R., Huet, B.: Classifier fusion: Combination methods for semantic indexing in video content. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds.) ICANN 2006. LNCS, vol. 4132, pp. 65–74. Springer, Heidelberg (2006)
  24. Denoeux, T.: An evidence-theoretic neural network classifier. In: Proceedings of IEEE SMC, vol. 31, pp. 712–717 (1995)
  25. Cohen, I., Garg, A., Huang, T.S.: Emotion recognition from facial expressions using multilevel HMM. In: NIPS (2000)
  26. Benmokhtar, R., Huet, B.: Low-level feature fusion models for soccer scene classification. In: IEEE ICME (2008)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Marco Paleari (1)
  • Rachid Benmokhtar (1)
  • Benoit Huet (1)

  1. EURECOM, Sophia Antipolis, France
