Multimodal information fusion application to human emotion recognition from face and speech

Mansoorizadeh, Muharram; Moghaddam Charkari, Nasrollah

doi:10.1007/s11042-009-0344-2

Published: 20 August 2009

Multimedia Tools and Applications, Volume 49, pages 277–297 (2010)

Abstract

Multimedia content is composed of several streams that carry information in audio, video, or textual channels. Classifying and clustering multimedia content requires extracting and combining information from these streams. The streams that make up multimedia content naturally differ in scale, dynamics, and temporal patterns. These differences make it difficult to combine the information sources using classic combination techniques. We propose an asynchronous feature-level fusion approach that creates a unified hybrid feature space out of the individual signal measurements. The target space can be used for clustering or classification of the multimedia content. As a representative application, we used the proposed approach to recognize basic affective states from speech prosody and facial expressions. Experimental results on two audiovisual emotion databases with 42 and 12 subjects showed that the proposed system performs significantly better than unimodal face-based and speech-based systems, as well as synchronous feature-level and decision-level fusion approaches.
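
The abstract compares the proposed method against two baseline strategies, synchronous feature-level fusion and decision-level fusion. The following is a minimal sketch of those two baselines only, assuming each modality already yields one feature vector (or one vector of class scores) per sample. It does not reproduce the asynchronous approach proposed in the article, whose details are not given here. The function names, the equal weighting, and the toy numbers are all illustrative assumptions.

```python
def feature_level_fusion(face_feats, speech_feats):
    """Synchronous feature-level fusion (assumed form): concatenate the
    face and speech feature vectors of each sample into one hybrid vector."""
    return [list(f) + list(s) for f, s in zip(face_feats, speech_feats)]


def decision_level_fusion(face_probs, speech_probs, face_weight=0.5):
    """Decision-level fusion (assumed form): take a weighted average of the
    per-class scores from two unimodal classifiers, then pick the best class."""
    labels = []
    for fp, sp in zip(face_probs, speech_probs):
        fused = [face_weight * f + (1.0 - face_weight) * s
                 for f, s in zip(fp, sp)]
        # Index of the class with the highest fused score.
        labels.append(max(range(len(fused)), key=fused.__getitem__))
    return labels


# Hypothetical toy data: two samples, 2 face features and 3 speech features.
face_feats = [[0.1, 0.9], [0.8, 0.2]]
speech_feats = [[0.3, 0.4, 0.5], [0.6, 0.7, 0.8]]
hybrid = feature_level_fusion(face_feats, speech_feats)

# Hypothetical class scores from two unimodal classifiers (2 classes).
face_probs = [[0.7, 0.3], [0.9, 0.1]]
speech_probs = [[0.2, 0.8], [0.4, 0.6]]
labels = decision_level_fusion(face_probs, speech_probs)
```

In this sketch, feature-level fusion hands a single classifier the joint five-dimensional vector, whereas decision-level fusion keeps the classifiers separate and only merges their outputs. Both assume the two modalities are already aligned sample by sample, which is the assumption the asynchronous approach is meant to relax.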

Notes

The term facial feature may refer to a part of the face such as eye and mouth or a quantity acquired from the face such as the distance between the two eye centers. The distinction between these concepts should be clear from the context.
See Section 1.
Tarbiat Modares University Emotion Database.


Acknowledgements

The authors would like to express their sincere thanks to Professor Kabir, Professor Khaki, Mr. Massoud Kimyaei, and colleagues for their valuable help and suggestions.

Author information

Authors and Affiliations

Parallel and Image Processing Lab, Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran
Muharram Mansoorizadeh & Nasrollah Moghaddam Charkari

Corresponding author

Correspondence to Muharram Mansoorizadeh.

Additional information

This project has been supported in part by the Iran Telecommunication Research Center (ITRC) under grant no. T500/20592.

About this article

Cite this article

Mansoorizadeh, M., Moghaddam Charkari, N. Multimodal information fusion application to human emotion recognition from face and speech. Multimed Tools Appl 49, 277–297 (2010). https://doi.org/10.1007/s11042-009-0344-2

Published: 20 August 2009
Issue Date: August 2010
DOI: https://doi.org/10.1007/s11042-009-0344-2
