Abstract
Emotion recognition has attracted considerable interest, and numerous approaches have been proposed, most of which exploit visual, acoustic, or psychophysiological information in isolation. Although more recent research has considered multimodal approaches, the individual modalities are often combined only by simple fusion rules or are fused directly at the feature level by deep networks. In this paper, we propose an approach that trains several specialist networks, employing deep learning techniques to fuse the features of the individual modalities: a multimodal deep belief network (MDBN) that optimizes and fuses unified psychophysiological features derived from multiple psychophysiological signals, a bimodal deep belief network (BDBN) that extracts representative visual features from the video stream, and a second BDBN that learns high-level multimodal features from the unified features of the two modalities. Experiments on the BioVid Emo DB database achieve 80.89% accuracy, outperforming state-of-the-art approaches. The results demonstrate that the proposed approach mitigates the feature redundancy and loss of key features that multimodal fusion can cause.
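The abstract does not give the exact network configurations, so the following is only a minimal sketch of the general fusion idea behind deep-belief-network approaches: train one restricted Boltzmann machine (RBM) per modality, concatenate their hidden representations, and train a joint RBM on the result. All dimensions, the toy data, and the `fuse` helper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase, one Gibbs step, then the CD-1 gradient update.
        h0 = self.hidden_probs(v0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

def fuse(physio, video, n_hidden_each=16, n_joint=8, epochs=50):
    """Train one RBM per modality, then a joint RBM on the
    concatenated hidden activations (the fusion step)."""
    rbm_p = RBM(physio.shape[1], n_hidden_each)
    rbm_v = RBM(video.shape[1], n_hidden_each)
    for _ in range(epochs):
        rbm_p.cd1_step(physio)
        rbm_v.cd1_step(video)
    # Concatenated per-modality codes form the joint RBM's visible layer.
    h = np.hstack([rbm_p.hidden_probs(physio), rbm_v.hidden_probs(video)])
    rbm_joint = RBM(h.shape[1], n_joint)
    for _ in range(epochs):
        rbm_joint.cd1_step(h)
    return rbm_joint.hidden_probs(h)

# Toy feature matrices standing in for per-sample psychophysiological
# and video feature vectors (32 samples each).
physio = rng.random((32, 10))
video = rng.random((32, 20))
features = fuse(physio, video)
print(features.shape)  # (32, 8)
```

In practice the fused representation would then be passed to a classifier (the paper reports results with an SVM-style final stage); here the sketch stops at the joint hidden activations.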
Acknowledgments
This work was supported in part by grants from the National Natural Science Foundation of China (Grant No. 61373116), the Shaanxi Science and Technology Co-ordinate Innovation Project (Grant No. 2016KTZDGY04-01), the General Projects in the Industrial Field of Shaanxi (Grant No. 2018GY-013), the Xianyang Science and Technology Project (Grant No. 2017k01-25-2), and the Special Research Fund of the Shaanxi Educational Committee (Grant No. 16JK1706).
Cite this article
Wang, Z., Zhou, X., Wang, W. et al. Emotion recognition using multimodal deep learning in multiple psychophysiological signals and video. Int. J. Mach. Learn. & Cyber. 11, 923–934 (2020). https://doi.org/10.1007/s13042-019-01056-8