DeepVANet: A Deep End-to-End Network for Multi-modal Emotion Recognition

  • Conference paper
  • Human-Computer Interaction – INTERACT 2021 (INTERACT 2021)

Abstract

Human facial expressions and bio-signals (e.g., electroencephalogram and electrocardiogram) play a vital role in emotion recognition. Recent approaches employ both vision-based and bio-sensing data to design multi-modal recognition systems, but they require extensive domain-specific knowledge and complex pre-processing steps, and they fail to take full advantage of the end-to-end nature of deep learning. This paper proposes a deep end-to-end framework, DeepVANet, for multi-modal valence-arousal-based emotion recognition that applies deep learning methods to extract both face appearance features and bio-sensing features. We use a convolutional long short-term memory (ConvLSTM) network for face appearance feature extraction, capturing spatial and temporal information from face image sequences. Unlike conventional time- or frequency-domain features (e.g., spectral power and average signal intensity), we use a 1D convolutional neural network (Conv1D) to learn bio-sensing features automatically. In experiments on the DEAP and MAHNOB-HCI datasets, our multi-modal framework outperforms both single- and multi-modal state-of-the-art methods, reaching accuracy as high as 99.22%.
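To make the described architecture concrete, below is a minimal PyTorch sketch of the pipeline the abstract outlines: a ConvLSTM branch over face image sequences, a Conv1D branch over raw bio-signals, and feature-level fusion for binary valence/arousal classification. All class names, layer sizes, and channel counts here are illustrative assumptions rather than the authors' implementation; the official code is linked under Notes below.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """One ConvLSTM cell (Shi et al., 2015): LSTM gates computed with 2D
    convolutions, so hidden states keep their spatial layout."""
    def __init__(self, in_channels: int, hidden_channels: int, kernel_size: int = 3):
        super().__init__()
        self.hidden_channels = hidden_channels
        # A single convolution emits all four gates (input, forget, cell, output).
        self.gates = nn.Conv2d(in_channels + hidden_channels, 4 * hidden_channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class FaceBranch(nn.Module):
    """Unrolls the ConvLSTM over a clip of face frames and pools the final
    hidden map into a fixed-size appearance feature."""
    def __init__(self, hidden_channels: int = 16):
        super().__init__()
        self.cell = ConvLSTMCell(3, hidden_channels)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, frames):                        # frames: (B, T, 3, H, W)
        B, T, _, H, W = frames.shape
        h = frames.new_zeros(B, self.cell.hidden_channels, H, W)
        c = torch.zeros_like(h)
        for t in range(T):                            # temporal unrolling
            h, c = self.cell(frames[:, t], (h, c))
        return self.pool(h).flatten(1)                # (B, hidden_channels)

class BioBranch(nn.Module):
    """Learns bio-sensing features directly from raw multi-channel signals
    with 1D convolutions, replacing hand-crafted spectral features."""
    def __init__(self, n_channels: int, feat_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(32, feat_dim, kernel_size=7, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))

    def forward(self, signals):                       # signals: (B, n_channels, L)
        return self.net(signals).flatten(1)           # (B, feat_dim)

class DeepVANetSketch(nn.Module):
    """Concatenates the two feature vectors and predicts a high/low logit
    for one affective dimension (valence or arousal)."""
    def __init__(self, n_bio_channels: int = 40):     # 40 = DEAP's channel count
        super().__init__()
        self.face, self.bio = FaceBranch(), BioBranch(n_bio_channels)
        self.head = nn.Sequential(nn.Linear(16 + 32, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, frames, signals):
        fused = torch.cat([self.face(frames), self.bio(signals)], dim=1)
        return self.head(fused)

# Smoke test with random tensors: 2 clips of 8 frames at 32x32, 40-channel signals.
model = DeepVANetSketch()
out = model(torch.randn(2, 8, 3, 32, 32), torch.randn(2, 40, 512))
print(out.shape)                                      # torch.Size([2, 1])
```

Training such a logit with a binary cross-entropy loss would match the high/low valence-arousal classification setup the paper evaluates on DEAP and MAHNOB-HCI.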


Notes

  1. Code and evaluation are available at: https://github.com/geekdanielz/DeepVANet.


Author information

Correspondence to Shafin Rahman.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 874 KB)


Copyright information

© 2021 IFIP International Federation for Information Processing

About this paper


Cite this paper

Zhang, Y., Hossain, M.Z., Rahman, S. (2021). DeepVANet: A Deep End-to-End Network for Multi-modal Emotion Recognition. In: Ardito, C., et al. (eds.) Human-Computer Interaction – INTERACT 2021. INTERACT 2021. Lecture Notes in Computer Science, vol. 12934. Springer, Cham. https://doi.org/10.1007/978-3-030-85613-7_16

  • DOI: https://doi.org/10.1007/978-3-030-85613-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85612-0

  • Online ISBN: 978-3-030-85613-7

  • eBook Packages: Computer Science, Computer Science (R0)
