DeepVANet: A Deep End-to-End Network for Multi-modal Emotion Recognition

  • Conference paper
  • Human-Computer Interaction – INTERACT 2021 (INTERACT 2021)

Abstract

Human facial expressions and bio-signals (e.g., electroencephalogram and electrocardiogram) play a vital role in emotion recognition. Recent approaches employ both vision-based and bio-sensing data to design multi-modal recognition systems, but they require extensive domain-specific knowledge and complex pre-processing steps, and they fail to take full advantage of the end-to-end nature of deep learning. This paper proposes a deep end-to-end framework, DeepVANet, for multi-modal valence-arousal-based emotion recognition that applies deep learning methods to extract both face appearance features and bio-sensing features. We use a convolutional long short-term memory (ConvLSTM) network for face appearance feature extraction, capturing spatial and temporal information from face image sequences. Unlike conventional time- or frequency-domain features (e.g., spectral power and average signal intensity), we use a 1D convolutional neural network (Conv1D) to learn bio-sensing features automatically. In experiments on the DEAP and MAHNOB-HCI datasets, our multi-modal framework outperforms both single- and multi-modal state-of-the-art methods, reaching accuracy as high as 99.22%.
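To make the described architecture concrete, below is a minimal PyTorch sketch of the pipeline the abstract outlines: a ConvLSTM branch over face image sequences, a Conv1D branch over raw bio-signals, and feature-level fusion for binary valence/arousal classification. All class names, layer sizes, and channel counts here are illustrative assumptions rather than the authors' implementation; the official code is linked under Notes below.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """One ConvLSTM cell (Shi et al., 2015): LSTM gates computed with 2D
    convolutions, so hidden states keep their spatial layout."""
    def __init__(self, in_channels: int, hidden_channels: int, kernel_size: int = 3):
        super().__init__()
        self.hidden_channels = hidden_channels
        # A single convolution emits all four gates (input, forget, cell, output).
        self.gates = nn.Conv2d(in_channels + hidden_channels, 4 * hidden_channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class FaceBranch(nn.Module):
    """Unrolls the ConvLSTM over a clip of face frames and pools the final
    hidden map into a fixed-size appearance feature."""
    def __init__(self, hidden_channels: int = 16):
        super().__init__()
        self.cell = ConvLSTMCell(3, hidden_channels)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, frames):                        # frames: (B, T, 3, H, W)
        B, T, _, H, W = frames.shape
        h = frames.new_zeros(B, self.cell.hidden_channels, H, W)
        c = torch.zeros_like(h)
        for t in range(T):                            # temporal unrolling
            h, c = self.cell(frames[:, t], (h, c))
        return self.pool(h).flatten(1)                # (B, hidden_channels)

class BioBranch(nn.Module):
    """Learns bio-sensing features directly from raw multi-channel signals
    with 1D convolutions, replacing hand-crafted spectral features."""
    def __init__(self, n_channels: int, feat_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(32, feat_dim, kernel_size=7, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))

    def forward(self, signals):                       # signals: (B, n_channels, L)
        return self.net(signals).flatten(1)           # (B, feat_dim)

class DeepVANetSketch(nn.Module):
    """Concatenates the two feature vectors and predicts a high/low logit
    for one affective dimension (valence or arousal)."""
    def __init__(self, n_bio_channels: int = 40):     # 40 = DEAP's channel count
        super().__init__()
        self.face, self.bio = FaceBranch(), BioBranch(n_bio_channels)
        self.head = nn.Sequential(nn.Linear(16 + 32, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, frames, signals):
        fused = torch.cat([self.face(frames), self.bio(signals)], dim=1)
        return self.head(fused)

# Smoke test with random tensors: 2 clips of 8 frames at 32x32, 40-channel signals.
model = DeepVANetSketch()
out = model(torch.randn(2, 8, 3, 32, 32), torch.randn(2, 40, 512))
print(out.shape)                                      # torch.Size([2, 1])
```

Training such a logit with a binary cross-entropy loss would match the high/low valence-arousal classification setup the paper evaluates on DEAP and MAHNOB-HCI.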


Notes

  1. Code and evaluation are available at: https://github.com/geekdanielz/DeepVANet.


Author information

Correspondence to Shafin Rahman.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 874 KB)


Copyright information

© 2021 IFIP International Federation for Information Processing

About this paper


Cite this paper

Zhang, Y., Hossain, M.Z., Rahman, S. (2021). DeepVANet: A Deep End-to-End Network for Multi-modal Emotion Recognition. In: Ardito, C., et al. (eds.) Human-Computer Interaction – INTERACT 2021. INTERACT 2021. Lecture Notes in Computer Science, vol. 12934. Springer, Cham. https://doi.org/10.1007/978-3-030-85613-7_16

  • DOI: https://doi.org/10.1007/978-3-030-85613-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85612-0

  • Online ISBN: 978-3-030-85613-7

  • eBook Packages: Computer Science, Computer Science (R0)
