
Hierarchical fusion of visual and physiological signals for emotion recognition


Emotion recognition is an active and essential topic in image and signal processing. In this paper, we propose a multi-level fusion method that combines visual information and physiological signals for emotion recognition. For the visual modality, we propose a serial fusion of two-stage features to enhance the representation of facial expressions in a video sequence: the Neural Aggregation Network is integrated with Convolutional Neural Network feature maps to reinforce the emotionally salient frames. For the physiological modality, we propose a parallel fusion scheme that enriches the representation of the electroencephalogram (EEG) signals: frequency features extracted with Linear-Frequency Cepstral Coefficients (LFCC) are complemented by the signal complexity measured with Sample Entropy (SampEn). In the classification stage, we fuse the visual and physiological information at both the feature level and the decision level. Experimental results validate the effectiveness of the proposed multi-level, multi-modal feature representation method.
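The frame-aggregation step can be illustrated with a small sketch. This is not the authors' implementation: it assumes a fixed query vector `q` and pre-extracted per-frame CNN features, whereas the actual Neural Aggregation Network learns its attention queries end to end.

```python
import numpy as np

def aggregate_frames(features, q):
    """Attention-pooled video descriptor, in the spirit of the Neural
    Aggregation Network: each frame feature f_k receives a weight
    softmax(q . f_k), and the clip descriptor is the weighted sum,
    so frames that respond strongly to the query q dominate.

    features : array of shape (num_frames, dim), per-frame CNN features
    q        : query vector of shape (dim,); learned in the real NAN,
               fixed here for illustration only
    """
    features = np.asarray(features, dtype=float)
    scores = features @ q                 # one attention score per frame
    scores -= scores.max()                # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over frames
    return weights @ features             # weighted sum -> (dim,) descriptor
```

Frames whose features align with `q` receive weights near 1, so a single expressive frame can dominate an otherwise neutral sequence.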
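Sample Entropy itself follows the standard Richman–Moorman definition and can be sketched directly. The tolerance default `r = 0.2 * std(x)` is a common convention in the entropy literature, not a value taken from this paper.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample Entropy: -log(A/B), where B counts pairs of length-m
    templates within tolerance r (Chebyshev distance, self-matches
    excluded) and A counts the same for length m+1.  Lower values
    indicate a more regular (less complex) signal."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)  # common default: 20% of the signal's SD

    def count_matches(length):
        # All overlapping templates of the given length
        templates = np.array([x[i:i + length] for i in range(n - length + 1)])
        count = 0
        for i in range(len(templates)):
            # Chebyshev distance from template i to every template
            d = np.max(np.abs(templates[i] - templates), axis=1)
            count += np.sum(d <= r) - 1  # exclude the self-match
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf
```

A periodic signal yields a low SampEn while white noise yields a high one, which is what makes it useful as a complexity feature alongside the frequency-domain LFCCs.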


Figures 1–6 are available in the published article.



Author information



Corresponding author

Correspondence to Yuchun Fang.


This work was supported by the National Natural Science Foundation of China under Grant No. 61976132 and the Natural Science Foundation of Shanghai under Grant No. 19ZR1419200.


Cite this article

Fang, Y., Rong, R. & Huang, J. Hierarchical fusion of visual and physiological signals for emotion recognition. Multidim Syst Sign Process 32, 1103–1121 (2021).



Keywords

  • Emotion recognition
  • Facial expression
  • Electroencephalogram