Deep Learning Based Video Spatio-Temporal Modeling for Emotion Recognition

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10901)


Affective computing is a growing research area that aims to determine users' emotional states from their conscious and unconscious actions and to use this information to adapt human-machine interaction. This paper investigates the discriminative ability of convolutional and recurrent neural networks in modeling spatio-temporal features from video sequences of the face region. In the proposed deep learning architecture, dense convolutional layers analyze changes in spatial information across frames over short time periods, while dense recurrent layers model those frame changes as temporal sequences that evolve over time. These layers are then connected to a multilayer perceptron (MLP) that performs the classification task of distinguishing among six emotion categories. Performance was evaluated in two settings: gender-independent and gender-dependent classification. Experimental results show that the proposed approach achieves an accuracy of 81.84% in the gender-independent experiment, outperforming previous work on the same experimental data. In the gender-dependent experiment, accuracy was 80.79% for male and 82.75% for female subjects.
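The abstract describes the pipeline only at a high level. As a rough illustration, the following NumPy sketch runs a single forward pass through a convolution, recurrent, and MLP stack that ends in a six-way softmax. It is not the authors' implementation. The layer sizes, the use of a GRU as the recurrent layer, the reading of "short time periods" as a 3D convolution over a few consecutive frames, and all function names (`init_params`, `conv_features`, `gru_last_state`, `predict_proba`) are assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

NUM_CLASSES = 6  # six emotion categories, as stated in the abstract


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def relu(x):
    return np.maximum(x, 0.0)


def init_params(rng, k=8, depth=3, ksize=5, hidden=16, mlp=32):
    """Random weights for a hypothetical conv -> GRU -> MLP model (all sizes are illustrative)."""
    def w(*shape):
        return rng.normal(0.0, 0.1, size=shape)
    return {
        "conv": w(k, depth, ksize, ksize),  # assumed spatio-temporal kernels spanning `depth` frames
        "conv_b": np.zeros(k),
        # Assumed GRU gates: update (z), reset (r), candidate state (h)
        "Wz": w(hidden, k), "Uz": w(hidden, hidden), "bz": np.zeros(hidden),
        "Wr": w(hidden, k), "Ur": w(hidden, hidden), "br": np.zeros(hidden),
        "Wh": w(hidden, k), "Uh": w(hidden, hidden), "bh": np.zeros(hidden),
        # MLP classifier: one hidden ReLU layer, then class logits
        "W1": w(mlp, hidden), "b1": np.zeros(mlp),
        "W2": w(NUM_CLASSES, mlp), "b2": np.zeros(NUM_CLASSES),
    }


def conv_features(clip, p):
    """Short-window 3D convolution + ReLU + global average pooling.

    clip: (T, H, W) grayscale face crops -> returns (T - depth + 1, K) features.
    """
    _, depth, kh, kw = p["conv"].shape
    windows = sliding_window_view(clip, (depth, kh, kw))  # (T', H', W', depth, kh, kw)
    maps = np.einsum("tijabc,nabc->tnij", windows, p["conv"])  # (T', K, H', W')
    maps = relu(maps + p["conv_b"][None, :, None, None])
    return maps.mean(axis=(2, 3))  # average over space: one K-vector per time step


def gru_last_state(seq, p):
    """Run one GRU layer over seq of shape (T', K) and return the final hidden state."""
    h = np.zeros(p["bz"].shape)
    for x in seq:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])
        h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
        h = (1.0 - z) * h + z * h_cand
    return h


def predict_proba(clip, p):
    """Conv features -> GRU summary -> MLP -> softmax over the six emotion classes."""
    h = gru_last_state(conv_features(clip, p), p)
    logits = p["W2"] @ relu(p["W1"] @ h + p["b1"]) + p["b2"]
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()


rng = np.random.default_rng(0)
params = init_params(rng)
clip = rng.random((10, 32, 32))  # 10 random frames standing in for a 32x32 face crop
probs = predict_proba(clip, params)
print(probs.shape, round(float(probs.sum()), 6))
```

Using the GRU's final hidden state as a fixed-length summary means clips of different lengths (at least `depth` frames) map to the same MLP input size. Training (loss, optimizer, regularization) is deliberately left out of this sketch.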


Keywords: Deep learning · Facial emotion recognition · Spatio-temporal modeling


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

Instituto Tecnológico Metropolitano, Medellín, Colombia
