Emotion Recognition of a Group of People in Video Analytics Using Deep Off-the-Shelf Image Embeddings

  • Alexander V. Tarasov
  • Andrey V. Savchenko
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11179)


In this paper we address the group-level emotion classification problem in video analytic systems. We propose to apply the MTCNN face detector to obtain facial regions in each video frame. Next, off-the-shelf image features are extracted from each detected face using pre-trained convolutional neural networks. The features of the whole frame are computed as the mean of the image embeddings of the individual faces. The resulting frame features are classified with an ensemble of state-of-the-art classifiers whose outputs are combined as a weighted sum. Experimental results on the EmotiW 2017 dataset demonstrate that the proposed approach is 2–20% more accurate than conventional group-level emotion classifiers.
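The two aggregation steps in the abstract (averaging the face embeddings of a frame, then combining classifier outputs as a weighted sum) can be illustrated with a minimal sketch. This is not the authors' implementation: face detection and CNN feature extraction are omitted, and the function names, the toy 4-dimensional embeddings, the class labels, and the ensemble weights are all assumptions chosen only for illustration.

```python
from typing import List, Sequence


def frame_embedding(face_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-face embeddings into one frame-level feature vector."""
    if not face_embeddings:
        raise ValueError("no faces detected in this frame")
    n = len(face_embeddings)
    # zip(*...) groups the i-th component of every face embedding together.
    return [sum(component) / n for component in zip(*face_embeddings)]


def ensemble_scores(classifier_scores: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Combine the per-class scores of several classifiers as a weighted sum."""
    if len(classifier_scores) != len(weights):
        raise ValueError("one weight per classifier is required")
    return [
        sum(w * s for w, s in zip(weights, per_class))
        for per_class in zip(*classifier_scores)
    ]


def predict(classifier_scores: Sequence[Sequence[float]],
            weights: Sequence[float],
            labels: Sequence[str]) -> str:
    """Return the label with the highest combined score."""
    scores = ensemble_scores(classifier_scores, weights)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Toy data (assumed): three 4-dimensional face embeddings from one frame.
faces = [[1.0, 0.0, 2.0, 4.0], [3.0, 2.0, 2.0, 0.0], [2.0, 1.0, 2.0, 2.0]]
frame_features = frame_embedding(faces)

# Assumed class scores that two classifiers produced for frame_features.
labels = ["Positive", "Neutral", "Negative"]
scores = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
label = predict(scores, weights=[0.75, 0.25], labels=labels)
```

In a real system the rows of `faces` would come from a pre-trained network applied to each detected facial region, and the ensemble weights would typically be tuned on a validation set.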


Keywords: Group emotion recognition, Video-analytic system, Convolutional neural network, Face detection



The article was prepared within the framework of the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2017 (grant №17-05-0007) and by the Russian Academic Excellence Project “5-100”.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. National Research University Higher School of Economics, Nizhny Novgorod, Russia
