Neural Computing and Applications, Volume 31, Issue 12, pp 9061–9072

Emotional sentiment analysis for a group of people based on transfer learning with a multi-modal system

  • Vivek Singh Bawa
  • Vinay Kumar
Original Article


Identifying the emotional sentiment projected in an image is a challenging task, because the sentiment an image conveys can depend on a very diverse set of factors. This paper presents a novel approach to predicting the emotional sentiment of a group of people in a variety of environments. The proposed technique uses local facial features of the subjects along with global scene features to estimate the type of emotional sentiment in group-level emotion recognition. Two separate convolutional neural networks based on different architectures are designed to classify group-level emotions into three categories: negative, neutral and positive. The first convolutional neural network, referred to as the Scene-model, learns the global features in the data. A novel partial fine-tuning process is proposed to train the model on task-specific data. The second convolutional model, referred to as the Face-model, is trained on facial expression datasets to learn the emotional status of the subjects in an image. The joint distribution of the global (scene) and local (face) features is modeled using long short-term memory networks. This joint distribution is converted into class scores using a softmax regression-based model.
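The fusion stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give feature sizes, the order in which features are fed to the recurrent network, or any trained weights. The sketch assumes that the scene feature vector and one feature vector per detected face are presented as a sequence to a single-layer LSTM, and that the final hidden state is mapped to the three classes by a softmax layer. All names (`TinyLSTM`, `predict_group_emotion`, `FEATURE_SIZE`, `HIDDEN_SIZE`) are hypothetical, the weights are random and untrained, and the feature vectors are random stand-ins for CNN outputs.

```python
import math
import random

CLASSES = ("negative", "neutral", "positive")


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTM:
    """Single-layer LSTM with untrained random weights (illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate (input, forget, output, candidate),
        # applied to the concatenation [input ; previous hidden state].
        self.weights = {
            gate: random_matrix(hidden_size, input_size + hidden_size, rng)
            for gate in "ifog"
        }

    def step(self, x, h, c):
        z = x + h  # list concatenation
        pre = {gate: matvec(self.weights[gate], z) for gate in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        """Return the final hidden state after consuming the sequence."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


def predict_group_emotion(scene_features, face_features, lstm, out_weights):
    # Assumed ordering: global scene vector first, then one vector per face.
    sequence = [scene_features] + face_features
    final_hidden = lstm.run(sequence)
    probs = softmax(matvec(out_weights, final_hidden))
    label = CLASSES[probs.index(max(probs))]
    return label, probs


# Usage with random stand-ins for the Scene-model and Face-model outputs.
rng = random.Random(0)
FEATURE_SIZE, HIDDEN_SIZE = 8, 4  # hypothetical sizes
lstm = TinyLSTM(FEATURE_SIZE, HIDDEN_SIZE, rng)
out_weights = random_matrix(len(CLASSES), HIDDEN_SIZE, rng)
scene = [rng.random() for _ in range(FEATURE_SIZE)]
faces = [[rng.random() for _ in range(FEATURE_SIZE)] for _ in range(3)]
label, probs = predict_group_emotion(scene, faces, lstm, out_weights)
print(label, probs)
```

Treating the faces as a sequence lets the same model handle images with any number of people, which is one plausible reason for choosing a recurrent network at this stage. With untrained weights the predicted label is meaningless; only the data flow is of interest here.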


Group emotion analysis, Convolutional neural networks, Long short-term memory, Image-based sentiment analysis, Facial expression analysis



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Electronics and Communications Engineering Department, Thapar Institute of Engineering and Technology, Patiala, India
