Combining feature-level and decision-level fusion in a hierarchical classifier for emotion recognition in the wild

  • Original Paper
Journal on Multimodal User Interfaces

Abstract

Emotion recognition in the wild is a very challenging task. In this paper, we investigate a variety of multimodal features (acoustic and visual) extracted from video clips and evaluate their discriminative power for human emotion analysis. For each clip, we extract MSDF BoW, LBP-TOP, PHOG, LPQ-TOP, and audio features. We train a separate classifier for each feature type on the AFEW dataset from the ICMI 2014 EmotiW Challenge, and we propose a novel hierarchical classification framework that combines feature-level and decision-level fusion strategies across all of the extracted multimodal features. Our final accuracy on the AFEW test set is 47.17%, considerably better than the best baseline recognition rate of 33.7%. Among all teams participating in the ICMI 2014 EmotiW challenge, our recognition performance won the first runner-up award. We further test our method on the FERA and CK datasets, where the experimental results also show good performance.
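The full fusion architecture is described in the paper body and is not reproduced here, but the abstract's core idea, training one classifier per feature type and then merging their decisions with a classifier trained on the combined features, can be illustrated with a short sketch. The following Python/scikit-learn snippet is a hypothetical, simplified illustration rather than the authors' actual implementation: per-feature SVMs form the decision-level branch, an SVM on concatenated features forms the feature-level branch, and a weighted vote over class probabilities fuses the two. The feature names, kernels, and weights are placeholder assumptions.

```python
# Minimal sketch (not the authors' exact pipeline): per-feature SVMs provide a
# decision-level branch, a concatenated-feature SVM provides a feature-level
# branch, and their class probabilities are merged by a weighted vote.
# Feature names and weights below are illustrative placeholders.

import numpy as np
from sklearn.svm import SVC


def train_per_feature_classifiers(features, labels):
    """Decision-level branch: one probabilistic SVM per feature type."""
    return {name: SVC(kernel="rbf", probability=True).fit(X, labels)
            for name, X in features.items()}


def train_feature_level_classifier(features, labels):
    """Feature-level branch: a single SVM on the concatenated feature vector."""
    X_concat = np.hstack([features[name] for name in sorted(features)])
    return SVC(kernel="linear", probability=True).fit(X_concat, labels)


def hierarchical_predict(per_feature_models, concat_model, features, weights):
    """Fuse both branches by a weighted sum of predicted class probabilities."""
    fused = sum(weights[name] * per_feature_models[name].predict_proba(features[name])
                for name in per_feature_models)
    X_concat = np.hstack([features[name] for name in sorted(features)])
    fused = fused + weights["concat"] * concat_model.predict_proba(X_concat)
    return np.argmax(fused, axis=1)


# Example usage with toy data (two placeholder feature types):
# features = {"lbp_top": np.random.rand(50, 100), "audio": np.random.rand(50, 60)}
# labels = np.random.randint(0, 7, 50)          # 7 emotion classes
# models = train_per_feature_classifiers(features, labels)
# concat = train_feature_level_classifier(features, labels)
# preds = hierarchical_predict(models, concat, features,
#                              weights={"lbp_top": 0.4, "audio": 0.3, "concat": 0.3})
```

In this sketch the fusion weights are fixed by hand; in practice such weights would typically be tuned on a validation set, for example in proportion to each branch's validation accuracy.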



Acknowledgments

This work is supported by the Fundamental Research Funds for the Central Universities of China (2014KJJCA15, 2012YBXS10) and the National Education Science Twelfth Five-Year Plan Key Issues of the Ministry of Education (DCA140229).

Author information


Corresponding author

Correspondence to Xuewen Wu.


About this article


Cite this article

Sun, B., Li, L., Wu, X. et al. Combining feature-level and decision-level fusion in a hierarchical classifier for emotion recognition in the wild. J Multimodal User Interfaces 10, 125–137 (2016). https://doi.org/10.1007/s12193-015-0203-6

