
Hierarchical committee of deep convolutional neural networks for robust facial expression recognition

  • Original Paper
Journal on Multimodal User Interfaces

Abstract

This paper describes our approach to robust facial expression recognition (FER) for the third Emotion Recognition in the Wild (EmotiW2015) challenge. We train multiple deep convolutional neural networks (deep CNNs) as committee members and combine their decisions. To improve this committee of deep CNNs, we present two strategies: (1) to obtain diverse decisions from the deep CNNs, we vary the network architecture, input normalization, and random weight initialization when training these models, and (2) to form a better committee in both structural and decisional aspects, we construct a hierarchical committee architecture with exponentially-weighted decision fusion. On the seven-class static FER-in-the-wild problem of EmotiW2015, we achieve a test accuracy of 61.6%. Moreover, on other public FER databases, our hierarchical committee of deep CNNs outperforms or is competitive with state-of-the-art results.
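As a rough illustration of the second strategy, the sketch below fuses class-probability outputs from several committee members and then fuses the resulting group decisions at a higher level. The abstract does not state the fusion formula, so the weighting used here (each member's weight proportional to its validation accuracy raised to an exponent `q`) is an assumption, and the function names, the two-level grouping, and the default `q = 2.0` are hypothetical choices made only for this example.

```python
import numpy as np

def expo_weighted_fusion(probs, val_acc, q=2.0):
    """Fuse member outputs with weights proportional to val_acc ** q.

    probs:   array (n_members, n_samples, n_classes) of class probabilities
    val_acc: array (n_members,) of validation accuracies in (0, 1]
    q:       assumed exponent; q = 0 reduces to a simple average
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(val_acc, dtype=float) ** q
    weights /= weights.sum()  # normalize so the weights sum to 1
    # Weighted sum over the member axis -> (n_samples, n_classes)
    return np.tensordot(weights, probs, axes=1)

def hierarchical_committee(groups, group_val_acc, top_val_acc, q=2.0):
    """Two-level committee (assumed layout): fuse within groups, then across."""
    level1 = [
        expo_weighted_fusion(p, a, q) for p, a in zip(groups, group_val_acc)
    ]
    return expo_weighted_fusion(np.stack(level1), top_val_acc, q)

# Two members, one sample, two classes (toy numbers, not results from the paper).
members = np.array([[[0.6, 0.4]], [[0.2, 0.8]]])
fused = expo_weighted_fusion(members, val_acc=[0.9, 0.3], q=2.0)
predicted_class = fused.argmax(axis=1)
```

With a larger `q`, members with higher validation accuracy dominate the fused decision; with `q = 0` every member counts equally, which is the plain averaging committee.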



Acknowledgments

This work was supported by the Industrial Strategic Technology Development Program (10044009, Development of a self-improving bidirectional sustainable HRI technology for 95 % of successful responses with understanding users complex emotion and transactional intent through continuous interactions) funded by the Ministry of Knowledge Economy (MKE, Korea).

Author information

Corresponding author

Correspondence to Bo-Kyeong Kim.

About this article


Cite this article

Kim, BK., Roh, J., Dong, SY. et al. Hierarchical committee of deep convolutional neural networks for robust facial expression recognition. J Multimodal User Interfaces 10, 173–189 (2016). https://doi.org/10.1007/s12193-015-0209-0


  • DOI: https://doi.org/10.1007/s12193-015-0209-0
