Journal on Multimodal User Interfaces, Volume 10, Issue 2, pp 173–189

Hierarchical committee of deep convolutional neural networks for robust facial expression recognition

  • Bo-Kyeong Kim
  • Jihyeon Roh
  • Suh-Yeon Dong
  • Soo-Young Lee
Original Paper


This paper describes our approach to robust facial expression recognition (FER) for the third Emotion Recognition in the Wild (EmotiW2015) challenge. We train multiple deep convolutional neural networks (deep CNNs) as committee members and combine their decisions. To improve this committee of deep CNNs, we present two strategies: (1) to obtain diverse decisions from the deep CNNs, we vary the network architecture, input normalization, and random weight initialization when training these models, and (2) to form a better committee in both structural and decisional aspects, we construct a hierarchical committee architecture with exponentially-weighted decision fusion. On the seven-class static FER-in-the-wild problem of EmotiW2015, we achieve a test accuracy of 61.6%. Moreover, on other public FER databases, our hierarchical committee of deep CNNs outperforms or is competitive with state-of-the-art results.
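The idea of exponentially-weighted decision fusion in a two-level committee can be illustrated with a minimal sketch. The abstract does not give the exact weighting rule, so the code below assumes that each member's class probabilities are weighted by its validation accuracy raised to a power q and then normalized. The function names, the exponent q, and the grouping of members into lower-level committees are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def expo_weighted_fusion(probs, val_acc, q=2.0):
    """Fuse the class probabilities of K committee members.

    probs:   array-like of shape (K, N, C), per-member class probabilities
    val_acc: array-like of shape (K,), per-member validation accuracy
    q:       assumed exponent; q = 0 reduces to a simple average
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(val_acc, dtype=float) ** q
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # Weighted sum over the member axis gives an (N, C) array.
    return np.tensordot(weights, probs, axes=1)


def hierarchical_fusion(groups, q=2.0):
    """Two-level committee (assumed structure).

    groups: list of (probs, member_val_acc, group_val_acc) tuples, one per
            lower-level committee. Each lower-level committee is fused
            first, then the fused outputs are fused again at the top level
            using the group-level validation accuracies.
    """
    fused = [expo_weighted_fusion(p, acc, q) for p, acc, _ in groups]
    group_acc = [g for _, _, g in groups]
    return expo_weighted_fusion(np.stack(fused), group_acc, q)


# Toy example: two members, one sample, two classes.
member_probs = np.array([[[0.9, 0.1]], [[0.2, 0.8]]])
fused = expo_weighted_fusion(member_probs, [0.6, 0.3], q=1.0)
predicted_class = fused.argmax(axis=1)
```

Under this assumption, a larger q concentrates the decision on the members with the highest validation accuracy, while q = 0 treats all members equally.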


Keywords: Hierarchical committee; Exponentially-weighted decision fusion; Deep convolutional neural network; Facial expression recognition



This work was supported by the Industrial Strategic Technology Development Program (10044009, Development of a self-improving bidirectional sustainable HRI technology for 95 % of successful responses with understanding users complex emotion and transactional intent through continuous interactions) funded by the Ministry of Knowledge Economy (MKE, Korea).



Copyright information

© OpenInterface Association 2016

Authors and Affiliations

  • Bo-Kyeong Kim (1)
  • Jihyeon Roh (1)
  • Suh-Yeon Dong (1)
  • Soo-Young Lee (1)
  1. Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
