Confidence-Weighted Local Expression Predictions for Occlusion Handling in Expression Recognition and Action Unit Detection

  • Arnaud Dapogny
  • Kevin Bailly
  • Séverine Dubuisson


Fully-automatic facial expression recognition (FER) is a key component of human behavior analysis. Performing FER from still images is a challenging task, as it involves handling large interpersonal morphological differences, and as partial occlusions can occur. Furthermore, labelling expressions is a time-consuming process that is prone to subjectivity, so the training data may not fully cover the expression variability. In this work, we propose to train random forests upon spatially-constrained random local subspaces of the face. The resulting local predictions form a categorical, expression-driven high-level representation that we call local expression predictions (LEPs). LEPs can be combined to describe categorical facial expressions as well as action units (AUs). Furthermore, LEPs can be weighted by confidence scores provided by an autoencoder network. This network is trained to locally capture the manifold of the non-occluded training data in a hierarchical way. Extensive experiments show that the proposed LEP representation yields high descriptive power for categorical expression and AU occurrence prediction, and opens interesting perspectives towards the design of occlusion-robust and confidence-aware FER systems.
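The core fusion step described above can be sketched in a few lines: each local predictor emits a class-probability vector, and an occlusion-sensitive confidence score down-weights unreliable regions before averaging. This is a minimal illustration of the confidence-weighted combination idea only; the function name, array shapes, and weighting scheme are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def combine_leps(local_preds, confidences):
    """Fuse local expression predictions (LEPs) into one distribution.

    local_preds: (n_patches, n_classes) array; each row is the class
                 probability vector output by one local random forest.
    confidences: (n_patches,) array of weights in [0, 1]; in the paper's
                 spirit, an occluded patch would score near 0 (here the
                 scores are simply given, not produced by an autoencoder).
    """
    p = np.asarray(local_preds, dtype=float)
    w = np.asarray(confidences, dtype=float)
    w = w / max(w.sum(), 1e-12)      # normalize patch weights
    combined = w @ p                  # confidence-weighted average
    return combined / combined.sum()  # renormalize to a distribution

# Example: the middle patch is "occluded" (confidence 0), so its
# contradictory prediction is ignored in the fused output.
preds = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]
fused = combine_leps(preds, [1.0, 0.0, 1.0])
```

With the occluded patch zeroed out, the fused distribution equals the average of the two reliable patches, `[0.9, 0.1]`, illustrating how confidence weighting confers robustness to partial occlusions.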


Facial expressions · Action unit · Random forest · Occlusions · Autoencoder · Real-time



This work has been supported by the French National Agency (ANR) in the frame of its Technological Research CONTINT program (JEMImE, project number ANR-13-CORD-0004).



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. UPMC Univ Paris 06, CNRS, UMR 7222, Sorbonne Universités, Paris, France
