
Confidence-Weighted Local Expression Predictions for Occlusion Handling in Expression Recognition and Action Unit Detection

Published in: International Journal of Computer Vision

Abstract

Fully automatic facial expression recognition (FER) is a key component of human behavior analysis. Performing FER from still images is challenging: it involves handling large interpersonal morphological differences, and partial occlusions can occur. Furthermore, labelling expressions is a time-consuming process that is prone to subjectivity, so the training data may not cover the full range of expression variability. In this work, we propose to train random forests upon spatially constrained random local subspaces of the face. The resulting local predictions form a categorical, expression-driven high-level representation that we call local expression predictions (LEPs). LEPs can be combined to describe categorical facial expressions as well as action units (AUs). Furthermore, LEPs can be weighted by confidence scores provided by an autoencoder network, which is trained to locally capture the manifold of the non-occluded training data in a hierarchical way. Extensive experiments show that the proposed LEP representation yields high descriptive power for categorical expression and AU occurrence prediction, and opens interesting perspectives towards the design of occlusion-robust and confidence-aware FER systems.
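To make the fusion scheme sketched in the abstract concrete, the following Python snippet illustrates confidence-weighted aggregation of local expression predictions. It is a minimal sketch, not the paper's implementation: the region layout, the exp(-error) confidence mapping, the weighted-average fusion rule, and the synthetic data are all assumptions filled in for demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_EXPRESSIONS = 7  # e.g. six basic emotions + neutral (assumed)
N_REGIONS = 5      # number of local face subspaces (assumed layout)

rng = np.random.default_rng(0)

# Stand-in data: one 16-dim feature vector per face region, 200 faces.
X = rng.normal(size=(200, N_REGIONS, 16))
y = rng.integers(0, N_EXPRESSIONS, size=200)

# One random forest per local subspace; each outputs a local
# expression prediction (LEP), i.e. a probability vector over classes.
forests = [
    RandomForestClassifier(n_estimators=50, random_state=0).fit(X[:, r], y)
    for r in range(N_REGIONS)
]

def predict_expression(x_regions, recon_errors):
    """Fuse LEPs, down-weighting regions whose autoencoder
    reconstruction error is high (e.g. occluded regions).
    The exp(-error) confidence mapping is an assumption."""
    confidences = np.exp(-np.asarray(recon_errors))
    weights = confidences / confidences.sum()
    leps = np.stack([
        forests[r].predict_proba(x_regions[r][None, :])[0]
        for r in range(N_REGIONS)
    ])
    fused = weights @ leps  # confidence-weighted average of LEPs
    return int(np.argmax(fused)), fused

# A large error on region 2 (here 2.0) effectively mutes that region.
label, probs = predict_expression(X[0], recon_errors=[0.1, 0.2, 2.0, 0.1, 0.3])
```

In this sketch the autoencoder itself is abstracted away as a per-region reconstruction error; the key point is that regions lying off the non-occluded training manifold contribute less to the fused prediction.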





Acknowledgements

This work was supported by the French National Research Agency (ANR) within the framework of its CONTINT technological research program (JEMImE, project number ANR-13-CORD-0004).

Author information

Corresponding author

Correspondence to Arnaud Dapogny.

Additional information

Communicated by Thomas Brox, Cordelia Schmid.


About this article


Cite this article

Dapogny, A., Bailly, K. & Dubuisson, S. Confidence-Weighted Local Expression Predictions for Occlusion Handling in Expression Recognition and Action Unit Detection. Int J Comput Vis 126, 255–271 (2018). https://doi.org/10.1007/s11263-017-1010-1

