Confidence-Weighted Local Expression Predictions for Occlusion Handling in Expression Recognition and Action Unit Detection

Dapogny, Arnaud; Bailly, Kevin; Dubuisson, Séverine

doi:10.1007/s11263-017-1010-1

Confidence-Weighted Local Expression Predictions for Occlusion Handling in Expression Recognition and Action Unit Detection

Published: 08 April 2017

Volume 126, pages 255–271, (2018)
Cite this article

International Journal of Computer Vision Aims and scope Submit manuscript

1396 Accesses
53 Citations
Explore all metrics

Abstract

Fully-automatic facial expression recognition (FER) is a key component of human behavior analysis. Performing FER from still images is a challenging task as it involves handling large interpersonal morphological differences, and as partial occlusions can occasionally happen. Furthermore, labelling expressions is a time-consuming process that is prone to subjectivity, thus the variability may not be fully covered by the training data. In this work, we propose to train random forests upon spatially-constrained random local subspaces of the face. The output local predictions form a categorical expression-driven high-level representation that we call local expression predictions (LEPs). LEPs can be combined to describe categorical facial expressions as well as action units (AUs). Furthermore, LEPs can be weighted by confidence scores provided by an autoencoder network. Such network is trained to locally capture the manifold of the non-occluded training data in a hierarchical way. Extensive experiments show that the proposed LEP representation yields high descriptive power for categorical expressions and AU occurrence prediction, and leads to interesting perspectives towards the design of occlusion-robust and confidence-aware FER systems.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Facial Expression Recognition Using Cascaded Random Forest Based on Local Features

Learning facial expression-aware global-to-local representation for robust action unit detection

Article 06 January 2024

Self-supervised facial expression recognition with fine-grained feature selection

Article 17 March 2024

References

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Article MATH Google Scholar
Bylander, T. (2002). Estimating generalization error on two-class datasets using out-of-bag estimates. Machine Learning, 48(1–3), 287–297.
Article MATH Google Scholar
Chen, C., Liaw, A., & Breiman, L. (2004). Using random forest to learn imbalanced data (Vol. 110). Technical report. Berkeley: University of California.
Chu, W.-S., De la Torre, F., & Cohn, J. F. (2013). Selective transfer machine for personalized facial action unit detection. In CVPR (pp. 3515–3522).
Cotter, S. F. (2010). Sparse representation for accurate classification of corrupted and occluded facial expressions. In ICASSP (pp. 838–841).
Dapogny, A., Bailly, K., & Dubuisson, S. (2015) Pairwise conditional random forests for facial expression recognition. In ICCV.
Dhall, A., Goecke, R., Lucey, S., & Gedeon, T. (2011). Static facial expression analysis in tough conditions: Data, evaluation protocol and benchmark. In ICCV Workshops (pp. 2106–2112).
Dollár, P., Tu, Z., Perona, P., & Belongie, S. (2009). Integral channel features. In BMVC.
Du, S., Tao, Y., & Martinez, A. M. (2014). Compound facial expressions of emotion. In Proceedings of the National Academy of Sciences (pp. 111).
Ekman, P., & Friesen, W. V. (1977). Facial action coding system. Palo Alto: Consulting Psychologists Press.
Ekman, Paul, & Friesen, W. V. (1971). Constants across cultures in the face and emotion. Journal of Personality and Social Psychology, 17(2), 124.
Article Google Scholar
Eleftheriadis, S., Rudovic, O., & Pantic, M. (2015). Multi-conditional latent variable model for joint facial action unit detection. In ICCV.
Eleftheriadis, S., Rudovic, O., & Pantic, M. (2015). Discriminative shared gaussian processes for multiview and view-invariant facial expression recognition. IEEE Transactions on Image Processing, 24(1), 189–204.
Article MathSciNet Google Scholar
Ghiasi, G., & Fowlkes, C. C. (2014). Occlusion coherence: Localizing occluded faces with a hierarchical deformable part model. In CVPR (pp. 1899–1906).
Ghosh, S., Laksana, E., Scherer, S., & Morency, L.-P. (2015) A multi-label convolutional neural network approach to cross-domain action unit detection. In ACII.
Greenwald, M. K., Cook, E. W., & Lang, P. J. (1989). Affective judgment and psychophysiological response: Dimensional covariation in the evaluation of pictorial stimuli. Journal of Psychophysiology, 3(1), 51–64.
Google Scholar
Hayat, M., Bennamoun, M., & El-Sallam, A. A. (2012). Evaluation of spatiotemporal detectors and descriptors for facial expression recognition. In International Conference on Human-System Interaction (pp. 43–47).
Huang, X., Zhao, G., Zheng, W., & Pietikäinen, M. (2012). Towards a dynamic expression recognition system under facial occlusion. Pattern Recognition Letters, 33(16), 2181–2191.
Article Google Scholar
Jeni, L., Cohn, J. F, & Kanade, J. F. (2015). Dense 3d face alignment from 2d videos in real-time. In FG.
Jiang, B., Valstar, M. F, & Pantic, M. (2011). Action unit detection using sparse appearance descriptors in space-time video volumes. In FG (pp. 314–321).
Jolliffe, I. (2002). Principal component analysis. NewYork: Wiley.
MATH Google Scholar
Kotsia, I., Buciu, I., & Pitas, I. (2008). An analysis of facial expression recognition under partial facial image occlusion. Image and Vision Computing, 26(7), 1052–1067.
Article Google Scholar
Linusson, H. (2013). Multi-output random forests. University of Borås/School of Business and IT.
Liu, M., Li, S., Shan, Shiguang, S., & Chen, X. (2013). Enhancing expression recognition in the wild with unlabeled reference data. In ACCV (pp. 577–588).
Liu, M., Li, S., Shan, S., & Chen, X. (2015). Au-inspired deep networks for facial expression feature learning. Neurocomputing, 159, 126–136.
Article Google Scholar
Lucey, P., Cohn J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The extended cohn-kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In CVPR Workshops (pp. 94–101).
Mavadati, S. M., Mahoor, M. H., Bartlett, K., Trinh, P., & Cohn, J. F. (2013). DISFA: A spontaneous facial action intensity database. Transactions on Affective Computing, 4(2), 151–160.
Article Google Scholar
Nicolle, J., Bailly, K., & Chetouani, M. (2015). Facial action unit intensity prediction via hard multi-task metric learning for kernel regression. In FG.
Pei, Y., Kim, T.-K., & Zha, H. (2013). Unsupervised random forest manifold alignment for lipreading. In ICCV (pp. 129–136).
Ranzato, M. A., Susskind, J., Mnih, V., & Hinton, G. (2011). On deep generative models with applications to recognition. In CVPR (pp. 2857–2864).
Ren, S., Cao, X., Wei, Y., & Sun, J. (2014). Face alignment at 3000 fps via regressing local binary features. In CVPR (pp. 1685–1692).
Rifai, S., Bengio, Y., Courville, A., Vincent, P., & Mirza, M. (2012). Disentangling factors of variation for facial expression recognition. In ECCV.
Rifai, S., Vincent, P., Muller, X., Glorot, X., & Bengio, Y. (2011). Contractive auto-encoders: Explicit invariance during feature extraction. In ICML (pp. 833–840).
Sandbach, G., Zafeiriou, S., Pantic, M., & Rueck, D. (2011). A dynamic approach to the recognition of 3D facial expressions and their temporal models. In FG (pp. 406–413).
Savran, A., Alyüz, N., Dibeklioğlu, H., Çeliktutan, O., Gökberk, B., Sankur, B., & Akarun, L. (2008). Bosphorus database for 3d face analysis. In Biometrics and Identity Management (pp. 47–56).
Sénéchal, T., Rapp, V., Salam, H., Seguier, R., Bailly, K., & Prevost, L. (2012). Facial action recognition combining heterogeneous features via multikernel learning. TSMC-B (pp. 42).
Shan, C., Gong, S., & McOwan, P. W. (2009). Facial expression recognition based on local binary patterns: A comprehensive study. Image and Vision Computing, 27(6), 803–816.
Article Google Scholar
Sun, Y., & Yin, L. (2008). Facial expression recognition based on 3D dynamic range model sequences. In ECCV (pp. 58–71).
Van de Weijer, J., Ruiz, A., & Binefa, X. (2015). From emotions to action units with hidden and semi-hidden-task learning. In ICCV.
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P.-A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11, 3371–3408.
MathSciNet MATH Google Scholar
Wallhoff, F. (2006). Database with facial expressions and emotions from technical university of Munich (feedtum).
Xiong, X., & De la Torre, F. (2013). Supervised descent method and its applications to face alignment. In CVPR (pp. 532–539).
Xu, L., & Mordohai, P. (2010). Automatic facial expression recognition using bags of motion words. In BMVC (pp. 1–13).
Yin, L., Chen, X., & Sun, Y. (2008). Tony Worm, and Michael Reale. A high-resolution 3D dynamic facial expression database. In FG (pp. 1–6).
Zeng, Z., Pantic, M., Roisman, G. I., & Huang, T. S. (2009). A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1), 39–58.
Article Google Scholar
Zhang, L., Tjondronegoro, D., & Chandran, V. (2014). Random Gabor based templates for facial expression recognition in images with facial occlusion. Neurocomputing, 145, 451–464.
Article Google Scholar
Zhang, X., Yin, L., Cohn, J. F., Canavan, S., Reale, M., Horowitz, A., et al. (2014). BP4D-spontaneous a high-resolution spontaneous 3D dynamic facial expression database. Image and Vision Computing, 32(10), 692–706.
Article Google Scholar
Zhao, K., Chu, W.-S., De la Torre, F., Jeffrey, F. C., & Honggang, Z. (2015). Joint patch and multi-label learning for facial action unit detection. In CVPR.
Zhao, K., Chu, W.-S., & Zhang, H. (2016). Deep region and multi-label learning for facial action unit detection. In CVPR.
Zhao, X., Kim, T. K., & Luo, W. (2014). Unified face analysis by iterative multi-output random forests. In CVPR (pp. 1765–1772).
Zhong, L., Liu, Q., Yang, P., Liu, B., Huang, J., & Metaxas, D. N. (2012). Learning active facial patches for expression analysis. In CVPR (pp. 2562–2569).

Download references

Acknowledgements

This work has been supported by the French National Agency (ANR) in the frame of its Technological Research CONTINT program (JEMImE, project number ANR-13-CORD-0004).

Author information

Authors and Affiliations

UPMC Univ Paris 06, CNRS, UMR 7222, Sorbonne Universités, 75005, Paris, France
Arnaud Dapogny, Kevin Bailly & Séverine Dubuisson

Authors

Arnaud Dapogny
View author publications
You can also search for this author in PubMed Google Scholar
Kevin Bailly
View author publications
You can also search for this author in PubMed Google Scholar
Séverine Dubuisson
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Arnaud Dapogny.

Additional information

Communicated by Thomas Brox, Cordelia Schmid.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Dapogny, A., Bailly, K. & Dubuisson, S. Confidence-Weighted Local Expression Predictions for Occlusion Handling in Expression Recognition and Action Unit Detection. Int J Comput Vis 126, 255–271 (2018). https://doi.org/10.1007/s11263-017-1010-1

Download citation

Received: 11 February 2016
Accepted: 03 April 2017
Published: 08 April 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s11263-017-1010-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Confidence-Weighted Local Expression Predictions for Occlusion Handling in Expression Recognition and Action Unit Detection

Abstract

Access this article

Similar content being viewed by others

Facial Expression Recognition Using Cascaded Random Forest Based on Local Features

Learning facial expression-aware global-to-local representation for robust action unit detection

Self-supervised facial expression recognition with fine-grained feature selection

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Confidence-Weighted Local Expression Predictions for Occlusion Handling in Expression Recognition and Action Unit Detection

Abstract

Access this article

Similar content being viewed by others

Facial Expression Recognition Using Cascaded Random Forest Based on Local Features

Learning facial expression-aware global-to-local representation for robust action unit detection

Self-supervised facial expression recognition with fine-grained feature selection

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation