Random Forests for Real Time 3D Face Analysis

Fanelli, Gabriele; Dantone, Matthias; Gall, Juergen; Fossati, Andrea; Van Gool, Luc

doi:10.1007/s11263-012-0549-0

Published: 01 August 2012

Volume 101, pages 437–458 (2013)

International Journal of Computer Vision

Gabriele Fanelli¹, Matthias Dantone¹, Juergen Gall², Andrea Fossati¹ & Luc Van Gool¹,³

Abstract

We present a random forest-based framework for real-time head pose estimation from depth images and extend it to localize a set of facial features in 3D. Our algorithm takes a voting approach: each patch extracted from the depth image can directly cast a vote for the head pose or for each of the facial features. The system proves capable of handling large rotations, partial occlusions, and the noisy depth data acquired with commercial sensors. Moreover, the algorithm processes each frame independently and achieves real-time performance without resorting to parallel computation on a GPU. We report extensive experiments on challenging, publicly available datasets and introduce a new annotated head pose database recorded with a Microsoft Kinect.
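The abstract describes the forest only at the level of per-patch votes. The Python sketch below is a rough illustration of how such votes might be merged into a single head pose estimate. It assumes each patch yields a predicted head center (x, y, z), Euler angles (yaw, pitch, roll), and a leaf variance. It discards high-variance votes and then runs a flat-kernel mean shift on the predicted centers. The function name `aggregate_votes`, the vote format, the thresholds, and the clustering step are all hypothetical choices made for illustration. This is not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical vote format: (predicted head center, (yaw, pitch, roll), leaf variance)
Vote = Tuple[Vec3, Vec3, float]


def _mean(points: Sequence[Vec3]) -> Vec3:
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def aggregate_votes(votes: List[Vote],
                    max_variance: float = 400.0,
                    radius: float = 30.0,
                    iterations: int = 10) -> Optional[Tuple[Vec3, Vec3]]:
    """Combine per-patch votes into one (center, angles) estimate.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    # 1. Keep only votes coming from confident (low-variance) leaves.
    kept = [v for v in votes if v[2] <= max_variance]
    if not kept:
        return None

    # 2. Flat-kernel mean shift on the predicted centers, started at their mean.
    mode = _mean([v[0] for v in kept])
    for _ in range(iterations):
        inliers = [v for v in kept if math.dist(v[0], mode) <= radius]
        if not inliers:
            return None
        new_mode = _mean([v[0] for v in inliers])
        converged = math.dist(new_mode, mode) < 1e-6
        mode = new_mode
        if converged:
            break

    # 3. Average the center and angles of the votes supporting the final mode.
    #    Arithmetic averaging of Euler angles assumes the votes agree closely.
    support = [v for v in kept if math.dist(v[0], mode) <= radius]
    if not support:
        return None
    return _mean([v[0] for v in support]), _mean([v[1] for v in support])


# Toy example: five consistent votes, one confident outlier, one uncertain vote.
votes = [
    ((0, 0, 1000), (10, 0, 0), 50.0),
    ((2, 0, 1000), (12, 0, 0), 50.0),
    ((-2, 0, 1000), (8, 0, 0), 50.0),
    ((0, 2, 1000), (10, 2, 0), 50.0),
    ((0, -2, 1000), (10, -2, 0), 50.0),
    ((100, 0, 1000), (90, 0, 0), 10.0),   # outlier, rejected by mean shift
    ((0, 0, 0), (0, 0, 0), 1000.0),       # high variance, rejected by threshold
]
print(aggregate_votes(votes))  # ((0.0, 0.0, 1000.0), (10.0, 0.0, 0.0))
```

In the toy example, the variance threshold removes the uncertain vote. The mean shift then moves away from the confident outlier because the outlier lies outside the kernel radius. Averaging Euler angles in this way is only reasonable when the surviving votes agree closely. A real system would need a more careful treatment of rotations.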

Notes

Most of the datasets are publicly available at http://www.vision.ee.ethz.ch/datasets.
Because of the proprietary license for Paysan et al. (2009), we cannot share the above database. The PCA model, however, can be obtained from the University of Basel.
We used the source code provided by the authors.
www.vision.ee.ethz.ch/~gfanelli/head_pose/head_forest.html.
Commercially available: http://www.faceshift.com.

References

Amberg, B., & Vetter, T. (2011). Optimal landmark detection using shape models and branch and bound slides. In International conference on computer vision.
Balasubramanian, V. N., Ye, J., & Panchanathan, S. (2007). Biased manifold embedding: A framework for person-independent head pose estimation. In IEEE conference on computer vision and pattern recognition.
Belhumeur, P. N., Jacobs, D. W., Kriegman, D. J., & Kumar, N. (2011). Localizing parts of faces using a consensus of exemplars. In IEEE conference on computer vision and pattern recognition.
Besl, P., & McKay, N. (1992). A method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2), 239–256.
Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3d faces. In ACM international conference on computer graphics and interactive techniques (SIGGRAPH) (pp. 187–194).
Breidt, M., Buelthoff, H., & Curio, C. (2011). Robust semantic analysis by synthesis of 3d facial motion. In Automatic face and gesture recognition.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Monterey: Wadsworth and Brooks.
Breitenstein, M. D., Jensen, J., Hoilund, C., Moeslund, T. B., & Van Gool, L. (2009). Head pose estimation from passive stereo images. In Scandinavian conference on image analysis.
Breitenstein, M. D., Kuettel, D., Weise, T., Van Gool, L., & Pfister, H. (2008). Real-time face pose estimation from single range images. In IEEE conference on computer vision and pattern recognition.
Cai, Q., Gallup, D., Zhang, C., & Zhang, Z. (2010). 3d deformable face tracking with a commodity depth camera. In European conference on computer vision.
Chang, K. I., Bowyer, K. W., & Flynn, P. J. (2006). Multiple nose region matching for 3d face recognition under varying facial expression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10), 1695–1700.
Chen, L., Zhang, L., Hu, Y., Li, M., & Zhang, H. (2003). Head pose estimation using fisher manifold learning. In Analysis and modeling of faces and gestures.
Chua, C. S., & Jarvis, R. (1997). Point signatures: A new representation for 3d object recognition. International Journal of Computer Vision, 25, 63–85.
Colbry, D., Stockman, G., & Jain, A. (2005). Detection of anchor points for 3d face verification. In IEEE conference on computer vision and pattern recognition.
Cootes, T. F., Edwards, G. J., & Taylor, C. J. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 681–685.
Cootes, T. F., Wheeler, G. V., Walker, K. N., & Taylor, C. J. (2002). View-based active appearance models. Image and Vision Computing, 20(9–10), 657–664.
Criminisi, A., Shotton, J., & Konukoglu, E. (2011). Decision forests for classification, regression, density estimation, manifold learning and semi-supervised learning. Tech. Rep. TR-2011-114, Microsoft Research.
Criminisi, A., Shotton, J., Robertson, D., & Konukoglu, E. (2010). Regression forests for efficient anatomy detection and localization in CT studies. In Recognition techniques and applications in medical imaging.
Cristinacce, D., & Cootes, T. (2008). Automatic feature localisation with constrained local models. Pattern Recognition, 41(10), 3054–3067.
Dantone, M., Gall, J., Fanelli, G., & Van Gool, L. (2012). Real-time facial feature detection using conditional regression forests. In IEEE conference on computer vision and pattern recognition.
Dorai, C., & Jain, A. K. (1997). COSMOS—A representation scheme for 3D Free-Form objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(10), 1115–1130.
Everingham, M., Sivic, J., & Zisserman, A. (2006). Hello! my name is… buffy—automatic naming of characters in tv video. In British machine vision conference.
Fanelli, G., Gall, J., Romsdorfer, H., Weise, T., & Van Gool, L. (2010). A 3-d audio-visual corpus of affective communication. IEEE Transactions on Multimedia, 12(6), 591–598.
Fanelli, G., Gall, J., & Van Gool, L. (2011a). Real time head pose estimation with random regression forests. In IEEE conference on computer vision and pattern recognition.
Fanelli, G., Weise, T., Gall, J., & Van Gool, L. (2011b). Real time head pose estimation from consumer depth cameras. In German association for pattern recognition.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1), 55–79.
Gall, J., & Lempitsky, V. (2009). Class-specific Hough forests for object detection. In IEEE conference on computer vision and pattern recognition.
Gall, J., Yao, A., Razavi, N., Van Gool, L., & Lempitsky, V. (2011). Hough forests for object detection, tracking, and action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Girshick, R., Shotton, J., Kohli, P., Criminisi, A., & Fitzgibbon, A. (2011). Efficient regression of general-activity human poses from depth images. In International conference on computer vision.
Gross, R., Matthews, I., & Baker, S. (2005). Generic vs. person specific active appearance models. Image and Vision Computing, 23(12), 1080–1093.
Huang, C., Ding, X., & Fang, C. (2010). Head pose estimation based on random forests for multiclass classification. In International conference on pattern recognition.
Jones, M., & Viola, P. (2003). Fast multi-view face detection. Tech. Rep. TR2003-096, Mitsubishi Electric Research Laboratories.
Ju, Q., O’keefe, S., & Austin, J. (2009). Binary neural network based 3d facial feature localization. In International joint conference on neural networks.
Kakadiaris, I. A., Passalis, G., Toderici, G., Murtuza, M. N., Lu, Y., Karampatziakis, N., & Theoharis, T. (2007). Three-dimensional face recognition in the presence of facial expressions: an annotated deformable model approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(4), 640–649.
Leibe, B., Leonardis, A., & Schiele, B. (2008). Robust object detection with interleaved categorization and segmentation. International Journal of Computer Vision, 77(1–3), 259–289.
Lepetit, V., Lagger, P., & Fua, P. (2005). Randomized trees for real-time keypoint recognition. In IEEE conference on computer vision and pattern recognition.
Li, H., Adams, B., Guibas, L. J., & Pauly, M. (2009). Robust single-view geometry and motion reconstruction. ACM Transactions on Graphics (Proceedings SIGGRAPH Asia), 28(5).
Lu, X., & Jain, A. K. (2006). Automatic feature extraction for multiview 3d face recognition. In Automatic face and gesture recognition.
Martins, P., & Batista, J. (2008). Accurate single view model-based head pose estimation. In Automatic face and gesture recognition.
Matthews, I., & Baker, S. (2003). Active appearance models revisited. International Journal of Computer Vision, 60(2), 135–164.
Mehryar, S., Martin, K., Plataniotis, K., & Stergiopoulos, S. (2010). Automatic landmark detection for 3d face image processing. In Evolutionary computation.
Mian, A., Bennamoun, M., & Owens, R. (2006). Automatic 3d face detection, normalization and recognition. In 3D data processing, visualization, and transmission.
Morency, L. P., Sundberg, P., & Darrell, T. (2003). Pose estimation using 3d view-based eigenspaces. In Automatic face and gesture recognition.
Morency, L. P., Whitehill, J., & Movellan, J. R. (2008). Generalized adaptive view-based appearance model: integrated framework for monocular head pose estimation. In Automatic face and gesture recognition.
Mpiperis, I., Malassiotis, S., & Strintzis, M. (2008). Bilinear models for 3-d face and facial expression recognition. IEEE Transactions on Information Forensics and Security, 3(3), 498–511.
Murphy-Chutorian, E., & Trivedi, M. (2009). Head pose estimation in computer vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(4), 607–626.
Nair, P., & Cavallaro, A. (2009). 3-d face detection, landmark localization, and registration using a point distribution model. IEEE Transactions on Multimedia, 11(4), 611–623.
Okada, R. (2009). Discriminative generalized hough transform for object detection. In International conference on computer vision.
Osadchy, M., Miller, M. L., & LeCun, Y. (2005). Synergistic face detection and pose estimation with energy-based models. In Neural information processing systems.
Papageorgiou, C., Oren, M., & Poggio, T. (1998). A general framework for object detection. In International conference on computer vision.
Paysan, P., Knothe, R., Amberg, B., Romdhani, S., & Vetter, T. (2009). A 3d face model for pose and illumination invariant face recognition. In Advanced video and signal based surveillance.
Ramnath, K., Koterba, S., Xiao, J., Hu, C., Matthews, I., Baker, S., Cohn, J., & Kanade, T. (2008). Multi-view aam fitting and construction. International Journal of Computer Vision, 76(2), 183–204.
Seemann, E., Nickel, K., & Stiefelhagen, R. (2004). Head pose estimation using stereo vision for human-robot interaction. In Automatic face and gesture recognition.
Segundo, M., Silva, L., Bellon, O., & Queirolo, C. (2010). Automatic face segmentation and facial landmark detection in range images. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 40(5), 1319–1330.
Sharp, T. (2008). Implementing decision trees and forests on a GPU. In European conference on computer vision.
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., & Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In IEEE conference on computer vision and pattern recognition.
Shotton, J., Johnson, M., & Cipolla, R. (2008). Semantic texton forests for image categorization and segmentation. In IEEE conference on computer vision and pattern recognition.
Storer, M., Urschler, M., & Bischof, H. (2009). 3d-mam: 3d morphable appearance model for efficient fine head pose estimation from still images. In Workshop on subspace methods.
Sun, Y., & Yin, L. (2008). Automatic pose estimation of 3d facial models. In International conference on pattern recognition.
Valstar, M., Martinez, B., Binefa, X., & Pantic, M. (2010). Facial point detection using boosted regression and graph models. In IEEE conference on computer vision and pattern recognition.
Vatahska, T., Bennewitz, M., & Behnke, S. (2007). Feature-based head pose estimation from images. In International conference on humanoid robots.
Viola, P., & Jones, M. (2004). Robust real-time face detection. International Journal of Computer Vision, 57(2), 137–154.
Wang, Y., Chua, C., & Ho, Y. (2002). Facial feature detection and face recognition from 2d and 3d images. Pattern Recognition Letters, 23(10), 1191–1202.
Weise, T., Bouaziz, S., Li, H., & Pauly, M. (2011). Realtime performance-based facial animation. In ACM international conference on computer graphics and interactive techniques (SIGGRAPH).
Weise, T., Leibe, B., & Van Gool, L. (2007). Fast 3d scanning with automatic motion compensation. In IEEE conference on computer vision and pattern recognition.
Weise, T., Li, H., Van Gool, L., & Pauly, M. (2009a). Face/off live facial puppetry. In Symposium on computer animation.
Weise, T., Wismer, T., Leibe, B., & Van Gool, L. (2009b). In-hand scanning with online loop closure. In 3-D digital imaging and modeling.
Whitehill, J., & Movellan, J. R. (2008). A discriminative approach to frame-by-frame head pose tracking. In Automatic face and gesture recognition.
Yao, A., Gall, J., & Van Gool, L. (2010). A hough transform-based voting framework for action recognition. In IEEE conference on computer vision and pattern recognition.
Yin, L., Wei, X., Sun, Y., Wang, J., & Rosato, M. J. (2006). A 3d facial expression database for facial behavior research. In Face and gesture recognition.
Yu, T. H., & Moon, Y. S. (2008). A novel genetic algorithm for 3d facial landmark localization. In Biometrics: theory, applications and systems.
Zhao, X., Dellandréa, E., Chen, L., & Kakadiaris, I. (2011). Accurate landmarking of three-dimensional facial data in the presence of facial expressions and occlusions using a three-dimensional statistical facial feature model. IEEE Transactions on Systems, Man, and Cybernetics, part B: Cybernetics, 41(5), 1417–1428.

Acknowledgements

We thank Thibaut Weise for useful code and discussions. We acknowledge financial support from EU projects RADHAR (FP7-ICT-248873) and TANGO (FP7-ICT-249858), and from the SNF project Vision-supported Speech-based Human Machine Interaction (200021-130224).

Author information

Authors and Affiliations

Computer Vision Laboratory, ETH Zurich, Sternwartstrasse 7, 8092, Zurich, Switzerland
Gabriele Fanelli, Matthias Dantone, Andrea Fossati & Luc Van Gool
Perceiving Systems Department, Max Planck Institute for Intelligent Systems, Spemannstrasse 41, 72076, Tübingen, Germany
Juergen Gall
Department of Electrical Engineering/IBBT, K.U. Leuven, Kasteelpark Arenberg 10, 3001, Heverlee, Belgium
Luc Van Gool

Corresponding author

Correspondence to Gabriele Fanelli.

About this article

Cite this article

Fanelli, G., Dantone, M., Gall, J. et al. Random Forests for Real Time 3D Face Analysis. Int J Comput Vis 101, 437–458 (2013). https://doi.org/10.1007/s11263-012-0549-0

Received: 05 December 2011
Accepted: 16 July 2012
Published: 01 August 2012
Issue Date: February 2013
DOI: https://doi.org/10.1007/s11263-012-0549-0
