Skip to main content
Log in

Random Forests for Real Time 3D Face Analysis

  • Published:
International Journal of Computer Vision Aims and scope Submit manuscript

Abstract

We present a random forest-based framework for real time head pose estimation from depth images and extend it to localize a set of facial features in 3D. Our algorithm takes a voting approach, where each patch extracted from the depth image can directly cast a vote for the head pose or each of the facial features. Our system proves capable of handling large rotations, partial occlusions, and the noisy depth data acquired using commercial sensors. Moreover, the algorithm works on each frame independently and achieves real time performance without resorting to parallel computations on a GPU. We present extensive experiments on publicly available, challenging datasets and present a new annotated head pose database recorded using a Microsoft Kinect.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 9
Fig. 7
Fig. 8
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21
Fig. 22
Fig. 23
Fig. 24
Fig. 25
Fig. 26
Fig. 27
Fig. 28
Fig. 29
Fig. 30
Fig. 31
Fig. 32
Fig. 33
Fig. 34

Similar content being viewed by others

Notes

  1. Most of the datasets are publicly available at http://www.vision.ee.ethz.ch/datasets.

  2. Because of the proprietary license for Paysan et al. (2009), we cannot share the above database. The PCA model, however, can be obtained from the University of Basel.

  3. We used the source code provided by the authors.

  4. www.vision.ee.ethz.ch/~gfanelli/head_pose/head_forest.html.

  5. Commercially available: http://www.faceshift.com.

References

  • Amberg, B., & Vetter, T. (2011). Optimal landmark detection using shape models and branch and bound slides. In International conference on computer vision.

    Google Scholar 

  • Balasubramanian, V. N., Ye, J., & Panchanathan, S. (2007). Biased manifold embedding: A framework for person-independent head pose estimation. In IEEE conference on computer vision and pattern recognition.

    Google Scholar 

  • Belhumeur, P. N., Jacobs, D. W., Kriegman, D. J., & Kumar, N. (2011). Localizing parts of faces using a consensus of exemplars. In IEEE conference on computer vision and pattern recognition.

    Google Scholar 

  • Besl, P., & McKay, N. (1992). A method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2), 239–256.

    Article  Google Scholar 

  • Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3d faces. In ACM international conference on computer graphics and interactive techniques (SIGGRAPH) (pp. 187–194).

    Google Scholar 

  • Breidt, M., Buelthoff, H., & Curio, C. (2011). Robust semantic analysis by synthesis of 3d facial motion. In Automatic face and gesture recognition.

    Google Scholar 

  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

    Article  MATH  Google Scholar 

  • Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Monterey: Wadsworth and Brooks.

    MATH  Google Scholar 

  • Breitenstein, M. D., Jensen, J., Hoilund, C., Moeslund, T. B., & Van Gool, L. (2009). Head pose estimation from passive stereo images. In Scandinavian conference on image analysis.

    Google Scholar 

  • Breitenstein, M. D., Kuettel, D., Weise, T., Van Gool, L., & Pfister, H. (2008). Real-time face pose estimation from single range images. In IEEE conference on computer vision and pattern recognition.

    Google Scholar 

  • Cai, Q., Gallup, D., Zhang, C., & Zhang, Z. (2010). 3d deformable face tracking with a commodity depth camera. In European conference on computer vision.

    Google Scholar 

  • Chang, K. I., Bowyer, K. W., & Flynn, P. J. (2006). Multiple nose region matching for 3d face recognition under varying facial expression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10), 1695–1700.

    Article  Google Scholar 

  • Chen, L., Zhang, L., Hu, Y., Li, M., & Zhang, H. (2003). Head pose estimation using fisher manifold learning. In Analysis and modeling of faces and gestures.

    Google Scholar 

  • Chua, C. S., & Jarvis, R. (1997). Point signatures: A new representation for 3d object recognition. International Journal of Computer Vision, 25, 63–85.

    Article  Google Scholar 

  • Colbry, D., Stockman, G., & Jain, A. (2005). Detection of anchor points for 3d face verification. In IEEE conference on computer vision and pattern recognition.

    Google Scholar 

  • Cootes, T. F., Edwards, G. J., & Taylor, C. J. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 681–685.

    Article  Google Scholar 

  • Cootes, T. F., Wheeler, G. V., Walker, K. N., & Taylor, C. J. (2002). View-based active appearance models. Image and Vision Computing, 20(9–10), 657–664.

    Article  Google Scholar 

  • Criminisi, A., Shotton, J., & Konukoglu, E. (2011). Decision forests for classification, regression, density estimation, manifold learning and semi-supervised learning. Tech. Rep. TR-2011-114, Microsoft Research.

  • Criminisi, A., Shotton, J., Robertson, D., & Konukoglu, E. (2010). Regression forests for efficient anatomy detection and localization in ct studies. In Recognition techniques and applications in medical imaging.

    Google Scholar 

  • Cristinacce, D., & Cootes, T. (2008). Automatic feature localisation with constrained local models. Journal of Pattern Recognition, 41(10), 3054–3067.

    Article  MATH  Google Scholar 

  • Dantone, M., Gall, J., Fanelli, G., & Van Gool, L. (2012). Real-time facial feature detection using conditional regression forests. In IEEE conference on computer vision and pattern recognition.

    Google Scholar 

  • Dorai, C., & Jain, A. K. (1997). COSMOS—A representation scheme for 3D Free-Form objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(10), 1115–1130.

    Article  Google Scholar 

  • Everingham, M., Sivic, J., & Zisserman, A. (2006). Hello! my name is… buffy—automatic naming of characters in tv video. In British machine vision conference.

    Google Scholar 

  • Fanelli, G., Gall, J., Romsdorfer, H., Weise, T., & Van Gool, L. (2010). A 3-d audio-visual corpus of affective communication. IEEE Transactions on Multimedia, 12(6), 591–598.

    Article  Google Scholar 

  • Fanelli, G., Gall, J., & Van Gool, L. (2011a). Real time head pose estimation with random regression forests. In IEEE conference on computer vision and pattern recognition.

    Google Scholar 

  • Fanelli, G., Weise, T., Gall, J., & Van Gool, L. (2011b). Real time head pose estimation from consumer depth cameras. In German association for pattern recognition.

    Google Scholar 

  • Felzenszwalb, P. F., & Huttenlocher, D. P. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1), 55–79.

    Article  Google Scholar 

  • Gall, J., & Lempitsky, V. (2009). Class-specic hough forests for object detection. In IEEE conference on computer vision and pattern recognition.

    Google Scholar 

  • Gall, J., Yao, A., Razavi, N., Van Gool, L., & Lempitsky, V. (2011). Hough forests for object detection, tracking, and action recognition. In IEEE transactions on pattern analysis and machine intelligence.

    Google Scholar 

  • Girshick, R., Shotton, J., Kohli, P., Criminisi, A., & Fitzgibbon, A. (2011). Efficient regression of general-activity human poses from depth images. In International conference on computer vision.

    Google Scholar 

  • Gross, R., Matthews, I., & Baker, S. (2005). Generic vs. person specific active appearance models. Image and Vision Computing, 23(12), 1080–2093.

    Article  Google Scholar 

  • Huang, C., Ding, X., & Fang, C. (2010). Head pose estimation based on random forests for multiclass classification. In International conference on pattern recognition.

    Google Scholar 

  • Jones, M., & Viola, P. (2003). Fast multi-view face detection. Tech. Rep. TR2003-096, Mitsubishi Electric Research Laboratories.

  • Ju, Q., O’keefe, S., & Austin, J. (2009). Binary neural network based 3d facial feature localization. In International joint conference on neural networks.

    Google Scholar 

  • Kakadiaris, I. A., Passalis, G., Toderici, G., Murtuza, M. N., Lu, Y., Karampatziakis, N., & Theoharis, T. (2007). Three-dimensional face recognition in the presence of facial expressions: an annotated deformable model approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(4), 640–649.

    Article  Google Scholar 

  • Leibe, B., Leonardis, A., & Schiele, B. (2008). Robust object detection with interleaved categorization and segmentation. International Journal of Computer Vision, 77(1–3), 259–289.

    Article  Google Scholar 

  • Lepetit, V., Lagger, P., & Fua, P. (2005). Randomized trees for real-time keypoint recognition. In IEEE conference on computer vision and pattern recognition.

    Google Scholar 

  • Li, H., Adams, B., Guibas, L. J., & Pauly, M. (2009). Robust single-view geometry and motion reconstruction. ACM Transactions on Graphics (Proceedings SIGGRAPH Asia), 28(5). 2009.

  • Lu, X., & Jain, A. K. (2006). Automatic feature extraction for multiview 3d face recognition. In Automatic face and gesture recognition.

    Google Scholar 

  • Martins, P., & Batista, J. (2008). Accurate single view model-based head pose estimation. In Automatic face and gesture recognition.

    Google Scholar 

  • Matthews, I., & Baker, S. (2003). Active appearance models revisited. International Journal of Computer Vision, 60(2), 135–164.

    Article  Google Scholar 

  • Mehryar, S., Martin, K., Plataniotis, K., & Stergiopoulos, S. (2010). Automatic landmark detection for 3d face image processing. In Evolutionary computation.

    Google Scholar 

  • Mian, A., Bennamoun, M., & Owens, R. (2006). Automatic 3d face detection, normalization and recognition. In 3D data processing, visualization, and transmission.

    Google Scholar 

  • Morency, L. P., Sundberg, P., & Darrell, T. (2003). Pose estimation using 3d view-based eigenspaces. In Automatic face and gesture recognition.

    Google Scholar 

  • Morency, L. P., Whitehill, J., & Movellan, J. R. (2008). Generalized adaptive view-based appearance model: integrated framework for monocular head pose estimation. In Automatic face and gesture recognition.

    Google Scholar 

  • Mpiperis, I., Malassiotis, S., & Strintzis, M. (2008). Bilinear models for 3-d face and facial expression recognition. IEEE Transactions on Information Forensics and Security, 3(3), 498–511.

    Article  Google Scholar 

  • Murphy-Chutorian, E., & Trivedi, M. (2009). Head pose estimation in computer vision: A survey. Transactions on Pattern Analysis and Machine Intelligence, 31(4), 607–626.

    Article  Google Scholar 

  • Nair, P., & Cavallaro, A. (2009). 3-d face detection, landmark localization, and registration using a point distribution model. IEEE Transactions on Multimedia, 11(4), 611–623.

    Article  Google Scholar 

  • Okada, R. (2009). Discriminative generalized hough transform for object detection. In International conference on computer vision.

    Google Scholar 

  • Osadchy, M., Miller, M. L., & LeCun, Y. (2005). Synergistic face detection and pose estimation with energy-based models. In Neural information processing systems.

    Google Scholar 

  • Papageorgiou, C., Oren, M., & Poggio, T. (1998). A general framework for object detection. In International conference on computer vision.

    Google Scholar 

  • Paysan, P., Knothe, R., Amberg, B., Romdhani, S., & Vetter, T. (2009). A 3d face model for pose and illumination invariant face recognition. In Advanced video and signal based surveillance.

    Google Scholar 

  • Ramnath, K., Koterba, S., Xiao, J., Hu, C., Matthews, I., Baker, S., Cohn, J., & Kanade, T. (2008). Multi-view aam fitting and construction. International Journal of Computer Vision, 76(2), 183–204.

    Article  Google Scholar 

  • Seemann, E., Nickel, K., & Stiefelhagen, R. (2004). Head pose estimation using stereo vision for human-robot interaction. In Automatic face and gesture recognition.

    Google Scholar 

  • Segundo, M., Silva, L., Bellon, O., & Queirolo, C. (2010). Automatic face segmentation and facial landmark detection in range images. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 40(5), 1319–1330.

    Article  Google Scholar 

  • Sharp, T. (2008). Implementing decision trees and forests on a GPU. In European conference on computer vision.

    Google Scholar 

  • Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., & Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In IEEE conference on computer vision and pattern recognition.

    Google Scholar 

  • Shotton, J., Johnson, M., & Cipolla, R. (2008). Semantic texton forests for image categorization and segmentation. In IEEE conference on computer vision and pattern recognition.

    Google Scholar 

  • Storer, M., Urschler, M., & Bischof, H. (2009). 3d-mam: 3d morphable appearance model for efficient fine head pose estimation from still images. In Workshop on subspace methods.

    Google Scholar 

  • Sun, Y., & Yin, L. (2008). Automatic pose estimation of 3d facial models. In International conference on pattern recognition.

    Google Scholar 

  • Valstar, M., Martinez, B., Binefa, X., & Pantic, M. (2010). Facial point detection using boosted regression and graph models. In IEEE conference on computer vision and pattern recognition.

    Google Scholar 

  • Vatahska, T., Bennewitz, M., & Behnke, S. (2007). Feature-based head pose estimation from images. In International conference on humanoid robots.

    Google Scholar 

  • Viola, P., & Jones, M. (2004). Robust real-time face detection. International Journal of Computer Vision, 57(2), 137–154.

    Article  Google Scholar 

  • Wang, Y., Chua, C., & Ho, Y. (2002). Facial feature detection and face recognition from 2d and 3d images. Pattern Recognition Letters, 10(23), 1191–1202.

    Article  Google Scholar 

  • Weise, T., Bouaziz, S., Li, H., & Pauly, M. (2011). Realtime performance-based facial animation. In ACM international conference on computer graphics and interactive techniques (SIGGRAPH).

    Google Scholar 

  • Weise, T., Leibe, B., & Van Gool, L. (2007). Fast 3d scanning with automatic motion compensation. In IEEE conference on computer vision and pattern recognition.

    Google Scholar 

  • Weise, T., Li, H., Van Gool, L., & Pauly, M. (2009a). Face/off live facial puppetry. In Symposium on computer animation.

    Google Scholar 

  • Weise, T., Wismer, T., Leibe, B., & Van Gool, L. (2009b). In-hand scanning with online loop closure. In 3-D digital imaging and modeling.

    Google Scholar 

  • Whitehill, J., & Movellan, J. R. (2008). A discriminative approach to frame-by-frame head pose tracking. In Automatic face and gesture recognition.

    Google Scholar 

  • Yao, A., Gall, J., & Van Gool, L. (2010). A hough transform-based voting framework for action recognition. In IEEE conference on computer vision and pattern recognition.

    Google Scholar 

  • Yin, L., Wei, X., Sun, Y., Wang, J., & Rosato, M. J. (2006). A 3d facial expression database for facial behavior research. In Face and gesture recognition.

    Google Scholar 

  • Yu, T. H., & Moon, Y. S. (2008). A novel genetic algorithm for 3d facial landmark localization. In Biometrics: theory, applications and systems.

    Google Scholar 

  • Zhao, X., Dellandréa, E., Chen, L., & Kakadiaris, I. (2011). Accurate landmarking of three-dimensional facial data in the presence of facial expressions and occlusions using a three-dimensional statistical facial feature model. IEEE Transactions on Systems, Man, and Cybernetics, part B: Cybernetics, 41(5), 1417–1428.

    Article  Google Scholar 

Download references

Acknowledgements

We thank Thibaut Weise for useful code and discussions. We acknowledge financial support from EU projects RADHAR (FP7-ICT-248873) and TANGO (FP7-ICT-249858), and from the SNF project Vision-supported Speech-based Human Machine Interaction (200021-130224).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Gabriele Fanelli.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Fanelli, G., Dantone, M., Gall, J. et al. Random Forests for Real Time 3D Face Analysis. Int J Comput Vis 101, 437–458 (2013). https://doi.org/10.1007/s11263-012-0549-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11263-012-0549-0

Keywords

Navigation