Real Time Head Pose Estimation from Consumer Depth Cameras

  • Gabriele Fanelli
  • Thibaut Weise
  • Juergen Gall
  • Luc Van Gool
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6835)


We present a system for estimating the location and orientation of a person’s head from depth data acquired by a low-quality device. Our approach is based on discriminative random regression forests: ensembles of random trees trained by splitting each node so as to simultaneously reduce the entropy of the class label distribution and the variance of the head position and orientation. We evaluate three different approaches to jointly take classification and regression performance into account during training. For evaluation, we acquired a new dataset and propose a method for its automatic annotation.
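To make the node-splitting idea concrete, the following is a minimal sketch of a split cost that combines label entropy with pose variance. It is an illustration under stated assumptions, not the training procedure of the paper: the additive weighting with `alpha`, the use of summed per-dimension variance, and all function names (`label_entropy`, `pose_variance`, `split_cost`, `best_threshold`) are hypothetical choices made here, and the sketch does not reproduce any of the three weighting approaches the abstract mentions.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A training sample: (class label, pose vector).
# Label 1 = patch from the head, 0 = patch from elsewhere (assumed encoding).
Sample = Tuple[int, Sequence[float]]


def label_entropy(samples: List[Sample]) -> float:
    """Shannon entropy (bits) of the binary class labels in a node."""
    n = len(samples)
    if n == 0:
        return 0.0
    p = sum(label for label, _ in samples) / n
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


def pose_variance(samples: List[Sample]) -> float:
    """Sum of per-dimension variances of the pose vectors of positive samples."""
    poses = [pose for label, pose in samples if label == 1]
    if len(poses) < 2:
        return 0.0
    total = 0.0
    for d in range(len(poses[0])):
        values = [pose[d] for pose in poses]
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values) / len(values)
    return total


def split_cost(left: List[Sample], right: List[Sample], alpha: float = 1.0) -> float:
    """Size-weighted cost of a split; alpha is an assumed trade-off weight."""
    n = len(left) + len(right)
    cost = 0.0
    for child in (left, right):
        weight = len(child) / n
        cost += weight * (label_entropy(child) + alpha * pose_variance(child))
    return cost


def best_threshold(
    features: Sequence[float], samples: List[Sample], alpha: float = 1.0
) -> Optional[Tuple[float, float]]:
    """Try each observed feature value as a threshold; return (threshold, cost)."""
    best = None
    for t in sorted(set(features)):
        left = [s for f, s in zip(features, samples) if f < t]
        right = [s for f, s in zip(features, samples) if f >= t]
        if not left or not right:
            continue  # a split must send samples to both children
        cost = split_cost(left, right, alpha)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best
```

In this sketch a split is preferred when it both separates head patches from non-head patches and groups head patches with similar poses. Setting `alpha` to zero reduces it to a purely classification-driven split.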


Keywords: Random Forest; Depth Image; Multivariate Gaussian Distribution; Angle Error; Active Appearance Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Gabriele Fanelli (1)
  • Thibaut Weise (2)
  • Juergen Gall (1)
  • Luc Van Gool (1, 3)

  1. ETH Zurich, Switzerland
  2. EPFL Lausanne, Switzerland
  3. KU Leuven, Belgium
