International Journal of Computer Vision, Volume 101, Issue 3, pp 437–458

Random Forests for Real Time 3D Face Analysis

  • Gabriele Fanelli
  • Matthias Dantone
  • Juergen Gall
  • Andrea Fossati
  • Luc Van Gool

Abstract

We present a random forest-based framework for real time head pose estimation from depth images and extend it to localize a set of facial features in 3D. Our algorithm takes a voting approach, where each patch extracted from the depth image can directly cast a vote for the head pose or for each of the facial features. Our system proves capable of handling large rotations, partial occlusions, and the noisy depth data acquired using commercial sensors. Moreover, the algorithm works on each frame independently and achieves real time performance without resorting to parallel computations on a GPU. We report extensive experiments on publicly available, challenging datasets and introduce a new annotated head pose database recorded using a Microsoft Kinect.
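
The voting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how votes are produced or combined, so `toy_predictor` is a hypothetical stand-in for a trained forest, and the per-coordinate median is an assumed, simple aggregation rule chosen only because it tolerates a few bad votes.

```python
from statistics import median
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cast_votes(patch_centers: Sequence[Vec3],
               predict_offset: Callable[[Vec3], Vec3]) -> List[Vec3]:
    """Each patch votes for a head position: its own center plus a predicted offset."""
    votes = []
    for c in patch_centers:
        o = predict_offset(c)
        votes.append((c[0] + o[0], c[1] + o[1], c[2] + o[2]))
    return votes


def aggregate_votes(votes: Sequence[Vec3]) -> Vec3:
    """Combine votes with a per-coordinate median (an assumed aggregation rule)."""
    if not votes:
        raise ValueError("no votes to aggregate")
    xs, ys, zs = zip(*votes)
    return (median(xs), median(ys), median(zs))


def toy_predictor(center: Vec3) -> Vec3:
    """Hypothetical stand-in for a trained forest (illustration only)."""
    target = (0.0, 0.0, 1000.0)  # assumed true head position, in millimetres
    if abs(center[0]) > 150:
        # A patch far from the head (e.g. on an occluder) votes for itself.
        return (0.0, 0.0, 0.0)
    return (target[0] - center[0], target[1] - center[1], target[2] - center[2])


# Three patches on the head and one on an occluding object.
centers = [(-50.0, 20.0, 990.0), (30.0, -10.0, 1010.0),
           (0.0, 40.0, 1000.0), (300.0, 0.0, 700.0)]
estimate = aggregate_votes(cast_votes(centers, toy_predictor))
print(estimate)
```

In this toy setup the single misleading vote does not move the estimate, which is the intuition behind letting many patches vote independently when parts of the face are occluded.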

Keywords

  • Random forests
  • Head pose estimation
  • 3D facial features detection
  • Real time

Notes

Acknowledgements

We thank Thibaut Weise for useful code and discussions. We acknowledge financial support from EU projects RADHAR (FP7-ICT-248873) and TANGO (FP7-ICT-249858), and from the SNF project Vision-supported Speech-based Human Machine Interaction (200021-130224).

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Gabriele Fanelli (1)
  • Matthias Dantone (1)
  • Juergen Gall (2)
  • Andrea Fossati (1)
  • Luc Van Gool (1, 3)

  1. Computer Vision Laboratory, ETH Zurich, Zurich, Switzerland
  2. Perceiving Systems Department, Max Planck Institute for Intelligent Systems, Tübingen, Germany
  3. Department of Electrical Engineering/IBBT, K.U. Leuven, Heverlee, Belgium
