Real-Time Accurate 3D Head Tracking and Pose Estimation with Consumer RGB-D Cameras

International Journal of Computer Vision

Abstract

We demonstrate how 3D head tracking and pose estimation can be achieved effectively and efficiently from noisy RGB-D sequences. Our proposal builds on a random forest framework designed to regress the 3D head pose at every frame in a temporal tracking manner. One peculiarity of the algorithm is that it jointly exploits (1) a generic training dataset of 3D head models, which is used for learning once offline; and (2) an online refinement with subject-specific 3D data, which allows the tracker to withstand slight facial deformations and to adapt its forest to the specific characteristics of an individual subject. The combination of these two components makes our algorithm robust even under extreme poses, where the user's face is no longer visible in the image. Finally, we also propose a multi-camera solution in which the data acquired simultaneously from multiple RGB-D sensors helps the tracker handle challenging conditions that affect a subset of the cameras. Notably, the proposed multi-camera framework achieves real-time performance of approximately 8 ms per frame with six cameras on one CPU core, and scales linearly, reaching 30 fps with 25 cameras.
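The timing figures quoted above can be checked with a short calculation. The sketch below is only an illustration of that arithmetic, not the authors' benchmark: it assumes the per-frame cost is strictly proportional to the number of cameras, with no fixed overhead, and the function names are hypothetical.

```python
# Sanity check of the abstract's timing claim under a purely linear cost model.
# Assumption: per-frame time = cameras * constant per-camera cost (no fixed overhead).

# From the abstract: approximately 8 ms per frame with six cameras.
MS_PER_CAMERA = 8.0 / 6  # about 1.33 ms per camera


def frame_time_ms(num_cameras, ms_per_camera=MS_PER_CAMERA):
    """Per-frame processing time in milliseconds for a given camera count."""
    return num_cameras * ms_per_camera


def frames_per_second(num_cameras, ms_per_camera=MS_PER_CAMERA):
    """Achievable frame rate implied by the per-frame processing time."""
    return 1000.0 / frame_time_ms(num_cameras, ms_per_camera)


if __name__ == "__main__":
    for n in (6, 25):
        print(f"{n} cameras: {frame_time_ms(n):.2f} ms per frame, "
              f"{frames_per_second(n):.1f} fps")
```

Under this assumption, 25 cameras cost about 33.3 ms per frame, which corresponds to the 30 fps stated in the abstract.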

Author information

Corresponding author

Correspondence to David Joseph Tan.

Additional information

Communicated by Michael S. Brown, Cordelia Schmid.

About this article

Cite this article

Tan, D.J., Tombari, F. & Navab, N. Real-Time Accurate 3D Head Tracking and Pose Estimation with Consumer RGB-D Cameras. Int J Comput Vis 126, 158–183 (2018). https://doi.org/10.1007/s11263-017-0988-8

  • DOI: https://doi.org/10.1007/s11263-017-0988-8
