Real-Time Accurate 3D Head Tracking and Pose Estimation with Consumer RGB-D Cameras

Tan, David Joseph; Tombari, Federico; Navab, Nassir

doi:10.1007/s11263-017-0988-8

Published: 02 February 2017

Volume 126, pages 158–183 (2018)
International Journal of Computer Vision

Abstract

We demonstrate how 3D head tracking and pose estimation can be achieved effectively and efficiently from noisy RGB-D sequences. Our proposal leverages a random forest framework designed to regress the 3D head pose at every frame in a temporal tracking manner. A distinctive feature of the algorithm is that it jointly exploits (1) a generic training dataset of 3D head models, which is learned once offline, and (2) an online refinement with subject-specific 3D data, which enables the tracker to withstand slight facial deformations and to adapt its forest to the specific characteristics of an individual subject. Combining the two makes our algorithm robust even under extreme poses, where the user’s face is no longer visible in the image. Finally, we also propose a multi-camera solution in which data acquired simultaneously from multiple RGB-D sensors help the tracker handle challenging conditions that affect only a subset of the cameras. Notably, the proposed multi-camera framework runs in real time at approximately 8 ms per frame with six cameras on one CPU core, and scales linearly up to 30 fps with 25 cameras.
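The two timing figures in the abstract are mutually consistent. The short Python sketch below checks this under one explicit assumption: that the processing cost is split evenly across cameras and grows linearly with their number, as the stated linear scaling suggests. The helper names `per_camera_ms` and `frame_rate` are hypothetical and are not part of the authors' code.

```python
def per_camera_ms(total_ms: float, cameras: int) -> float:
    """Average time per camera, ASSUMING the per-frame cost is split evenly."""
    return total_ms / cameras


def frame_rate(cameras: int, ms_per_camera: float) -> float:
    """Frames per second on one core, ASSUMING cost grows linearly with cameras."""
    return 1000.0 / (cameras * ms_per_camera)


# Reported figure: about 8 ms per frame with six cameras on one CPU core.
ms_cam = per_camera_ms(8.0, 6)     # about 1.33 ms per camera
# Extrapolate linearly to 25 cameras: 25 * 8/6 ms = 33.3 ms per frame.
fps_25 = frame_rate(25, ms_cam)
print(f"{ms_cam:.2f} ms per camera, {fps_25:.1f} fps with 25 cameras")
# prints "1.33 ms per camera, 30.0 fps with 25 cameras"
```

Under this linear model, the 25-camera extrapolation lands at 30 fps, matching the figure quoted in the abstract.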

Author information

Authors and Affiliations

Technische Universität München, Munich, Germany
David Joseph Tan, Federico Tombari & Nassir Navab
Università di Bologna, Bologna, Italy
Federico Tombari

Corresponding author

Correspondence to David Joseph Tan.

Additional information

Communicated by Michael S. Brown, Cordelia Schmid.

About this article

Cite this article

Tan, D.J., Tombari, F. & Navab, N. Real-Time Accurate 3D Head Tracking and Pose Estimation with Consumer RGB-D Cameras. Int J Comput Vis 126, 158–183 (2018). https://doi.org/10.1007/s11263-017-0988-8

Received: 15 March 2016
Accepted: 03 January 2017
Issue Date: April 2018