Real-Time Head Pose Estimation by Tracking and Detection of Keypoints and Facial Landmarks

Díaz Barros, Jilliam M.; Mirbach, Bruno; Garcia, Frederic; Varanasi, Kiran; Stricker, Didier

doi:10.1007/978-3-030-26756-8_16

Part of the book series: Communications in Computer and Information Science (CCIS, volume 997)

Included in the following conference series:

International Joint Conference on Computer Vision, Imaging and Computer Graphics

Abstract

We introduce a novel fusion framework for real-time head pose estimation based on a tailored Kalman Filter. The approach estimates the head pose from intensity images in monocular video, runs in real time, and is robust to extreme head rotations and varying illumination. The framework feeds the head pose computed by a keypoint-based tracking scheme into the prediction step of the Kalman Filter, and the head pose computed by a facial-landmark-based detection scheme into the correction step. The tracking scheme estimates the head pose from 2D keypoints tracked across two consecutive frames in the head region and from their 3D projections onto a simple geometric model. The detection scheme, in contrast, estimates the head pose from 2D facial landmarks detected in each frame and from their 3D correspondences, which are retrieved through triangulation. In both schemes, the head pose is obtained by minimizing the reprojection error of the 3D-2D correspondences. In each iteration, we update the state transition matrix of the filter and, subsequently, the estimated covariance. We evaluated our approach on a publicly available dataset and compared it with related state-of-the-art methods. Our approach achieved similar performance in terms of mean average error while operating in real time. We also tested the method on our own dataset to evaluate its performance under large head rotations, and obtained good results even when facial landmarks are partially occluded.
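The predict/correct split described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' formulation: the abstract states that the state transition matrix is updated in each iteration, whereas this sketch treats the tracked frame-to-frame pose change as an additive increment and runs six independent scalar Kalman filters, one per pose component. The class name `PoseFusionFilter`, the pose layout, and the noise values `q` and `r` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PoseFusionFilter:
    """Six independent scalar Kalman filters, one per pose component.

    Assumed pose layout (illustrative only): [tx, ty, tz, roll, pitch, yaw].
    Adding rotation increments component-wise is a small-angle approximation.
    """
    x: List[float]   # current pose estimate
    p: List[float]   # per-component estimate variance
    q: float = 0.5   # process noise variance (hypothetical value)
    r: float = 4.0   # measurement noise variance (hypothetical value)

    def predict(self, delta_track: List[float]) -> None:
        """Prediction: apply the pose change measured by keypoint tracking."""
        self.x = [xi + di for xi, di in zip(self.x, delta_track)]
        self.p = [pi + self.q for pi in self.p]

    def correct(self, z_landmarks: List[float]) -> None:
        """Correction: blend in the absolute pose from landmark detection."""
        for i, zi in enumerate(z_landmarks):
            k = self.p[i] / (self.p[i] + self.r)   # scalar Kalman gain
            self.x[i] += k * (zi - self.x[i])
            self.p[i] *= (1.0 - k)

    def step(self, delta_track: List[float],
             z_landmarks: Optional[List[float]] = None) -> List[float]:
        """One iteration; skip the correction when no landmarks were detected."""
        self.predict(delta_track)
        if z_landmarks is not None:
            self.correct(z_landmarks)
        return list(self.x)


if __name__ == "__main__":
    filt = PoseFusionFilter(x=[0.0] * 6, p=[1.0] * 6)
    # Tracking reports +2 degrees of yaw; detection reports an absolute yaw of 3.
    print(filt.step([0, 0, 0, 0, 0, 2.0], [0, 0, 0, 0, 0, 3.0]))
    # Landmarks unavailable (e.g. occluded): prediction only, variance grows.
    print(filt.step([0, 0, 0, 0, 0, 1.0]))
```

Making the correction optional reflects the situation the abstract mentions, where landmarks are partially occluded: the estimate then relies on tracking alone and its variance grows until a detection is available again.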

Author information

Authors and Affiliations

Augmented Vision Department, German Research Center for Artificial Intelligence (DFKI), 67663, Kaiserslautern, Germany
Jilliam M. Díaz Barros, Kiran Varanasi & Didier Stricker
Computer Science Department, Technische Universität Kaiserslautern, 67663, Kaiserslautern, Germany
Jilliam M. Díaz Barros & Didier Stricker
IEE S.A., 5326, Contern, Luxembourg
Bruno Mirbach & Frederic Garcia

Corresponding authors

Correspondence to Jilliam M. Díaz Barros, Bruno Mirbach, Frederic Garcia, Kiran Varanasi or Didier Stricker.

Editor information

Editors and Affiliations

University of Strasbourg, Strasbourg, France
Dominique Bechmann
University of Genoa, Genoa, Italy
Manuela Chessa
University of Lisbon, Lisbon, Portugal
Ana Paula Cláudio
Research Innovation Center, Apple Inc., San Jose, CA, USA
Francisco Imai
Linnaeus University, Växjö, Kronobergs Län, Sweden
Andreas Kerren
LISA - ISTIA, University of Angers, Angers, France
Paul Richard
University of Groningen, Groningen, The Netherlands
Alexandru Telea
Jean Monnet University, Saint-Etienne, France
Alain Tremeau

Cite this paper

Díaz Barros, J.M., Mirbach, B., Garcia, F., Varanasi, K., Stricker, D. (2019). Real-Time Head Pose Estimation by Tracking and Detection of Keypoints and Facial Landmarks. In: Bechmann, D., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2018. Communications in Computer and Information Science, vol 997. Springer, Cham. https://doi.org/10.1007/978-3-030-26756-8_16

DOI: https://doi.org/10.1007/978-3-030-26756-8_16
Published: 24 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26755-1
Online ISBN: 978-3-030-26756-8
eBook Packages: Computer Science, Computer Science (R0)
