
Real-Time Head Pose Estimation by Tracking and Detection of Keypoints and Facial Landmarks

  • Conference paper
Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2018)

Abstract

We introduce a novel fusion framework for real-time head pose estimation based on a tailored Kalman Filter. The approach estimates the head pose from intensity images in monocular video and is robust to extreme head rotations and varying illumination while running in real time. Our framework feeds the head pose computed by a keypoint-based tracking scheme into the prediction step of the Kalman Filter, and the head pose computed by a facial-landmark-based detection scheme into the correction step. The tracking scheme estimates the head pose from 2D keypoints tracked across two consecutive frames in the head region and their 3D projections onto a simple geometric model. The detection scheme, in contrast, estimates the head pose from 2D facial landmarks detected in each frame and their 3D correspondences retrieved through triangulation. In both schemes, the head pose is obtained by minimizing the reprojection error of the 3D-2D correspondences. In each iteration, we update the state transition matrix of the filter and subsequently the estimated covariance. We evaluated our approach on a publicly available dataset and compared it with related state-of-the-art methods. Our approach achieves similar performance in terms of mean absolute error while operating in real time. Furthermore, we tested our method on our own dataset to evaluate its performance under large head rotations, and we show good results even when facial landmarks are partially occluded.
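The prediction/correction split described in the abstract can be illustrated with a minimal sketch. It assumes a 6-DoF pose vector (three Euler angles followed by a translation), treats the six components as independent (diagonal covariances), and assumes the tracking scheme delivers an inter-frame pose increment while the detection scheme delivers an absolute pose. This is not the authors' tailored filter: it applies the tracking increment as a control input instead of updating a state transition matrix, and the class name `PoseFusionFilter` and the noise variances `q` and `r` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed pose layout for this sketch: [roll, pitch, yaw, tx, ty, tz].
POSE_DIM = 6


@dataclass
class PoseFusionFilter:
    """Hypothetical per-component Kalman filter fusing a tracking increment
    (prediction) with a detected absolute pose (correction). Components are
    treated as independent, so each covariance is a diagonal kept as a list."""
    q: List[float]  # process noise variance per component (assumed values)
    r: List[float]  # measurement noise variance per component (assumed values)
    x: List[float] = field(default_factory=lambda: [0.0] * POSE_DIM)
    p: List[float] = field(default_factory=lambda: [1.0] * POSE_DIM)

    def predict(self, delta_track: List[float]) -> None:
        # Prediction: apply the inter-frame motion from keypoint tracking.
        # Adding Euler angles is only a small-rotation approximation.
        for i in range(POSE_DIM):
            self.x[i] += delta_track[i]
            self.p[i] += self.q[i]

    def correct(self, z_detect: Optional[List[float]]) -> None:
        # Correction: blend in the absolute pose from landmark detection.
        # If no landmarks were detected, keep the predicted state.
        if z_detect is None:
            return
        for i in range(POSE_DIM):
            k = self.p[i] / (self.p[i] + self.r[i])  # scalar Kalman gain
            self.x[i] += k * (z_detect[i] - self.x[i])
            self.p[i] *= 1.0 - k

    def step(self, delta_track: List[float],
             z_detect: Optional[List[float]] = None) -> List[float]:
        # One frame: predict with tracking, then correct with detection.
        self.predict(delta_track)
        self.correct(z_detect)
        return list(self.x)


if __name__ == "__main__":
    f = PoseFusionFilter(q=[1.0] * POSE_DIM, r=[2.0] * POSE_DIM)
    # Frame 1: tracking says +2 deg roll, detection says 4 deg roll.
    print(f.step([2.0, 0, 0, 0, 0, 0], [4.0, 0, 0, 0, 0, 0]))
    # Frame 2: landmarks missing, so only the tracking increment is applied.
    print(f.step([1.0, 0, 0, 0, 0, 0]))
```

In this sketch, a frame without detected landmarks simply passes `None` as the detection, so the estimate follows the tracking scheme alone (with growing covariance) until detection recovers. A full implementation would keep a joint 6x6 covariance and handle rotations on SO(3) rather than as additive angles.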



Author information


Correspondence to Jilliam M. Díaz Barros, Bruno Mirbach, Frederic Garcia, Kiran Varanasi or Didier Stricker.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Díaz Barros, J.M., Mirbach, B., Garcia, F., Varanasi, K., Stricker, D. (2019). Real-Time Head Pose Estimation by Tracking and Detection of Keypoints and Facial Landmarks. In: Bechmann, D., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2018. Communications in Computer and Information Science, vol 997. Springer, Cham. https://doi.org/10.1007/978-3-030-26756-8_16


  • DOI: https://doi.org/10.1007/978-3-030-26756-8_16


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26755-1

  • Online ISBN: 978-3-030-26756-8

  • eBook Packages: Computer Science, Computer Science (R0)
