Abstract
We introduce a novel fusion framework for real-time head pose estimation using a tailored Kalman Filter. The approach estimates the pose from intensity images in monocular video data, is robust to extreme head rotations and varying illumination, and runs in real time. Our framework incorporates the head pose computed from a keypoint-based tracking scheme into the prediction step of the Kalman Filter and the head pose computed from a facial-landmark-based detection scheme into the correction step. The head pose from the tracking scheme is estimated from 2D keypoints tracked across two consecutive frames in the head region and their 3D projections onto a simple geometric model. In contrast, the head pose from the detection scheme is estimated from 2D facial landmarks detected in each frame and their 3D correspondences retrieved through triangulation. In each scheme, the head pose is obtained by minimizing the reprojection error of the 3D-2D correspondences. In each iteration, we update the state transition matrix of the filter and subsequently the estimated covariance. We evaluated our approach on a publicly available dataset and compared it with related state-of-the-art methods. Our approach achieves similar performance in terms of mean average error while operating in real time. Furthermore, we tested our method on our own dataset to evaluate its performance in the presence of large head rotations. We show good results even in cases where facial landmarks are partially occluded.
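The predict/correct split described in the abstract can be illustrated with a minimal sketch. It is not the authors' filter: it assumes a generic Kalman filter that treats each of the six pose components independently (diagonal covariance), adds the frame-to-frame pose change from the tracking scheme in the prediction step (a small-angle approximation for the rotation part), and uses the absolute pose from the detection scheme as the measurement. The class name `PoseKalman`, the state layout, and the noise values `q` and `r` are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PoseKalman:
    """Per-component Kalman filter over a 6-DoF pose vector.

    Hypothetical sketch: state = [rx, ry, rz, tx, ty, tz]; every component
    is filtered independently with its own scalar variance.
    """
    x: List[float]      # current pose estimate
    p: List[float]      # variance of each pose component
    q: float = 1e-2     # process noise variance (assumed value)
    r: float = 1e-1     # measurement noise variance (assumed value)

    def predict(self, tracked_delta: List[float]) -> None:
        # Prediction: move the state by the pose change reported by the
        # keypoint tracker and inflate the uncertainty by the process noise.
        self.x = [xi + di for xi, di in zip(self.x, tracked_delta)]
        self.p = [pi + self.q for pi in self.p]

    def correct(self, detected_pose: List[float]) -> None:
        # Correction: blend in the absolute pose from the landmark detector,
        # weighted by the Kalman gain of each component.
        for i, z in enumerate(detected_pose):
            k = self.p[i] / (self.p[i] + self.r)
            self.x[i] += k * (z - self.x[i])
            self.p[i] *= 1.0 - k


# Usage: one filter iteration for a single video frame.
kf = PoseKalman(x=[0.0] * 6, p=[1.0] * 6)
kf.predict([0.1, 0.0, 0.0, 0.0, 0.0, 0.0])   # tracker: small rotation about x
kf.correct([0.2, 0.0, 0.0, 0.0, 0.0, 0.0])   # detector: absolute pose
```

After the correction, the first pose component lies between the predicted value (0.1) and the detected value (0.2), and its variance is smaller than before the update. When the detector fails, for example under extreme rotation, the correction call can simply be skipped so the estimate follows the tracker alone; this is a design choice of the sketch, not a claim about the paper.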
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Díaz Barros, J.M., Mirbach, B., Garcia, F., Varanasi, K., Stricker, D. (2019). Real-Time Head Pose Estimation by Tracking and Detection of Keypoints and Facial Landmarks. In: Bechmann, D., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2018. Communications in Computer and Information Science, vol 997. Springer, Cham. https://doi.org/10.1007/978-3-030-26756-8_16
DOI: https://doi.org/10.1007/978-3-030-26756-8_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26755-1
Online ISBN: 978-3-030-26756-8
eBook Packages: Computer Science, Computer Science (R0)