A Data-Driven Approach to Improve 3D Head-Pose Estimation

Aghli, Nima; Ribeiro, Eraldo

doi:10.1007/978-3-030-90439-5_43

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13017)

Included in the following conference series:

International Symposium on Visual Computing

Abstract

Head-pose estimation from images is an important research topic in computer vision. Its many applications include detecting focus of attention, tracking driver behavior, and human-computer interaction. Recent research on head-pose estimation has focused on developing models based on deep convolutional neural networks (CNNs). These models are trained using transfer learning and image augmentation to achieve better initialization states and robustness against occlusions. However, the networks used for transfer learning are usually aimed at general image recognition, and existing methods offer no in-depth study of transfer learning from more task-related networks. Additionally, for head-pose estimation, robustness against heavy occlusion and against noise such as motion blur and low brightness is vital. In this paper, we propose a new image-augmentation approach that significantly improves the estimation accuracy of the head-pose model. We also propose a task-related weight initialization that further improves estimation accuracy, based on a study of the internal activations of models trained for face-related tasks such as face recognition. We test our head-pose estimation model on three challenging test sets and achieve better results than state-of-the-art methods.
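
The abstract names three kinds of degradation a head-pose model should tolerate (heavy occlusion, motion blur, and low brightness) but does not describe the augmentation procedure itself. The following is only a minimal sketch of what such augmentations might look like, not the authors' method. It assumes 8-bit RGB images stored as NumPy arrays of shape (height, width, 3); all function names, probabilities, and parameter ranges are hypothetical choices made for illustration.

```python
import numpy as np


def random_occlusion(img, rng, max_frac=0.5):
    """Black out one random rectangle; each side is at most max_frac of the image side."""
    h, w = img.shape[:2]
    oh = int(rng.integers(1, max(2, int(h * max_frac) + 1)))
    ow = int(rng.integers(1, max(2, int(w * max_frac) + 1)))
    top = int(rng.integers(0, h - oh + 1))
    left = int(rng.integers(0, w - ow + 1))
    out = img.copy()
    out[top:top + oh, left:left + ow] = 0
    return out


def horizontal_motion_blur(img, kernel_size=7):
    """Box-average each pixel with its horizontal neighbours (kernel_size should be odd)."""
    pad = kernel_size // 2
    padded = np.pad(img.astype(np.float32), ((0, 0), (pad, pad), (0, 0)), mode="edge")
    acc = np.zeros(img.shape, dtype=np.float32)
    for k in range(kernel_size):
        # Shifted copies of the image, summed, implement a 1 x kernel_size box filter.
        acc += padded[:, k:k + img.shape[1], :]
    return np.clip(np.rint(acc / kernel_size), 0, 255).astype(np.uint8)


def lower_brightness(img, factor=0.4):
    """Scale intensities by factor in (0, 1] to mimic a dim scene."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)


def augment(img, rng):
    """Apply each degradation independently with probability 0.5 (an arbitrary choice)."""
    out = img
    if rng.random() < 0.5:
        out = random_occlusion(out, rng)
    if rng.random() < 0.5:
        out = horizontal_motion_blur(out)
    if rng.random() < 0.5:
        out = lower_brightness(out, factor=float(rng.uniform(0.3, 0.8)))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in for a face crop
    print(augment(face, rng).shape)
```

None of these three operations changes the geometry of the head in the image, so in this sketch the pose label of a training sample would stay the same after augmentation.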

Notes

1. https://keras.io/

Author information

Authors and Affiliations

Florida Institute of Technology, Melbourne, FL, 32904, USA
Nima Aghli & Eraldo Ribeiro

Corresponding author

Correspondence to Nima Aghli.

Editor information

Editors and Affiliations

University of Nevada, Reno, NV, USA
George Bebis
University of Texas at Arlington, Arlington, TX, USA
Vassilis Athitsos
University of South Carolina, Columbia, SC, USA
Tong Yan
City University of Hong Kong, Kowloon, Hong Kong
Manfred Lau
School of Engineering and Computing, University of Durham, Durham, Durham, UK
Frederick Li
Airbnb, New York, NY, USA
Conglei Shi
Peking University, Beijing, China
Xiaoru Yuan
Purdue University, West Lafayette, IN, USA
Christos Mousas
IST, School of Modeling, Simulation, and Training, Orlando, FL, USA
Gerd Bruder

About this paper

Cite this paper

Aghli, N., Ribeiro, E. (2021). A Data-Driven Approach to Improve 3D Head-Pose Estimation. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2021. Lecture Notes in Computer Science, vol 13017. Springer, Cham. https://doi.org/10.1007/978-3-030-90439-5_43

DOI: https://doi.org/10.1007/978-3-030-90439-5_43
Published: 01 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90438-8
Online ISBN: 978-3-030-90439-5
eBook Packages: Computer Science (R0)
