Abstract
Face pose estimation is widely used in human–computer interaction applications; however, it remains a challenging task because of variations in illumination, background, face orientation, and appearance visibility. In this paper, we propose a novel coarse-to-fine method for quantitative face pose estimation based on convolutional neural networks (CNNs) and geometric projection. In the coarse classification stage, a CNN classifies the input image into a specific category and yields a corresponding weight. In the fine estimation stage, 3D face landmarks are projected onto the three planes of the 3D coordinate system (x–y, x–z, and y–z), and these geometric projections give the offset angles of the face in the roll, yaw, and pitch directions. The final face pose score is obtained by combining the results of the two stages. Experiments on standard datasets show that the proposed method achieves better results than several competitive algorithms, which demonstrates its effectiveness.
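To make the fine-estimation idea concrete, the following is a minimal sketch of how offset angles could be read from landmark projections onto the three coordinate planes. It is not the paper's formulation: the choice of four landmarks, the image-style axes (x to the right, y downward, z as depth), the function names `projected_angles` and `frontal_score`, and the way the coarse-stage weight is combined with the angles are all assumptions made for illustration.

```python
import math


def projected_angles(left_eye, right_eye, nose_bridge, chin):
    """Return (roll, yaw, pitch) in degrees from four 3D landmarks (x, y, z).

    Assumption: left_eye is the eye with the smaller x coordinate in a
    frontal view, and the nose-bridge-to-chin vector points along +y.
    """
    # Eye line: vector from the left eye to the right eye.
    ex = right_eye[0] - left_eye[0]
    ey = right_eye[1] - left_eye[1]
    ez = right_eye[2] - left_eye[2]
    # Vertical face axis: vector from the nose bridge to the chin.
    vy = chin[1] - nose_bridge[1]
    vz = chin[2] - nose_bridge[2]

    roll = math.degrees(math.atan2(ey, ex))   # eye line projected onto x-y
    yaw = math.degrees(math.atan2(ez, ex))    # eye line projected onto x-z
    pitch = math.degrees(math.atan2(vz, vy))  # vertical axis projected onto y-z
    return roll, yaw, pitch


def frontal_score(class_weight, angles, max_angle=90.0):
    """Hypothetical combination of the coarse weight and the fine angles.

    Each absolute angle is clipped to max_angle, the three are averaged into
    a penalty in [0, 1], and the coarse-stage weight is scaled by it.
    """
    penalty = sum(min(abs(a), max_angle) for a in angles) / (3 * max_angle)
    return class_weight * (1.0 - penalty)


if __name__ == "__main__":
    # A face tilted in the image plane only: the eye line rises at 45 degrees.
    angles = projected_angles((0, 0, 0), (1, 1, 0), (0, 0, 0), (0, 1, 0))
    print(angles, frontal_score(0.8, angles))
```

In this sketch a perfectly frontal face gives three zero angles, so the score reduces to the coarse-stage weight; any rotation lowers it in proportion to the clipped angle magnitudes.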
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant No. 61976193, the Zhejiang Provincial Science and Technology Planning Key Project of China under Grant No. 2018C01064, and the Zhejiang Provincial Natural Science Foundation of China under Grant No. LY19F020027.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Gao, F., Li, S. & Lu, S. How frontal is a face? Quantitative estimation of face pose based on CNN and geometric projection. Neural Comput & Applic 33, 3035–3051 (2021). https://doi.org/10.1007/s00521-020-05167-0
DOI: https://doi.org/10.1007/s00521-020-05167-0