RealHePoNet: a robust single-stage ConvNet for head pose estimation in the wild

  • Original Article
  • Neural Computing and Applications

Abstract

Human head pose estimation in images has applications in many fields, such as human–computer interaction and video surveillance. In this work, we address this problem, defined here as the estimation of both the vertical (tilt/pitch) and horizontal (pan/yaw) angles, with a single Convolutional Neural Network (ConvNet) model, balancing precision and inference speed to maximize its usability in real-world applications. Our model is trained on the combination of two datasets: ‘Pointing’04’ (to cover a wide range of poses) and ‘Annotated Facial Landmarks in the Wild’ (to improve the robustness of the model on real-world images). Three different partitions of the combined dataset are defined and used for training, validation and testing. The result is a trained ConvNet model, coined RealHePoNet, that, given a low-resolution grayscale input image and without using facial landmarks, estimates both tilt and pan angles with low error (\(\sim 4.4^{\circ}\) average error on the test partition). Given its low inference time (6 ms per head), we consider our model usable even when paired with medium-spec hardware (e.g. a GTX 1060 GPU). Code is available at https://github.com/rafabs97/headpose_final and a demo video at https://www.youtube.com/watch?v=2UeuXh5DjAE.
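To make the reported figures concrete, the following minimal sketch shows one way a per-angle Mean Absolute Error (MAE) could be computed and how the 6 ms per-head inference time translates into throughput. The angle values and the names `mean_absolute_error`, `true_tilt`, `pred_tilt`, `true_pan` and `pred_pan` are hypothetical illustrations, not data or code from the paper, and averaging the tilt and pan MAE into a single number is an assumption about how a combined figure might be formed.

```python
from statistics import fmean


def mean_absolute_error(predicted, actual):
    """Mean absolute difference (in degrees) between two equal-length angle lists."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return fmean([abs(p - a) for p, a in zip(predicted, actual)])


# Hypothetical ground-truth and predicted angles in degrees (not from the paper).
true_tilt = [0.0, 15.0, -30.0, 60.0]
pred_tilt = [2.0, 12.0, -35.0, 58.0]
true_pan = [-90.0, -45.0, 0.0, 75.0]
pred_pan = [-84.0, -49.0, 3.0, 70.0]

tilt_mae = mean_absolute_error(pred_tilt, true_tilt)
pan_mae = mean_absolute_error(pred_pan, true_pan)

# Assumption: a single summary error is taken as the mean of the two per-angle MAEs.
average_mae = (tilt_mae + pan_mae) / 2

# Throughput implied by the inference time stated in the abstract (6 ms per head).
INFERENCE_MS_PER_HEAD = 6.0
heads_per_second = 1000.0 / INFERENCE_MS_PER_HEAD
```

With these made-up values, the tilt MAE is 3.0 degrees and the pan MAE is 4.5 degrees. At 6 ms per head, the pose estimator alone could process roughly 166 heads per second, not counting head detection or other pipeline stages.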

Abbreviations

AFLW: Annotated Facial Landmarks in the Wild

CNN: Convolutional Neural Network

Conv: Convolution

ConvNet: Convolutional Neural Network

CT: Confidence Threshold

FC: Fully connected

flops: Floating point operations per second

FPS: Frames per second

HPE: Head pose estimation

IoU: Intersection over Union

MAE: Mean Absolute Error

MSE: Mean Squared Error

SSD: Single Shot Detector

Acknowledgements

This work has been partially funded by the Spanish projects TIN2019-75279-P and RED2018-102511-T. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.

Author information

Corresponding author

Correspondence to Manuel J. Marín-Jiménez.

Ethics declarations

Conflicts of Interest

The authors declare that they have no conflict of interest.

Code availability

Code is publicly available at: https://github.com/rafabs97/headpose_final/.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Berral-Soler, R., Madrid-Cuevas, F.J., Muñoz-Salinas, R. et al. RealHePoNet: a robust single-stage ConvNet for head pose estimation in the wild. Neural Comput & Applic 33, 7673–7689 (2021). https://doi.org/10.1007/s00521-020-05511-4

  • DOI: https://doi.org/10.1007/s00521-020-05511-4
