Towards Real-Time Head Pose Estimation: Exploring Parameter-Reduced Residual Networks on In-the-wild Datasets

Rieger, Ines; Hauenstein, Thomas; Hettenkofer, Sebastian; Garbas, Jens-Uwe

doi:10.1007/978-3-030-22999-3_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11606)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

Head poses are a key component of human bodily communication and thus a decisive element of human-computer interaction. Real-time head pose estimation is crucial in the context of human-robot interaction or driver assistance systems. The most promising approaches for head pose estimation are based on Convolutional Neural Networks (CNNs). However, CNN models are often too complex to achieve real-time performance. To face this challenge, we explore a popular subgroup of CNNs, the Residual Networks (ResNets) and modify them in order to reduce their number of parameters. The ResNets are modified for different image sizes including low-resolution images and combined with a varying number of layers. They are trained on in-the-wild datasets to ensure real-world applicability. As a result, we demonstrate that the performance of the ResNets can be maintained while reducing the number of parameters. The modified ResNets achieve state-of-the-art accuracy and provide fast inference for real-time applicability.
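The abstract does not say how the parameter count is reduced, so the following is only a minimal sketch of the arithmetic behind such a reduction, not the authors' method. It assumes a ResNet-18-style stack of basic residual blocks (two 3x3 convolutions with batch normalization and a 1x1 projection shortcut when the channel count changes) and compares it with hypothetical variants that halve the channel widths or use fewer blocks per stage. All function names, widths, and block counts here are illustrative assumptions.

```python
def conv_params(in_ch, out_ch, kernel=3):
    """Weight count of one bias-free 2D convolution."""
    return in_ch * out_ch * kernel * kernel


def basic_block_params(in_ch, out_ch):
    """Parameters of one basic residual block (assumed layout).

    Two 3x3 convolutions, each followed by batch normalization
    (one scale and one shift per channel). A 1x1 projection shortcut
    with its own batch norm is counted only when the channel count changes.
    """
    total = conv_params(in_ch, out_ch) + conv_params(out_ch, out_ch)
    total += 2 * 2 * out_ch  # two batch norms, scale + shift each
    if in_ch != out_ch:
        total += conv_params(in_ch, out_ch, kernel=1) + 2 * out_ch
    return total


def residual_stack_params(widths, blocks_per_stage, stem_ch):
    """Sum block parameters over all stages (stem and head excluded)."""
    total = 0
    in_ch = stem_ch
    for width in widths:
        for _ in range(blocks_per_stage):
            total += basic_block_params(in_ch, width)
            in_ch = width
    return total


# Hypothetical configurations, chosen only to illustrate the scaling.
full = residual_stack_params([64, 128, 256, 512], blocks_per_stage=2, stem_ch=64)
half_width = residual_stack_params([32, 64, 128, 256], blocks_per_stage=2, stem_ch=32)
shallow = residual_stack_params([64, 128, 256, 512], blocks_per_stage=1, stem_ch=64)

print("full width, 2 blocks/stage:", full)
print("half width, 2 blocks/stage:", half_width)
print("full width, 1 block/stage: ", shallow)
print("width-halving reduction factor:", round(full / half_width, 2))
```

Because convolution weights scale with the product of input and output channels, halving every width cuts the convolutional parameters to about a quarter, while removing blocks reduces them roughly in proportion to the blocks removed. Whether accuracy is maintained under either change is an empirical question that the paper addresses and this sketch does not.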

Notes

1. http://image-net.org/challenges/LSVRC/2015/, accessed 14.12.2018.
2. http://cocodataset.org/#detections-challenge2015, accessed 14.12.2018.
3. https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/aflw/, accessed 26.03.2019.
4. https://github.com/natanielruiz/deep-head-pose, accessed 09.01.2019.

Author information

Authors and Affiliations

Fraunhofer-Institute for Integrated Circuits IIS, Am Wolfsmantel 33, 91058, Erlangen, Germany
Ines Rieger, Thomas Hauenstein, Sebastian Hettenkofer & Jens-Uwe Garbas

Corresponding author

Correspondence to Ines Rieger.

Editor information

Editors and Affiliations

Institute for Software Technology, Graz University of Technology, Graz, Austria
Franz Wotawa
Department of Applied Informatics, University of Klagenfurt, Klagenfurt, Austria
Gerhard Friedrich
Institute for Software Technology, Graz University of Technology, Graz, Austria
Ingo Pill
Institute for Software Technology, Graz University of Technology, Graz, Austria
Roxane Koitz-Hristov
Department of Computer Science, Texas State University, San Marcos, TX, USA
Moonis Ali

Cite this paper

Rieger, I., Hauenstein, T., Hettenkofer, S., Garbas, JU. (2019). Towards Real-Time Head Pose Estimation: Exploring Parameter-Reduced Residual Networks on In-the-wild Datasets. In: Wotawa, F., Friedrich, G., Pill, I., Koitz-Hristov, R., Ali, M. (eds) Advances and Trends in Artificial Intelligence. From Theory to Practice. IEA/AIE 2019. Lecture Notes in Computer Science, vol 11606. Springer, Cham. https://doi.org/10.1007/978-3-030-22999-3_12

DOI: https://doi.org/10.1007/978-3-030-22999-3_12
Published: 15 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22998-6
Online ISBN: 978-3-030-22999-3
eBook Packages: Computer Science, Computer Science (R0)
