Real-Time Head Orientation from a Monocular Camera Using Deep Neural Network

Ahn, Byungtae; Park, Jaesik; Kweon, In So

doi:10.1007/978-3-319-16811-1_6

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9005)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

We propose an efficient and accurate head orientation estimation algorithm using a monocular camera. Our approach leverages a deep neural network, which we use in a regression setting to learn the mapping between visual appearance and three-dimensional head orientation angles. In contrast to classification-based approaches, our system therefore outputs continuous head orientation. The algorithm uses convolutional filters trained with a large number of augmented head appearances, so it is user independent and covers large pose variations. Our key observation is that an input image of \(32 \times 32\) resolution is enough to achieve about 3 degrees of mean square error, which is sufficient for efficient head orientation applications. As a result, our architecture takes only 1 ms on roughly localized head positions with the aid of a GPU. We also propose particle filter based post-processing to further enhance the stability of the estimation in video sequences. We compare the performance with a state-of-the-art algorithm that uses a depth sensor, and we validate our head orientation estimator on Internet photos and video.
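The particle filter post-processing mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the filter's design, so everything below is an assumption made for illustration: the random-walk motion model, the Gaussian measurement likelihood, systematic resampling, the particle count, and the noise levels. The function names `particle_filter_smooth` and `systematic_resample` are hypothetical. The sketch takes per-frame (yaw, pitch, roll) estimates in degrees, such as a regression network might output, and returns a smoothed sequence. It ignores angle wrap-around, which is reasonable only while head angles stay well away from ±180 degrees.

```python
import math
import random


def systematic_resample(particles, weights, rng):
    """Draw len(particles) samples in proportion to the normalized weights."""
    n = len(particles)
    step = 1.0 / n
    u = rng.random() * step          # single random offset in [0, 1/n)
    cumulative = weights[0]
    i = 0
    resampled = []
    for _ in range(n):
        # Advance to the particle whose cumulative weight covers u.
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(particles[i])
        u += step
    return resampled


def particle_filter_smooth(measurements, n_particles=200,
                           motion_sigma=2.0, meas_sigma=5.0, seed=0):
    """Smooth per-frame (yaw, pitch, roll) estimates given in degrees.

    All default parameter values are illustrative assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    # Initialize particles around the first measurement.
    particles = [tuple(a + rng.gauss(0.0, meas_sigma) for a in measurements[0])
                 for _ in range(n_particles)]
    smoothed = []
    for z in measurements:
        # Predict: random-walk motion model (assumed).
        particles = [tuple(a + rng.gauss(0.0, motion_sigma) for a in p)
                     for p in particles]
        # Update: Gaussian likelihood of the measurement for each particle.
        weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(p, z))
                            / (2.0 * meas_sigma ** 2))
                   for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: weighted mean of the particles.
        smoothed.append(tuple(sum(w * p[i] for w, p in zip(weights, particles))
                              for i in range(3)))
        # Resample to concentrate particles in high-likelihood regions.
        particles = systematic_resample(particles, weights, rng)
    return smoothed


if __name__ == "__main__":
    # Hypothetical noisy per-frame estimates around a fixed pose.
    noise = random.Random(1)
    frames = [(10.0 + noise.gauss(0, 5), -5.0 + noise.gauss(0, 5),
               noise.gauss(0, 5)) for _ in range(30)]
    for raw, est in zip(frames[-3:], particle_filter_smooth(frames)[-3:]):
        print([round(v, 1) for v in raw], "->", [round(v, 1) for v in est])
```

A larger `motion_sigma` lets the estimate follow fast head motion at the cost of less smoothing, while a larger `meas_sigma` trusts the per-frame estimates less; both would need tuning against real video.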

Acknowledgement

We appreciate constructive comments from anonymous reviewers. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2010-0028680).

Author information

Authors and Affiliations

KAIST, Daejeon, Republic of Korea
Byungtae Ahn, Jaesik Park & In So Kweon

Corresponding author

Correspondence to In So Kweon.

Editor information

Editors and Affiliations

Technische Universität München, Garching, Bayern, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang

About this paper

Cite this paper

Ahn, B., Park, J., Kweon, I.S. (2015). Real-Time Head Orientation from a Monocular Camera Using Deep Neural Network. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9005. Springer, Cham. https://doi.org/10.1007/978-3-319-16811-1_6

DOI: https://doi.org/10.1007/978-3-319-16811-1_6
Published: 16 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16810-4
Online ISBN: 978-3-319-16811-1
eBook Packages: Computer Science, Computer Science (R0)
