Abstract
We propose an efficient and accurate head orientation estimation algorithm using a monocular camera. Our approach leverages a deep neural network, which we use as a regressor to learn the mapping from visual appearance to three-dimensional head orientation angles. In contrast to classification-based approaches, our system therefore outputs continuous head orientation. The network's convolutional filters are trained on a large number of augmented head appearances, so the estimator is user-independent and covers large pose variations. Our key observation is that an input image with \(32 \times 32\) resolution is enough to achieve a mean square error of about 3 degrees, which is sufficient for efficient head orientation applications. Owing to this small input, our network takes only 1 ms per roughly localized head region with GPU acceleration. We also propose particle-filter-based post-processing to further improve the stability of the estimates in video sequences. We compare the performance with that of a state-of-the-art algorithm that uses a depth sensor, and we validate our head orientation estimator on Internet photos and videos.
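The abstract does not specify the motion or observation model used in the particle-filter post-processing. The following is a minimal sketch, assuming a generic bootstrap (sampling importance resampling) particle filter. It uses a random-walk motion model over (yaw, pitch, roll) in degrees and an isotropic Gaussian likelihood centred on each per-frame network estimate. The function name `smooth_head_orientation` and the parameters `n_particles`, `motion_std` and `meas_std` are hypothetical illustrations, not the authors' values.

```python
import math
import random


def smooth_head_orientation(measurements, n_particles=500, motion_std=2.0,
                            meas_std=5.0, seed=0):
    """Bootstrap particle filter over (yaw, pitch, roll) angles in degrees.

    measurements: list of noisy per-frame (yaw, pitch, roll) tuples, e.g.
    from a per-frame regressor. Returns one filtered tuple per frame.
    Assumes angles stay away from the +/-180 degree wrap-around, so plain
    weighted averaging of angles is acceptable.
    """
    rng = random.Random(seed)
    # Initialise particles around the first measurement.
    particles = [[a + rng.gauss(0.0, meas_std) for a in measurements[0]]
                 for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        # Predict: hypothetical random-walk motion model.
        for p in particles:
            for k in range(3):
                p[k] += rng.gauss(0.0, motion_std)
        # Update: Gaussian log-likelihood, normalised stably via max-shift.
        log_w = [-sum((p[k] - z[k]) ** 2 for k in range(3)) / (2 * meas_std ** 2)
                 for p in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        # Estimate: weighted mean of the particle states.
        estimates.append(tuple(sum(wi * p[k] for wi, p in zip(w, particles))
                               for k in range(3)))
        # Resample: systematic resampling with a single random offset.
        step = 1.0 / n_particles
        u = rng.random() * step
        cumulative, i = 0.0, 0
        resampled = []
        for j in range(n_particles):
            target = u + j * step
            # Advance to the first particle whose cumulative weight reaches target.
            while cumulative + w[i] < target and i < n_particles - 1:
                cumulative += w[i]
                i += 1
            resampled.append(list(particles[i]))
        particles = resampled
    return estimates
```

A usage example on a synthetic, static head pose corrupted by per-frame noise:

```python
rng = random.Random(1)
truth = (30.0, -10.0, 5.0)
noisy = [tuple(t + rng.gauss(0.0, 5.0) for t in truth) for _ in range(100)]
smoothed = smooth_head_orientation(noisy)
```

The filtered sequence should fluctuate less than the raw per-frame estimates. The amount of smoothing depends on the ratio of `motion_std` to `meas_std`, which trades responsiveness to real head motion against jitter suppression.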
Acknowledgement
We appreciate constructive comments from anonymous reviewers. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2010-0028680).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ahn, B., Park, J., Kweon, I.S. (2015). Real-Time Head Orientation from a Monocular Camera Using Deep Neural Network. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9005. Springer, Cham. https://doi.org/10.1007/978-3-319-16811-1_6
DOI: https://doi.org/10.1007/978-3-319-16811-1_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16810-4
Online ISBN: 978-3-319-16811-1
eBook Packages: Computer Science, Computer Science (R0)