Biternion Nets: Continuous Head Pose Regression from Discrete Training Labels

  • Lucas Beyer
  • Alexander Hermans
  • Bastian Leibe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9358)


While head pose estimation has been studied for some time, continuous head pose estimation is still an open problem. Most approaches either cannot deal with the periodicity of angular data or require very fine-grained regression labels. We introduce biternion nets, a CNN-based approach that can be trained on very coarse regression labels and still estimate fully continuous \(360^{\circ}\) head poses. We show state-of-the-art results on several publicly available datasets. Finally, we demonstrate how easy it is to record and annotate a new dataset with coarse orientation labels in order to obtain continuous head pose estimates using our biternion nets.
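The periodicity problem mentioned in the abstract can be illustrated with a minimal sketch. It assumes that an angle is encoded as a two-dimensional unit vector (cos φ, sin φ), which is our reading of the biternion representation; the abstract itself does not spell this out. All function names below are hypothetical and chosen for illustration only. This is not the authors' implementation.

```python
import math


def naive_error_deg(a_deg, b_deg):
    """Absolute difference of two angles, ignoring periodicity."""
    return abs(a_deg - b_deg)


def to_unit_vector(angle_deg):
    """Encode an angle as (cos, sin); assumed form of the biternion encoding."""
    r = math.radians(angle_deg)
    return (math.cos(r), math.sin(r))


def to_angle_deg(vec):
    """Decode a 2D vector back to an angle in [0, 360)."""
    x, y = vec
    return math.degrees(math.atan2(y, x)) % 360.0


def angular_error_deg(a_deg, b_deg):
    """Angle between the two unit vectors, which respects wrap-around."""
    ax, ay = to_unit_vector(a_deg)
    bx, by = to_unit_vector(b_deg)
    # Clamp the dot product to guard against rounding slightly outside [-1, 1].
    dot = max(-1.0, min(1.0, ax * bx + ay * by))
    return math.degrees(math.acos(dot))


if __name__ == "__main__":
    # 359 and 1 degree are almost the same direction, yet the naive error is 358.
    print(naive_error_deg(359.0, 1.0))
    # The vector-based error is approximately 2 degrees.
    print(angular_error_deg(359.0, 1.0))
```

Under this assumption, a regression target on the unit circle has no discontinuity at the 0°/360° boundary, which is why such an encoding suits periodic quantities like head orientation.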

Supplementary material 1 (zip 1526 KB)



Copyright information

© Springer International Publishing Switzerland 2015

Open Access This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  1. Visual Computing Institute, RWTH Aachen University, Aachen, Germany
