Probabilistic Temporal Head Pose Estimation Using a Hierarchical Graphical Model

  • Meltem Demirkus
  • Doina Precup
  • James J. Clark
  • Tal Arbel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8689)


We present a hierarchical graphical model to probabilistically estimate head pose angles from real-world videos, that leverages the temporal pose information over video frames. The proposed model employs a number of complementary facial features, and performs feature level, probabilistic classifier level and temporal level fusion. Extensive experiments are performed to analyze the pose estimation performance for different combination of features, different levels of the proposed hierarchical model and for different face databases. Experiments show that the proposed head pose model improves on the current state-of-the-art for the unconstrained McGillFaces [10] and the constrained CMU Multi-PIE [14] databases, increasing the pose classification accuracy compared to the current top performing method by 19.38% and 19.89%, respectively.


Face hierarchical probabilistic video graphical temporal head pose 


  1. 1.
    Berg, A.C., Berg, T.L., Malik, J.: Shape matching and object recognition using low distortion correspondences. In: Proc. International Conference on Computer Vision and Pattern Recognition, CVPR (2005)Google Scholar
  2. 2.
    Aghajanian, J., Prince, S.: Face pose estimation in uncontrolled environments. In: Cavallaro, A., Prince, S., Alexander, D.C. (eds.) BMVC, pp. 1–11. British Machine Vision Association (2009)Google Scholar
  3. 3.
    Balasubramanian, V.N., Ye, J., Panchanathan, S.: Biased manifold embedding: A framework for person-independent head pose estimation. In: CVPR. IEEE Computer Society (2007)Google Scholar
  4. 4.
    BenAbdelkader, C.: Robust head pose estimation using supervised manifold learning. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part VI. LNCS, vol. 6316, pp. 518–531. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  5. 5.
    Berg, A.C., Malik, J.: Geometric blur for template matching. In: Proceedings of the 2001 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, CVPR 2001, vol. 1, pp. 607–614 (2001)Google Scholar
  6. 6.
    Beymer, D.: Face recognition under varying pose. In: Proceedings of the 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition CVPR 1994, pp. 756–761 (June 1994)Google Scholar
  7. 7.
    Blanz, V., Grother, P., Phillips, P., Vetter, T.: Face recognition based on frontal views generated from non-frontal images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 2, pp. 454–461 (June 2005)Google Scholar
  8. 8.
    Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)CrossRefzbMATHGoogle Scholar
  9. 9.
    Burghouts, G., Geusebroek, J.: Performance evaluation of local color invariants. Computer Vision and Image Understanding (CVIU) 113, 48–62 (2009)CrossRefGoogle Scholar
  10. 10.
    Demirkus, M., Clark, J.J., Arbel, T.: Robust semi-automatic head pose labeling for real-world face video sequences. Multimedia Tools and Applications, 1–29 (2013)Google Scholar
  11. 11.
    Demirkus, M., Oreshkin, B.N., Clark, J.J., Arbel, T.: Spatial and probabilistic codebook template based head pose estimation from unconstrained environments. In: Macq, B., Schelkens, P. (eds.) ICIP, pp. 573–576. IEEE (2011)Google Scholar
  12. 12.
    Demirkus, M., Precup, D., Clark, J.J., Arbel, T.: Soft biometric trait classification from real-world face videos conditioned on head pose estimation. In: CVPR Workshops, pp. 130–137. IEEE (2012)Google Scholar
  13. 13.
    Demirkus, M., Precup, D., Clark, J.J., Arbel, T.: Multi-layer temporal graphical model for head pose estimation in real-world videos. In: ICIP. IEEE (2014)Google Scholar
  14. 14.
    Gross, R., Matthews, I., Cohn, J., Kanade, T., Baker, S.: Multi-pie. Image Vision Comput. 28(5), 807–813 (2010)CrossRefGoogle Scholar
  15. 15.
    Hansen, N., Müller, S.D., Koumoutsakos, P.: Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (cma-es). Evol. Comput. 11(1), 1–18 (2003)CrossRefGoogle Scholar
  16. 16.
    Hassner, T.: Viewing real-world faces in 3D. In: The IEEE International Conference on Computer Vision, ICCV (December 2013)Google Scholar
  17. 17.
    Hu, C., Xiao, J., Matthews, I., Baker, S., Cohn, J.F., Kanade, T.: Fitting a single active appearance model simultaneously to multiple images. In: Hoppe, A., Barman, S., Ellis, T. (eds.) BMVC, pp. 1–10. BMVA Press (2004)Google Scholar
  18. 18.
    Hu, N., Huang, W., Ranganath, S.: Head pose estimation by non-linear embedding and mapping. In: ICIP (2), pp. 342–345. IEEE (2005)Google Scholar
  19. 19.
    Hua, G., Yang, M., Learned-Miller, E., Ma, Y., Turk, M., Kriegman, D., Huang, T.: Special section on real-world face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 33(10), 1921–1924 (2011)CrossRefGoogle Scholar
  20. 20.
    Huang, D., Storer, M., De la Torre, F., Bischof, H.: Supervised local subspace learning for continuous head pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2921–2928 (2011)Google Scholar
  21. 21.
    Kim, J., Grauman, K.: Boundary preserving dense local regions. In: Proc. International Conference on Computer Vision and Pattern Recognition, CVPR (2011)Google Scholar
  22. 22.
    Kumar, N., Berg, A., Belhumeur, P., Nayar, S.: Describable visual attributes for face verification and image search. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 33(10), 1962–1977 (2011)CrossRefGoogle Scholar
  23. 23.
    Li, H., Hua, G., Lin, Z., Brandt, J., Yang, J.: Probabilistic elastic matching for pose variant face verification. In: CVPR, pp. 3499–3506 (2013)Google Scholar
  24. 24.
    Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV) 60(2), 91–110 (2004)CrossRefGoogle Scholar
  25. 25.
    Morency, L., Rahimi, A., Checka, N., Darrell, T.: Fast stereo-based head tracking for interactive environments. In: Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 390–395 (May 2002)Google Scholar
  26. 26.
    Murphy-Chutorian, E., Trivedi, M.M.: Head pose estimation in computer vision: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 31(4), 607–626 (2009)CrossRefGoogle Scholar
  27. 27.
    Oka, K., Sato, Y., Nakanishi, Y., Koike, H.: Head pose estimation system based on particle filtering with adaptive diffusion control. In: MVA, pp. 586–589 (2005)Google Scholar
  28. 28.
    Orozco, J., Gong, S., Xiang, T.: Head pose classification in crowded scenes. In: BMVC. British Machine Vision Association (2009)Google Scholar
  29. 29.
    Pearl, J.: Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann (1988)Google Scholar
  30. 30.
    Raytchev, B., Yoda, I., Sakaue, K.: Head pose estimation by nonlinear manifold learning. In: Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, vol. 4, pp. 462–466 (August 2004)Google Scholar
  31. 31.
    Van de Sande, K.E.A., Gevers, T.: Evaluating color descriptors for object and scene recognition. IEEE PAMI 32(9), 1582–1596 (2010)CrossRefGoogle Scholar
  32. 32.
    Sherrah, J., Gong, S.: Fusion of perceptual cues for robust tracking of head pose and position. Pattern Recognition 34(8), 1565–1572 (2001)CrossRefzbMATHGoogle Scholar
  33. 33.
    Toews, M., Arbel, T.: Detection, localization, and sex classification of faces from arbitrary viewpoints and under occlusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(9), 1567–1581 (2009)CrossRefGoogle Scholar
  34. 34.
    Torki, M., Elgammal, A.: Regression from local features for viewpoint and pose estimation. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 2603–2610 (November 2011)Google Scholar
  35. 35.
    Tosato, D., Farenzena, M., Spera, M., Murino, V., Cristani, M.: Multi-class classification on riemannian manifolds for video surveillance. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part II. LNCS, vol. 6312, pp. 378–391. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  36. 36.
    Tuytelaars, T.: Mikolajczyk, K.: Local invariant feature detectors: a survey. Foundations and Trends in Computer Graphics and Vision 3(3), pp. 177–280 (2008)Google Scholar
  37. 37.
    Tuytelaars, T., Mikolajczyk, K.: A survey on local invariant features. Tutorial at ECCV (2006)Google Scholar
  38. 38.
    Wolf, L., Hassner, T., Maoz, I.: Face recognition in unconstrained videos with matched background similarity. In: Proc. IEEE Conf. Comput. Vision Pattern Recognition (2011)Google Scholar
  39. 39.
    Wu, J., Trivedi, M.M.: A two-stage head pose estimation framework and evaluation. Pattern Recogn. 41(3), 1138–1158 (2008)CrossRefzbMATHGoogle Scholar
  40. 40.
    Xiong, X., De la Torre, F.: Supervised descent method and its application to face alignment. In: Proc. International Conference on Computer Vision and Pattern Recognition, CVPR (2013)Google Scholar
  41. 41.
    Yi, D., Lei, Z., Li, S.Z.: Towards pose robust face recognition. In: CVPR, pp. 3539–3545 (2013)Google Scholar
  42. 42.
    Zhao, G., Chen, L., Song, J., Chen, G.: Large head movement tracking using sift-based registration. In: Lienhart, R., Prasad, A.R., Hanjalic, A., Choi, S., Bailey, B.P., Sebe, N. (eds.) ACM Multimedia, pp. 807–810. ACM (2007)Google Scholar
  43. 43.
    Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2879–2886 (June 2012)Google Scholar

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Meltem Demirkus
    • 1
  • Doina Precup
    • 1
  • James J. Clark
    • 1
  • Tal Arbel
    • 1
  1. 1.Centre for Intelligent MachinesMcGill UniversityMontrealCanada

Personalised recommendations