International Journal of Computer Vision, Volume 109, Issue 1–2, pp 146–167

Exploring Transfer Learning Approaches for Head Pose Classification from Multi-view Surveillance Images

  • Anoop Kolar Rajagopal
  • Ramanathan Subramanian (corresponding author)
  • Elisa Ricci
  • Radu L. Vieriu
  • Oswald Lanz
  • Ramakrishnan Kalpathi R.
  • Nicu Sebe


Abstract

Head pose classification from surveillance images acquired with distant, large field-of-view cameras is difficult, as faces are captured at low resolution and have a blurred appearance. Domain adaptation approaches are useful for transferring knowledge from the training (source) to the test (target) data when they have different attributes, minimizing target data labeling efforts in the process. This paper examines the use of transfer learning for efficient multi-view head pose classification with minimal target training data under three challenging situations: (i) where the range of head poses in the source and target images is different, (ii) where source images capture a stationary person while target images capture a moving person whose facial appearance varies under motion due to changing perspective and scale, and (iii) a combination of (i) and (ii). On the whole, the presented methods represent novel transfer learning solutions employed in the context of multi-view head pose classification. We demonstrate through extensive experimental validation that the proposed solutions considerably outperform the state-of-the-art. Finally, the DPOSE dataset, compiled for benchmarking head pose classification performance with moving persons and for aiding behavioral understanding applications, is presented in this work.


Keywords: Transfer learning · Multi-view head pose classification · Varying acquisition conditions · Moving persons



Acknowledgments

The authors gratefully acknowledge partial support from Singapore’s Agency for Science, Technology and Research (A*STAR) under the Human Sixth Sense Programme (HSSP) grant, EIT ICT Labs SSP 12205 Activity TIK (The Interaction Toolkit, tasks T1320A–T1321A), and the FP7 EU project DALI.

Supplementary material

Supplementary material 1 (mp4 5174 KB)


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Anoop Kolar Rajagopal (1)
  • Ramanathan Subramanian (2, corresponding author)
  • Elisa Ricci (3, 4)
  • Radu L. Vieriu (5)
  • Oswald Lanz (3)
  • Ramakrishnan Kalpathi R. (1)
  • Nicu Sebe (5)

  1. Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
  2. Advanced Digital Sciences Center (ADSC), University of Illinois at Urbana-Champaign, Singapore
  3. Fondazione Bruno Kessler, Trento, Italy
  4. Department of Electrical and Information Engineering, University of Perugia, Perugia, Italy
  5. Department of Computer Science and Information Engineering (DISI), Trento, Italy
