Visual Estimation of Attentive Cues in HRI: The Case of Torso and Head Pose

  • Markos Sigalas
  • Maria Pateraki
  • Panos Trahanias
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9163)


Capturing visual human-centered information is a fundamental input source for effective and successful human-robot interaction (HRI) in dynamic multi-party social settings. Torso and head pose, as forms of nonverbal communication, support the derivation of people’s focus of attention, a key variable in the analysis of human behaviour in HRI paradigms encompassing social aspects. Towards this goal, we have developed a model-based approach for torso and head pose estimation that addresses key limitations of free-form interaction scenarios as well as partial intra- and inter-person occlusions. The proposed approach builds upon the concept of Top View Re-projection (TVR) to treat the respective body parts, modelled as cylinders, in a uniform way. For each body part, a number of pose hypotheses are sampled from its configuration space. Each hypothesis is evaluated against a scoring function, and the hypothesis with the best score yields the estimated pose and the locations of the joints. A refinement step for head pose, based on tracking facial patch deformations, is applied to compute the horizontal out-of-plane rotation. The overall approach forms one of the core components of a vision system integrated in a robotic platform that supports socially appropriate, multi-party, multimodal interaction in a bartending scenario. Results obtained in the robot’s environment during real HRI experiments with a varying number of users attest to the effectiveness of our approach.
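The sample-and-score scheme summarised in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives neither the details of TVR nor the scoring function, so everything below is an assumption made for illustration. We assume a body part (here the torso) is a vertical cylinder with an elliptic cross-section of known semi-axes `a` and `b`; that top-view re-projection can be approximated by dropping the height coordinate of the body part's 3D points; that a pose hypothesis is a ground-plane centre plus a yaw angle; and that a hypothesis is scored by how far the re-projected points lie from the ellipse boundary. `PoseHypothesis`, `top_view`, `score` and `estimate_pose` are hypothetical names, and plain uniform random sampling stands in for whatever sampling strategy the paper actually uses.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class PoseHypothesis:
    """Hypothetical body-part pose on the ground plane: centre (cx, cy), yaw in radians."""
    cx: float
    cy: float
    yaw: float


def top_view(points_3d):
    """Assumed stand-in for TVR: project 3D points (x, y, z), z up, onto the ground plane."""
    return [(x, y) for x, y, _z in points_3d]


def score(hyp, pts_2d, a, b):
    """Mean deviation of top-view points from the elliptic cross-section (lower is better).

    a, b: assumed semi-axes of the cylinder's cross-section (e.g. half shoulder
    width and half torso depth). Each point is rotated into the hypothesis frame
    and its normalised elliptic radius is compared with 1, i.e. the model boundary.
    """
    c, s = math.cos(hyp.yaw), math.sin(hyp.yaw)
    total = 0.0
    for x, y in pts_2d:
        dx, dy = x - hyp.cx, y - hyp.cy
        u = c * dx + s * dy   # rotate by -yaw into the body frame
        v = -s * dx + c * dy
        total += abs(math.hypot(u / a, v / b) - 1.0)
    return total / len(pts_2d)


def estimate_pose(points_3d, bounds, a, b, n_samples=8000, seed=0):
    """Sample hypotheses uniformly within bounds=((xmin, xmax), (ymin, ymax)) and
    yaw in [-pi, pi], then return the best-scoring hypothesis and its score."""
    rng = random.Random(seed)
    pts = top_view(points_3d)
    (x0, x1), (y0, y1) = bounds
    best, best_score = None, math.inf
    for _ in range(n_samples):
        hyp = PoseHypothesis(rng.uniform(x0, x1), rng.uniform(y0, y1),
                             rng.uniform(-math.pi, math.pi))
        s = score(hyp, pts, a, b)
        if s < best_score:
            best, best_score = hyp, s
    return best, best_score


if __name__ == "__main__":
    # Synthetic torso: elliptic cross-section centred at (1.0, 2.0) with yaw 0.3 rad.
    a, b, true_yaw = 0.20, 0.12, 0.3
    c, s = math.cos(true_yaw), math.sin(true_yaw)
    cloud = []
    for i in range(36):
        t = 2 * math.pi * i / 36
        u, v = a * math.cos(t), b * math.sin(t)
        cloud.append((1.0 + c * u - s * v, 2.0 + s * u + c * v, 1.0 + 0.01 * i))
    best, best_score = estimate_pose(cloud, ((0.85, 1.15), (1.85, 2.15)), a, b)
    print(best, round(best_score, 3))
```

For simplicity the synthetic cloud surrounds the whole cross-section, whereas a single depth camera would see only its front half. Because an ellipse is symmetric under a half-turn, this toy score recovers yaw only modulo π, so a practical system would need an additional cue to tell front from back.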


Keywords: Body pose estimation · Head pose · Model-based tracking · Particle filtering



This work was partially supported by the European Commission under contract number FP7-270435 (JAMES project).

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Markos Sigalas (1, 2)
  • Maria Pateraki (1)
  • Panos Trahanias (1, 2)
  1. Institute of Computer Science, Foundation for Research and Technology - Hellas, Heraklion, Greece
  2. Department of Computer Science, University of Crete, Rethymno, Greece