Using Random Forests for the Estimation of Multiple Users’ Visual Focus of Attention from Head Pose

  • Silvia Rossi
  • Enrico Leone
  • Mariacarla Staffa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10037)


When interacting with a group of people, a robot needs to estimate each person’s visual focus of attention in order to regulate turn-taking, to determine which objects are being attended to, and to estimate the degree of the users’ engagement. This work evaluates the feasibility of computing the focus of attention of multiple users in real time by combining a random forest approach for head pose estimation with tracking of each user’s head joint. The system has been tested both on single users and on pairs of users interacting with a simple scenario designed to guide the user’s attention towards a specific region of space. The aim is to highlight the requirements and problems that arise when multiple users are present. Results show that, while the approach is promising, datasets different from those available in the literature are required to improve performance.
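The abstract describes combining an estimated head pose with a tracked head position to decide which region of space a user is attending to. As a rough illustration of that idea only, and not the authors' implementation, the sketch below casts a ray from the head position along the head orientation and picks the candidate target with the smallest angular deviation. The function names, the coordinate convention (sensor at the origin, yaw 0 and pitch 0 meaning the user faces the sensor along the negative z axis), the target positions, and the 20-degree threshold are all assumptions made for this example.

```python
import math


def gaze_direction(yaw_deg, pitch_deg):
    """Unit vector for a head orientation (assumed convention:
    yaw=0, pitch=0 points along -z, i.e. towards the sensor)."""
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    return (
        math.sin(yaw) * math.cos(pitch),
        math.sin(pitch),
        -math.cos(yaw) * math.cos(pitch),
    )


def angle_to_target(head_pos, direction, target_pos):
    """Angle in degrees between the head direction and the head-to-target vector."""
    to_target = tuple(t - h for t, h in zip(target_pos, head_pos))
    norm = math.sqrt(sum(c * c for c in to_target))
    if norm == 0.0:
        return 0.0
    cos_angle = sum(d * c for d, c in zip(direction, to_target)) / norm
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def estimate_focus(head_pos, yaw_deg, pitch_deg, targets, max_angle_deg=20.0):
    """Return the name of the closest target within max_angle_deg, or None."""
    direction = gaze_direction(yaw_deg, pitch_deg)
    best_name, best_angle = None, max_angle_deg
    for name, pos in targets.items():
        angle = angle_to_target(head_pos, direction, pos)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name


if __name__ == "__main__":
    # Hypothetical scene: two regions of interest next to the sensor (metres).
    targets = {"left": (-0.5, 0.0, 0.0), "robot": (0.0, 0.0, 0.0)}
    # Hypothetical per-user input: (head position, yaw, pitch).
    users = {
        "user_a": ((0.0, 0.0, 2.0), 0.0, 0.0),
        "user_b": ((0.0, 0.0, 2.0), -14.0, 0.0),
    }
    for user, (head_pos, yaw, pitch) in users.items():
        print(user, estimate_focus(head_pos, yaw, pitch, targets))
```

In a multi-user setting the same function is simply applied once per tracked head, as in the loop above. A real system would also have to handle noisy pose estimates and the gap between head pose and actual eye gaze, which this sketch ignores.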


Visual focus of attention; Human-robot interaction; Random forests



The research leading to these results has been supported by the RoDyMan project, which has received funding from the European Research Council FP7 Ideas under Advanced Grant agreement number 320992, and by the Italian National Project Security for Smart Cities PON-FSE Campania 2014-20. The authors are solely responsible for the content of this manuscript. The authors thank Silvano Sorrentino for his contribution to code development.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Department of Electrical Engineering and Information Technology, University of Naples Federico II, Naples, Italy
  2. Department of Engineering, University of Naples Parthenope, Naples, Italy
