Investigating shared attention with a virtual agent using a gaze-based interface

This paper investigates the use of a gaze-based interface for testing simple shared attention behaviours during an interaction scenario with a virtual agent. The interface is non-intrusive and operates in real time: it takes input from a standard web camera, monitors users’ head directions, and resolves them to screen coordinates. We use the interface to investigate how users perceive the agent’s behaviour during a shared attention scenario. Our aim is to elaborate important factors to consider when constructing engagement models that must account not only for behaviour in isolation but also for the context of the interaction, as is the case in shared attention situations.


Author information

Correspondence to Christopher Peters.

Electronic Supplementary Material

Below is the link to the electronic supplementary material. (AVI 3.25 MB)

About this article

Cite this article

Peters, C., Asteriadis, S. & Karpouzis, K. Investigating shared attention with a virtual agent using a gaze-based interface. J Multimodal User Interfaces 3, 119–130 (2010) doi:10.1007/s12193-009-0029-1

  • Shared attention
  • Gaze detection
  • Embodied agents
  • Social behaviour