Journal on Multimodal User Interfaces, Volume 3, Issue 1–2, pp 119–130

Investigating shared attention with a virtual agent using a gaze-based interface

  • Christopher Peters
  • Stylianos Asteriadis
  • Kostas Karpouzis
Original Paper


This paper investigates the use of a gaze-based interface for testing simple shared attention behaviours during an interaction scenario with a virtual agent. The interface is non-intrusive and operates in real time. It uses a standard web camera as input to monitor the user's head direction and resolve it to screen coordinates. We use the interface to investigate how users perceive the agent's behaviour during a shared attention scenario. Our aim is to identify important factors to consider when constructing engagement models that must account not only for behaviour in isolation but also for the context of the interaction, as is the case in shared attention situations.
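The abstract does not say how head directions are resolved to screen coordinates. The following minimal Python sketch shows one simple way such a mapping could work and is not the authors' implementation. It assumes that a head pose estimator already supplies yaw and pitch angles in degrees, and that the user's head sits centred in front of the screen at a known, fixed distance. It then checks whether the user and the agent are attending to the same on-screen region. `ScreenGeometry`, `head_pose_to_screen`, `region_at` and `shared_attention` are hypothetical names, and the screen dimensions used below are purely illustrative.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


@dataclass
class ScreenGeometry:
    """Hypothetical viewing setup; every value here is an assumption."""
    width_px: int
    height_px: int
    width_mm: float
    height_mm: float
    viewer_distance_mm: float  # head assumed centred in front of the screen


def head_pose_to_screen(yaw_deg: float, pitch_deg: float,
                        geom: ScreenGeometry) -> Tuple[int, int]:
    """Project a head direction onto the screen plane (assumed mapping).

    Yaw > 0 means turning right and pitch > 0 means tilting up; a frontal
    pose (0, 0) maps to the centre of the screen.
    """
    # Offset from the screen centre in millimetres along the head direction.
    dx_mm = geom.viewer_distance_mm * math.tan(math.radians(yaw_deg))
    dy_mm = geom.viewer_distance_mm * math.tan(math.radians(pitch_deg))
    # Convert millimetres to pixels; screen y grows downwards, so looking up
    # moves the point towards smaller y.
    x = geom.width_px / 2 + dx_mm * geom.width_px / geom.width_mm
    y = geom.height_px / 2 - dy_mm * geom.height_px / geom.height_mm
    # Clamp to the visible screen area.
    x = min(max(x, 0), geom.width_px - 1)
    y = min(max(y, 0), geom.height_px - 1)
    return int(round(x)), int(round(y))


def region_at(point: Tuple[int, int], regions: Dict[str, Rect]) -> Optional[str]:
    """Return the name of the first screen region containing the point, if any."""
    x, y = point
    for name, (x0, y0, x1, y1) in regions.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None


def shared_attention(user_target: Optional[str],
                     agent_target: Optional[str]) -> bool:
    """True when user and agent attend to the same (known) region."""
    return user_target is not None and user_target == agent_target


if __name__ == "__main__":
    geom = ScreenGeometry(1280, 1024, 340.0, 270.0, 600.0)
    regions = {"agent": (0, 0, 640, 1024), "object": (640, 0, 1280, 1024)}
    point = head_pose_to_screen(10.0, 0.0, geom)  # user turns head right
    target = region_at(point, regions)
    print(point, target, shared_attention(target, "object"))
```

For example, with the illustrative geometry above, a frontal pose `head_pose_to_screen(0, 0, geom)` returns the screen centre `(640, 512)`. The fixed viewer distance keeps the sketch short. A practical system would also need to handle head translation and smooth the noisy per-frame pose estimates.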

Keywords: Shared attention, Gaze detection, Embodied agents, Social behaviour



Supplementary material

Electronic supplementary material (AVI, 3.25 MB) accompanies this paper.



Copyright information

© OpenInterface Association 2009

Authors and Affiliations

  • Christopher Peters (1)
  • Stylianos Asteriadis (2)
  • Kostas Karpouzis (2)
  1. Department of Engineering and Computing, Coventry University, Coventry, UK
  2. National Technical University of Athens, Athens, Greece
