Behavior Research Methods, Volume 42, Issue 1, pp 254–265

Eyetracking for two-person tasks with manipulation of a virtual world

  • Jean Carletta
  • Robin L. Hill
  • Craig Nicol
  • Tim Taylor
  • Jan Peter de Ruiter
  • Ellen Gurman Bard


Eyetracking facilities are typically restricted to monitoring a single person viewing static images or prerecorded video. In the present article, we describe a system that makes it possible to study visual attention in coordination with other activity during joint action. The software links two eyetracking systems in parallel and provides an on-screen task. By locating eye movements against dynamic screen regions, it automatically identifies which moving on-screen object a participant is looking at. Using existing SR Research technology, the system can also cross-project each participant's eyetrack and mouse location onto the other's on-screen work space. Because it keeps a complete record of eyetrack and on-screen events in the same format as subsequent human coding, the system permits the analysis of multiple modalities. The software thus opens up new approaches to the study of spontaneous multimodal communication, joint action, and joint attention. These capacities are demonstrated using an experimental paradigm for cooperative on-screen assembly of a two-dimensional model. The software is available under an open source license.
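The core mechanism described above, locating eye movements against dynamic screen regions, amounts to hit-testing each gaze sample against the bounding boxes of on-screen objects at their current positions. The sketch below illustrates the idea in Python; it is not the authors' implementation, and all names (`Region`, `label_fixation`, the example object names) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Region:
    """Axis-aligned bounding box of one on-screen object at the current frame."""
    name: str
    x: float
    y: float
    w: float
    h: float

    def contains(self, gx: float, gy: float) -> bool:
        # True if the gaze point (gx, gy) falls inside this region.
        return self.x <= gx <= self.x + self.w and self.y <= gy <= self.y + self.h

def label_fixation(gx: float, gy: float, regions: list[Region]) -> list[str]:
    """Return the names of all dynamic regions containing the gaze point.

    Because objects move, the caller must pass the regions' positions for
    the frame on which the gaze sample was recorded.
    """
    return [r.name for r in regions if r.contains(gx, gy)]

# Hypothetical example: two movable parts at their positions for one frame.
regions = [Region("model_piece", 100, 120, 40, 40),
           Region("target_slot", 300, 200, 60, 60)]

print(label_fixation(115, 130, regions))  # -> ['model_piece']
```

In a real system of this kind, the region list would be regenerated (or updated) every frame from the task state, so that each logged gaze sample can be annotated with the object it landed on at that instant.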





Copyright information

© Psychonomic Society, Inc. 2010

Authors and Affiliations

  • Jean Carletta (2)
  • Robin L. Hill (2)
  • Craig Nicol (2)
  • Tim Taylor (2)
  • Jan Peter de Ruiter (1)
  • Ellen Gurman Bard (2)

  1. Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
  2. Human Communication Research Centre, Informatics Forum, University of Edinburgh, Edinburgh, Scotland
