Deictic Adaptation in a Virtual Environment

  • Nikhil Krishnaswamy
  • James Pustejovsky
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11034)


As human-computer interfaces become more sophisticated, people expect computational agents to behave more like humans. However, humans interacting with one another make assumptions about mutual conceptual understanding that they may not make when interacting with a computational agent, where spatial cues in the environment affect their assumptions about the agent’s knowledge. In this paper, we examine an interaction between human subjects and a virtual embodied avatar displayed on a screen, in which a surface displayed on the screen either is or is not “continued” in the real world by a physical surface. With minimal instruction, subjects are asked to indicate objects displayed in the shared environment to the agent in the course of a collaborative task. We then examine the subjects’ adaptations, in aggregate, to the different configurations.


Spatial cognition, Deixis, Virtual agent, Embodiment, Spatial reasoning



The authors would like to thank the reviewers for their helpful comments. We would also like to thank our colleagues at Colorado State University and the University of Florida for developing the gesture recognition systems: Prof. Bruce Draper, Prof. Jaime Ruiz, Prof. Ross Beveridge, Pradyumna Narayana, Isaac Wang, Rahul Bangar, Dhruva Patil, Gururaj Mulay, Jason Yu, and Jesse Smith; and our Brandeis University colleagues, Tuan Do and Kyeongmin Rim, for their work on VoxSim. Additional thanks to Jason for providing Fig. 3. This work is supported by a contract with the US Defense Advanced Research Projects Agency (DARPA), Contract W911NF-15-C-0238. Approved for Public Release, Distribution Unlimited. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Computer Science, Brandeis University, Waltham, USA
