Abstract
In this paper, we argue that embodiment can play an important role in the evaluation of systems developed for Human Computer Interaction. To this end, we describe a simulation platform for building Embodied Human Computer Interactions (EHCI). This system, VoxWorld, enables multimodal dialogue systems that communicate through language, gesture, action, facial expressions, and gaze tracking, in the context of task-oriented interactions. A multimodal simulation is an embodied 3D virtual realization of both the situational environment and the co-situated agents, as well as the most salient content denoted by communicative acts in a discourse. It is built on the modeling language VoxML, which encodes objects with rich semantic typing and action affordances, and actions themselves as multimodal programs, enabling contextually salient inferences and decisions in the environment. Through simulation experiments in VoxWorld, we can begin to identify and then evaluate the diverse parameters involved in multimodal communication between agents. VoxWorld enables embodied HCI by situating both human and computational agents within the same virtual simulation environment, where they share perceptual and epistemic common ground. In this first part of the paper series, we discuss the consequences of embodiment and common ground, how they help evaluate parameters of the interaction between humans and agents, and how different classes of agents demonstrate different behaviors and types of interactions.
This work was supported by Contract W911NF-15-C-0238 with the US Defense Advanced Research Projects Agency (DARPA) and the Army Research Office (ARO). Approved for Public Release, Distribution Unlimited. The views expressed herein are ours and do not reflect the official policy or position of the Department of Defense or the U.S. Government. We would like to thank Ken Lai, Bruce Draper, Ross Beveridge, and Francisco Ortega for their comments and suggestions.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Pustejovsky, J., Krishnaswamy, N. (2021). The Role of Embodiment and Simulation in Evaluating HCI: Theory and Framework. In: Duffy, V.G. (eds) Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management. Human Body, Motion and Behavior. HCII 2021. Lecture Notes in Computer Science(), vol 12777. Springer, Cham. https://doi.org/10.1007/978-3-030-77817-0_21
Print ISBN: 978-3-030-77816-3
Online ISBN: 978-3-030-77817-0