
Abstract

In this paper, we argue that embodiment can play an important role in the evaluation of systems developed for Human Computer Interaction. To this end, we describe a simulation platform for building Embodied Human Computer Interactions (EHCI). This system, VoxWorld, enables multimodal dialogue systems that communicate through language, gesture, action, facial expressions, and gaze tracking, in the context of task-oriented interactions. A multimodal simulation is an embodied 3D virtual realization of both the situational environment and the co-situated agents, as well as the most salient content denoted by communicative acts in a discourse. It is built on the modeling language VoxML, which encodes objects with rich semantic typing and action affordances, and actions themselves as multimodal programs, enabling contextually salient inferences and decisions in the environment. Through simulation experiments in VoxWorld, we can begin to identify and then evaluate the diverse parameters involved in multimodal communication between agents. VoxWorld enables embodied HCI by situating both human and computational agents within the same virtual simulation environment, where they share perceptual and epistemic common ground. In this first part of the paper series, we discuss the consequences of embodiment and common ground, how they help evaluate parameters of the interaction between humans and agents, and how different behaviors and types of interactions can be demonstrated with different classes of agents.
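The abstract describes VoxML as encoding objects with semantic types and action affordances that support contextual inference. As a schematic illustration only, the following minimal sketch shows the general idea in Python; the `VoxObject` structure and its field names are hypothetical stand-ins, not the actual VoxML specification:

```python
from dataclasses import dataclass, field

# Illustrative sketch: a minimal stand-in for the kind of typed object
# entry VoxML encodes. Field names are hypothetical, not the VoxML spec.

@dataclass
class VoxObject:
    name: str
    semantic_type: str                                # e.g., a physical-object type
    habitats: list = field(default_factory=list)      # configurations the object can occupy
    affordances: list = field(default_factory=list)   # actions the object affords

def affords(obj: VoxObject, action: str) -> bool:
    """Check whether this object can participate in the given action."""
    return action in obj.affordances

cup = VoxObject(
    name="cup",
    semantic_type="physobj",
    habitats=["upright"],
    affordances=["grasp", "lift", "fill"],
)

print(affords(cup, "fill"))   # True: filling is among the cup's affordances
print(affords(cup, "open"))   # False: a cup does not afford opening
```

In an environment like the VoxWorld platform described here, such typed entries would let an agent rule out infelicitous commands (e.g., "open the cup") before attempting to simulate them.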

This work was supported by Contract W911NF-15-C-0238 with the US Defense Advanced Research Projects Agency (DARPA) and the Army Research Office (ARO). Approved for Public Release, Distribution Unlimited. The views expressed herein are ours and do not reflect the official policy or position of the Department of Defense or the U.S. Government. We would like to thank Ken Lai, Bruce Draper, Ross Beveridge, and Francisco Ortega for their comments and suggestions.



Notes

  1.

    This is similar in many respects to the representations introduced in [16, 32] and [20] for modeling action and control with robots.

References

  1. Anderson, M.L.: Embodied cognition: a field guide. Artif. Intell. 149(1), 91–130 (2003)

  2. Andrist, S., Gleicher, M., Mutlu, B.: Looking coordinated: bidirectional gaze mechanisms for collaborative interaction with virtual characters. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI 2017), pp. 2571–2582. ACM, New York (2017). https://doi.org/10.1145/3025453.3026033

  3. Asher, N.: Common ground, corrections and coordination. J. Semant. (1998)

  4. Asher, N., Lascarides, A.: Logics of Conversation. Cambridge University Press, Cambridge (2003)

  5. Asher, N., Pogodalla, S.: SDRT and continuation semantics. In: Onada, T., Bekki, D., McCready, E. (eds.) JSAI-isAI 2010. LNCS (LNAI), vol. 6797, pp. 3–15. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25655-4_2

  6. Barsalou, L.W.: Perceptions of perceptual symbols. Behav. Brain Sci. 22(4), 637–660 (1999)

  7. Bergen, B.K.: Louder than Words: The New Science of How the Mind Makes Meaning. Basic Books, New York (2012)

  8. Bolt, R.A.: "Put-that-there": voice and gesture at the graphics interface, vol. 14. ACM (1980)

  9. Brennan, S.E., Chen, X., Dickinson, C.A., Neider, M.B., Zelinsky, G.J.: Coordinating cognition: the costs and benefits of shared gaze during collaborative search. Cognition 106(3), 1465–1477 (2008). https://doi.org/10.1016/j.cognition.2007.05.012

  10. Cassell, J.: Embodied Conversational Agents. MIT Press, Cambridge (2000)

  11. Cassell, J., Stone, M., Yan, H.: Coordination and context-dependence in the generation of embodied conversation. In: Proceedings of the First International Conference on Natural Language Generation, vol. 14, pp. 171–178. Association for Computational Linguistics (2000)

  12. Chrisley, R.: Embodied artificial intelligence. Artif. Intell. 149(1), 131–150 (2003)

  13. Clair, A.S., Mead, R., Matarić, M.J., et al.: Monitoring and guiding user attention and intention in human-robot interaction. In: ICRA-ICAIR Workshop, Anchorage, AK, USA, vol. 1025 (2010)

  14. Clark, H.H., Brennan, S.E.: Grounding in communication. In: Resnick, L.B., Levine, J.M., Teasley, S.D. (eds.) Perspectives on Socially Shared Cognition, vol. 13, pp. 127–149. American Psychological Association, Washington DC (1991)

  15. Clark, H.H., Wilkes-Gibbs, D.: Referring as a collaborative process. Cognition 22(1), 1–39 (1986). https://doi.org/10.1016/0010-0277(86)90010-7

  16. Cooper, R., Ginzburg, J.: Type theory with records for natural language semantics. In: Lappin, S., Fox, C. (eds.) The Handbook of Contemporary Semantic Theory, p. 375. Wiley, Hoboken (2015)

  17. Craik, K.J.W.: The Nature of Explanation. Cambridge University Press, Cambridge (1943)

  18. De Groote, P.: Type raising, continuations, and classical logic. In: Proceedings of the Thirteenth Amsterdam Colloquium, pp. 97–101 (2001)

  19. Dillenbourg, P., Traum, D.: Sharing solutions: persistence and grounding in multimodal collaborative problem solving. J. Learn. Sci. 15(1), 121–151 (2006)

  20. Dobnik, S., Cooper, R., Larsson, S.: Modelling language, action, and perception in type theory with records. In: Duchier, D., Parmentier, Y. (eds.) CSLP 2012. LNCS, vol. 8114, pp. 70–91. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41578-4_5

  21. Dumas, B., Lalanne, D., Oviatt, S.: Multimodal interfaces: a survey of principles, models and frameworks. In: Lalanne, D., Kohlas, J. (eds.) Human Machine Interaction. LNCS, vol. 5440, pp. 3–26. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00437-7_1

  22. Eisenstein, J., Barzilay, R., Davis, R.: Discourse topic and gestural form. In: AAAI, pp. 836–841 (2008)

  23. Eisenstein, J., Barzilay, R., Davis, R.: Gesture salience as a hidden variable for coreference resolution and keyframe extraction. J. Artif. Intell. Res. 31, 353–398 (2008)

  24. Evans, V.: Language and Time: A Cognitive Linguistics Approach. Cambridge University Press, Cambridge (2013)

  25. Feldman, J.: Embodied language, best-fit analysis, and formal compositionality. Phys. Life Rev. 7(4), 385–410 (2010)

  26. Fernando, T.: Situations in LTL as strings. Inf. Comput. 207(10), 980–999 (2009)

  27. Fussell, S.R., Kraut, R.E., Siegel, J.: Coordination of communication: effects of shared visual context on collaborative work. In: Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work (CSCW 2000), pp. 21–30. ACM, New York (2000). https://doi.org/10.1145/358916.358947

  28. Fussell, S.R., Setlock, L.D., Yang, J., Ou, J., Mauer, E., Kramer, A.D.I.: Gestures over video streams to support remote collaboration on physical tasks. Hum. Comput. Interact. 19(3), 273–309 (2004). https://doi.org/10.1207/s15327051hci1903_3

  29. Gergle, D., Kraut, R.E., Fussell, S.R.: Action as language in a shared visual space. In: Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work (CSCW 2004), pp. 487–496. ACM, New York (2004). https://doi.org/10.1145/1031607.1031687

  30. Gibson, J.J., Reed, E.S., Jones, R.: Reasons for Realism: Selected Essays of James J. Gibson. Lawrence Erlbaum Associates, Mahwah (1982)

  31. Gilbert, M.: On Social Facts. Princeton University Press, Princeton (1992)

  32. Ginzburg, J., Fernández, R.: Computational models of dialogue. In: Clark, A., Fox, C., Lappin, S. (eds.) The Handbook of Computational Linguistics and Natural Language Processing, vol. 57, p. 1. Wiley, Hoboken (2010)

  33. Goldman, A.I.: Interpretation psychologized. Mind Lang. 4(3), 161–185 (1989)

  34. Goldman, A.I.: Simulating Minds: The Philosophy, Psychology, and Neuroscience of Mindreading. Oxford University Press, Oxford (2006)

  35. Gordon, R.M.: Folk psychology as simulation. Mind Lang. 1(2), 158–171 (1986)

  36. Graesser, A.C., Singer, M., Trabasso, T.: Constructing inferences during narrative text comprehension. Psychol. Rev. 101(3), 371 (1994)

  37. Heal, J.: Simulation, theory, and content. In: Carruthers, P., Smith, P.K. (eds.) Theories of Theories of Mind, pp. 75–89. Cambridge University Press, Cambridge (1996)

  38. Johnson-Laird, P.N., Byrne, R.M.: Conditionals: a theory of meaning, pragmatics, and inference. Psychol. Rev. 109(4), 646 (2002)

  39. Johnson-Laird, P.: How could consciousness arise from the computations of the brain. In: Mindwaves, pp. 247–257. Basil Blackwell, Oxford (1987)

  40. Kendon, A.: Gesture: Visible Action as Utterance. Cambridge University Press, Cambridge (2004)

  41. Kennington, C., Kousidis, S., Schlangen, D.: Interpreting situated dialogue utterances: an update model that uses speech, gaze, and gesture information. In: Proceedings of SigDial 2013 (2013)

  42. Kiela, D., Bulat, L., Vero, A.L., Clark, S.: Virtual embodiment: a scalable long-term strategy for artificial intelligence research. arXiv preprint arXiv:1610.07432 (2016)

  43. Kraut, R.E., Fussell, S.R., Siegel, J.: Visual information as a conversational resource in collaborative physical tasks. Hum. Comput. Interact. 18(1), 13–49 (2003). https://doi.org/10.1207/S15327051HCI1812_2

  44. Krishnaswamy, N., Pustejovsky, J.: Multimodal semantic simulations of linguistically underspecified motion events. In: Barkowsky, T., Burte, H., Hölscher, C., Schultheis, H. (eds.) Spatial Cognition/KogWis 2016. LNCS (LNAI), vol. 10523, pp. 177–197. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68189-4_11

  45. Krishnaswamy, N., Pustejovsky, J.: Multimodal continuation-style architectures for human-robot interaction. arXiv preprint arXiv:1909.08161 (2019)

  46. Lascarides, A., Stone, M.: Formal semantics for iconic gesture. In: Proceedings of the 10th Workshop on the Semantics and Pragmatics of Dialogue (BRANDIAL), pp. 64–71 (2006)

  47. Lascarides, A., Stone, M.: Discourse coherence and gesture interpretation. Gesture 9(2), 147–180 (2009). https://doi.org/10.1075/gest.9.2.01las

  48. Lascarides, A., Stone, M.: A formal semantic analysis of gesture. J. Semant. 26, 393–449 (2009)

  49. Lücking, A., Mehler, A., Walther, D., Mauri, M., Kurfürst, D.: Finding recurrent features of image schema gestures: the figure corpus. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 1426–1431 (2016)

  50. Lücking, A., Pfeiffer, T., Rieser, H.: Pointing and reference reconsidered. J. Pragmat. 77, 56–79 (2015)

  51. Marshall, P., Hornecker, E.: Theories of embodiment in HCI. In: Price, S., Jewitt, C., Brown, B. (eds.) The SAGE Handbook of Digital Technology Research, vol. 1, pp. 144–158. Sage, Thousand Oaks (2013)

  52. Matuszek, C., Bo, L., Zettlemoyer, L., Fox, D.: Learning from unscripted deictic gesture and language for human-robot interactions. In: AAAI, pp. 2556–2563 (2014)

  53. Mehlmann, G., Häring, M., Janowski, K., Baur, T., Gebhard, P., André, E.: Exploring a model of gaze for grounding in multimodal HRI. In: Proceedings of the 16th International Conference on Multimodal Interaction (ICMI 2014), pp. 247–254. ACM, New York (2014). https://doi.org/10.1145/2663204.2663275

  54. Narayanan, S.: Mind changes: a simulation semantics account of counterfactuals. Cogn. Sci. (2010)

  55. Naumann, R.: Aspects of changes: a dynamic event semantics. J. Semant. 18, 27–81 (2001)

  56. Pustejovsky, J.: The Generative Lexicon. MIT Press, Cambridge (1995)

  57. Pustejovsky, J.: Dynamic event structure and habitat theory. In: Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013), pp. 1–10. ACL (2013)

  58. Pustejovsky, J.: From actions to events: communicating through language and gesture. Interact. Stud. 19(1–2), 289–317 (2018)

  59. Pustejovsky, J.: From experiencing events in the action-perception cycle to representing events in language. Interact. Stud. 19 (2018)

  60. Pustejovsky, J., Krishnaswamy, N.: VoxML: a visualization modeling language. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA), Paris, May 2016

  61. Pustejovsky, J., Krishnaswamy, N.: Embodied human-computer interactions through situated grounding. In: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, pp. 1–3 (2020)

  62. Pustejovsky, J., Krishnaswamy, N.: Embodied human computer interaction. Künstliche Intelligenz (2021)

  63. Pustejovsky, J., Krishnaswamy, N.: Situated meaning in multimodal dialogue: human-robot and human-computer interactions. Traitement Automatique des Langues 62(1) (2021)

  64. Pustejovsky, J., Moszkowicz, J.: The qualitative spatial dynamics of motion. J. Spatial Cognit. Comput. 11, 15–44 (2011)

  65. Quek, F., et al.: Multimodal human discourse: gesture and speech. ACM Trans. Comput.-Hum. Interact. (TOCHI) 9(3), 171–193 (2002)

  66. Ravenet, B., Pelachaud, C., Clavel, C., Marsella, S.: Automating the production of communicative gestures in embodied characters. Front. Psychol. 9, 1144 (2018)

  67. Shapiro, L.: The Routledge Handbook of Embodied Cognition. Routledge, New York (2014)

  68. Skantze, G., Hjalmarsson, A., Oertel, C.: Turn-taking, feedback and joint attention in situated human-robot interaction. Speech Commun. 65, 50–66 (2014). https://doi.org/10.1016/j.specom.2014.05.005

  69. Stalnaker, R.: Common ground. Linguist. Philos. 25(5–6), 701–721 (2002)

  70. Tomasello, M., Carpenter, M.: Shared intentionality. Dev. Sci. 10(1), 121–125 (2007)

  71. Turk, M.: Multimodal interaction: a review. Pattern Recogn. Lett. 36, 189–195 (2014)

  72. Unger, C.: Dynamic semantics as monadic computation. In: Okumura, M., Bekki, D., Satoh, K. (eds.) JSAI-isAI 2011. LNCS (LNAI), vol. 7258, pp. 68–81. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32090-3_7

  73. Zwaan, R.A., Pecher, D.: Revisiting mental simulation in language comprehension: six replication attempts. PLoS ONE 7(12), e51382 (2012)

  74. Zwaan, R.A., Radvansky, G.A.: Situation models in language comprehension and memory. Psychol. Bull. 123(2), 162 (1998)


Author information


Corresponding author

Correspondence to James Pustejovsky.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Pustejovsky, J., Krishnaswamy, N. (2021). The Role of Embodiment and Simulation in Evaluating HCI: Theory and Framework. In: Duffy, V.G. (ed.) Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management. Human Body, Motion and Behavior. HCII 2021. Lecture Notes in Computer Science, vol 12777. Springer, Cham. https://doi.org/10.1007/978-3-030-77817-0_21


  • DOI: https://doi.org/10.1007/978-3-030-77817-0_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77816-3

  • Online ISBN: 978-3-030-77817-0

  • eBook Packages: Computer Science (R0)
