Identifying Utterances Addressed to an Agent in Multiparty Human–Agent Conversations

  • Naoya Baba
  • Hung-Hsuan Huang
  • Yukiko I. Nakano
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6895)


In multiparty human–agent interaction, the agent should be able to respond appropriately to a user by determining whether an utterance is addressed to the agent or to another person. This study proposes a model that predicts the addressee from acoustic information in speech and head orientation as nonverbal information. First, we conducted a Wizard-of-Oz (WOZ) experiment to collect human–agent triadic conversations. Then, we analyzed whether the acoustic features and head orientations were correlated with addressee-hood. Based on this analysis, we propose an addressee prediction model that integrates acoustic and bodily nonverbal information using a support vector machine (SVM).
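As a rough illustration of the kind of model the abstract describes, the sketch below trains a binary SVM on utterance-level acoustic and head-orientation features. The feature set (mean pitch, pitch range, mean intensity, fraction of the utterance spent facing the agent) and all values are hypothetical placeholders, not the paper's actual features or data.

```python
# Illustrative sketch only: an SVM addressee classifier over acoustic and
# head-orientation features. Features and values are hypothetical, not the
# feature set reported in the paper.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each row: [mean_pitch_hz, pitch_range_hz, mean_intensity_db, head_toward_agent_ratio]
X_train = [
    [210.0, 80.0, 62.0, 0.90],  # utterance addressed to the agent
    [190.0, 55.0, 58.0, 0.10],  # utterance addressed to the other person
    [220.0, 85.0, 63.0, 0.75],
    [185.0, 50.0, 57.0, 0.20],
]
y_train = [1, 0, 1, 0]  # 1 = addressed to agent, 0 = addressed to human partner

# Standardize features so the Hz-scale prosodic features do not dominate
# the unit-scale head-orientation ratio.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
clf.fit(X_train, y_train)

# Classify a new utterance: high pitch, mostly facing the agent.
print(clf.predict([[215.0, 78.0, 61.0, 0.85]])[0])  # expected: 1 (agent-addressed)
```

In a real system the prosodic features would be extracted from the speech signal and the head-orientation ratio from a head tracker; the SVM then fuses both modalities in a single decision.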


Keywords: Addressee-hood · Multiparty conversation · Head pose · Prosody




Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Naoya Baba (1)
  • Hung-Hsuan Huang (2)
  • Yukiko I. Nakano (3)
  1. Graduate School of Science and Technology, Seikei University, Musashino-shi, Japan
  2. Department of Information & Communication Science, Ritsumeikan University, Japan
  3. Dept. of Computer and Information Science, Seikei University, Japan
