Identifying Utterances Addressed to an Agent in Multiparty Human–Agent Conversations

  • Naoya Baba
  • Hung-Hsuan Huang
  • Yukiko I. Nakano
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6895)


In multiparty human–agent interaction, the agent should be able to respond appropriately to a user by determining whether an utterance is addressed to the agent or to another person. This study proposes a model that predicts the addressee from acoustic information in speech and head orientation as nonverbal information. First, we conducted a Wizard-of-Oz (WOZ) experiment to collect human–agent triadic conversations. Then, we analyzed whether the acoustic features and head orientations were correlated with addressee-hood. Based on this analysis, we propose an addressee prediction model that integrates acoustic and bodily nonverbal information using a support vector machine (SVM).
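As a rough illustration of the kind of model the abstract describes, the sketch below trains a binary SVM on utterance-level acoustic and head-orientation features. The feature set (mean pitch, pitch range, mean intensity, fraction of the utterance spent facing the agent) and all values are hypothetical placeholders, not the paper's actual features or data.

```python
# Illustrative sketch only: an SVM addressee classifier over acoustic and
# head-orientation features. Features and values are hypothetical, not the
# feature set reported in the paper.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each row: [mean_pitch_hz, pitch_range_hz, mean_intensity_db, head_toward_agent_ratio]
X_train = [
    [210.0, 80.0, 62.0, 0.90],  # utterance addressed to the agent
    [190.0, 55.0, 58.0, 0.10],  # utterance addressed to the other person
    [220.0, 85.0, 63.0, 0.75],
    [185.0, 50.0, 57.0, 0.20],
]
y_train = [1, 0, 1, 0]  # 1 = addressed to agent, 0 = addressed to human partner

# Standardize features so the Hz-scale prosodic features do not dominate
# the unit-scale head-orientation ratio.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
clf.fit(X_train, y_train)

# Classify a new utterance: high pitch, mostly facing the agent.
print(clf.predict([[215.0, 78.0, 61.0, 0.85]])[0])  # expected: 1 (agent-addressed)
```

In a real system the prosodic features would be extracted from the speech signal and the head-orientation ratio from a head tracker; the SVM then fuses both modalities in a single decision.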


Keywords: Addressee-hood · Multiparty conversation · Head pose · Prosody




Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Naoya Baba (1)
  • Hung-Hsuan Huang (2)
  • Yukiko I. Nakano (3)
  1. Graduate School of Science and Technology, Seikei University, Musashino-shi, Japan
  2. Department of Information & Communication Science, Ritsumeikan University, Japan
  3. Dept. of Computer and Information Science, Seikei University, Japan
