Journal on Multimodal User Interfaces, Volume 3, Issue 1–2, pp 141–153

Natural interaction with a virtual guide in a virtual environment

A multimodal dialogue system
  • Dennis Hofs
  • Mariët Theune
  • Rieks op den Akker
Open Access
Original Paper


This paper describes the Virtual Guide, a multimodal dialogue system represented by an embodied conversational agent that helps users find their way in a virtual environment while adapting its affective linguistic style to that of the user. We discuss the modular architecture of the system and describe the entire loop from multimodal input analysis to multimodal output generation. We also describe how the Virtual Guide detects the level of politeness of the user's utterances in real time during the dialogue and aligns its own language to that of the user, using different politeness strategies. Finally, we report on our first user tests and discuss some potential extensions to improve the system.

Keywords: Conversational agent · Application · Social interaction · Politeness · Multimodal analysis and generation



Copyright information

© The Author(s) 2009

Authors and Affiliations

  • Dennis Hofs (1)
  • Mariët Theune (1)
  • Rieks op den Akker (1)

  1. Human Media Interaction, University of Twente, Enschede, The Netherlands
