Journal on Multimodal User Interfaces, Volume 8, Issue 1, pp 61–73

A model for incremental grounding in spoken dialogue systems

  • Thomas Visser
  • David Traum
  • David DeVault
  • Rieks op den Akker
Original Paper


We present a computational model of incremental grounding, including state updates and action selection. The model is inspired by corpus-based examples of overlapping utterances of several sorts, including backchannels and completions. The model has also been partially implemented within a virtual human system that includes incremental understanding, and can be used to track grounding and provide overlapping verbal and non-verbal behaviors from a listener, before a speaker has completed her utterance.


Spoken dialogue systems, Incremental language processing, Grounding



Some of the effort described here has been sponsored by the US Army. Any opinions, content or information presented do not necessarily reflect the position or the policy of the United States Government, and no official endorsement should be inferred.



Copyright information

© OpenInterface Association 2014

Authors and Affiliations

  • Thomas Visser (1)
  • David Traum (2)
  • David DeVault (2)
  • Rieks op den Akker (1)

  1. University of Twente, Enschede, The Netherlands
  2. USC Institute for Creative Technologies, Playa Vista, USA
