Incremental Recognition and Prediction of Dialogue Acts

  • Volha Petukhova
  • Harry Bunt
Part of the Text, Speech and Language Technology book series (TLTB, volume 47)


This chapter is concerned with the incremental understanding of utterances in spoken dialogue, with a focus on how their intended (possibly multiple) communicative functions can be recognized in a data-oriented way on the basis of observable features of communicative behaviour. An incremental, token-based approach is described that combines local classifiers, which exploit local utterance features, with global classifiers, which use the outputs of local classifiers applied to previous and subsequent tokens. This approach is shown to yield excellent dialogue act recognition scores for unsegmented spoken dialogue. This result can be seen as a significant step towards the development of fully incremental, on-line methods for computing the meaning of utterances in spoken dialogue.
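
The two-stage idea can be illustrated with a minimal sketch. It is not the classifiers used in the chapter: the cue lexicon, the label names, the window size, and the majority-vote rule are all assumptions chosen only to show how a global decision for one token can draw on local outputs for previous and subsequent tokens.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical cue lexicon standing in for a trained local classifier.
# Cue words and labels are illustrative assumptions, not taken from the chapter.
CUES: Dict[str, str] = {
    "what": "question",
    "where": "question",
    "okay": "feedback",
    "yes": "feedback",
    "please": "request",
}


def local_classify(token: str) -> str:
    """Label one token from its own features only (here: the word form)."""
    return CUES.get(token.lower(), "none")


def global_classify(local_labels: List[str], i: int, window: int = 2) -> str:
    """Label token i from the local outputs around it.

    Looks at the local labels of up to `window` previous and `window`
    subsequent tokens, plus token i itself, and returns the most frequent
    label other than "none" (or "none" if no neighbour carries a label).
    """
    lo = max(0, i - window)
    hi = min(len(local_labels), i + window + 1)
    votes = Counter(label for label in local_labels[lo:hi] if label != "none")
    if not votes:
        return "none"
    return votes.most_common(1)[0][0]


def tag_tokens(tokens: List[str], window: int = 2) -> List[str]:
    """Run the local stage on every token, then the global stage."""
    local = [local_classify(t) for t in tokens]
    return [global_classify(local, i, window) for i in range(len(tokens))]


if __name__ == "__main__":
    print(tag_tokens(["where", "is", "the", "station"], window=1))
```

In this toy setting, the token "is" receives no label from the local stage but inherits "question" from its left neighbour in the global stage. Because the global stage also reads subsequent tokens, a decision for token i in such a scheme can only be made once the tokens inside the right-hand window have arrived.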


Keywords: Belief State, Communicative Function, Local Classifier, Dialogue Behaviour, Global Classifier



This research was conducted within the project ‘Multidimensional Dialogue Modelling’, sponsored by the Netherlands Organisation for Scientific Research (NWO), under grant reference 017.003.090.



Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. Department of Spoken Language Systems, Saarland University, Saarbrücken, Germany
  2. Tilburg Center for Cognition and Communication (TiCC) and Department of Philosophy, Tilburg University, Tilburg, The Netherlands
