Bridging Gaps Between Planning and Open-Domain Spoken Dialogues



In social media, Wikipedia is the outstanding example of a collaborative wiki. After reviewing progress in open-domain question answering systems, the paper discusses WikiTalk, a recent system that supports open-domain dialogues by using Wikipedia as a knowledge source. Drawing on Wikipedia's collaboratively written sentences and paragraphs, WikiTalk apparently succeeds in enabling “open-domain talking”. In view of recent advances in web-based language processing, the paper proposes steps towards open-domain “listening” that combine anticipated progress in open-vocabulary speech recognition with recent developments in named entity recognition, where Wikipedia now serves as a dynamically updated knowledge source in place of fixed gazetteer lists. The paper proposes that Wikipedia-based open-domain talking and open-domain listening will be combined in a new generation of web-based open-domain spoken dialogue systems.

Technological and social development affects how we interact with our environment: interactive systems are embedded in our surroundings, information flow increases, and interaction becomes more complex. To address the challenges of this complex environment, to respond to the needs of various users, and to make it possible to test innovative interactive systems, it is important to investigate the processes that underlie human-computer interaction, to provide models and concepts that enable experimentation with various types of complex systems, and to design and build tools and prototypes that demonstrate the ideas and techniques in a working system. In this article, I discuss the “gap” between dialogue management and response planning, focusing on the communicatively adequate contributions produced in the context of a situated robot agent. The WikiTalk system supports open-domain conversations by using Wikipedia as its knowledge source, and a version of it is implemented on the Nao robot.


WikiTalk interaction; Open-domain dialogues; Newinfo; Topic trees; Planning; Generation



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. University of Helsinki, Helsinki, Finland
