Vidiam: Corpus-based Development of a Dialogue Manager for Multimodal Question Answering

  • Boris van Schooten
  • Rieks op den Akker
Part of the Theory and Applications of Natural Language Processing book series (NLP)


This chapter describes the Vidiam project, which developed a dialogue management system for multimodal question answering (QA) dialogues, as carried out within the IMIX project. The approach was data-driven, i.e., corpus-based. Because research on QA dialogue for multimodal information retrieval is still new, no suitable corpora were available on which to base a system. This chapter reports on the collection and analysis of three QA dialogue corpora, involving textual follow-up utterances, multimodal follow-up questions, and speech dialogues. Based on the data, a dialogue act typology was created, which helps translate user utterances into practical interactive QA strategies. The chapter then explains how the dialogue manager and its components (dialogue act recognition, interactive QA strategy handling, reference resolution, and multimodal fusion) were built and evaluated using off-line analysis of the corpus data.


Keywords (added by machine, not by the authors): Information Retrieval; Question Answering; Dialogue System; Visual Element; Text Fragment





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. University of Twente, Enschede, The Netherlands
