Open-Domain Question Answering Framework Using Wikipedia

  • Saleem Ameen
  • Hyunsuk Chung
  • Soyeon Caren Han
  • Byeong Ho Kang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9992)


This paper explores the feasibility of an open-domain, automated question-answering framework that leverages Wikipedia's knowledge base. While Wikipedia implicitly contains answers to common questions, the ambiguity of natural language and the difficulty of developing an information-retrieval process that produces sufficiently specific answers present pertinent challenges. However, observational analysis suggests that the syntactic and lexical structure of a sentence can be discounted in contexts where a question contains a specific target entity (words that identify a person, location, or organisation) and correspondingly queries a property related to it. To investigate this, we implemented an algorithmic process that extracts the target entity from the question using CRF-based named entity recognition (NER) and treats all remaining words as potential properties. Using DBpedia, an ontological database of Wikipedia's knowledge, we searched for the closest matching property that would produce an answer by applying standard string-matching algorithms, including Levenshtein distance, the similar-text measure, and Dice's coefficient. Our experimental results illustrate that using Wikipedia as a knowledge base yields high precision for questions that contain a single unambiguous entity as the subject, but lower accuracy for questions where the entity appears as part of the object.
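The property-matching step described in the abstract (ranking candidate ontology properties against the non-entity words of a question) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the scoring combination, and the candidate property list are all hypothetical, and only two of the three measures mentioned (Levenshtein distance and Dice's coefficient) are shown.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def dice(a: str, b: str) -> float:
    # Dice's coefficient over character bigrams.
    bigrams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    bigrams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not bigrams_a or not bigrams_b:
        return 0.0
    return 2 * len(bigrams_a & bigrams_b) / (len(bigrams_a) + len(bigrams_b))

def closest_property(query: str, properties: list[str]) -> str:
    # Rank candidate ontology properties by a combined similarity score:
    # reward bigram overlap (Dice) and penalise normalised edit distance.
    # The combination is illustrative; the paper does not specify one.
    def score(prop: str) -> float:
        q, p = query.lower(), prop.lower()
        norm_lev = levenshtein(q, p) / max(len(q), len(p))
        return dice(q, p) - norm_lev
    return max(properties, key=score)

# Hypothetical DBpedia-style candidates for "When was Einstein born?"
candidates = ["birthDate", "deathDate", "birthPlace", "spouse"]
print(closest_property("born date", candidates))  # → birthDate
```

In a full pipeline, the remaining non-entity words of the question would be compared against the properties actually attached to the recognised entity's DBpedia resource, rather than a fixed candidate list as here.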


Keywords: Open-domain · Question answering · Wikipedia



This work was supported by the Industrial Strategic Technology Development Program (10052955, Experiential Knowledge Platform Development Research for the Acquisition and Utilization of Field Expert Knowledge), funded by the Ministry of Trade, Industry & Energy (MI, Korea). This work was also supported in part by Office of Naval Research grant N62909-16-1-2219.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Saleem Ameen (1)
  • Hyunsuk Chung (1)
  • Soyeon Caren Han (1)
  • Byeong Ho Kang (1)

  1. School of Engineering and ICT, Tasmania, Australia
