AQA: Automatic Question Answering System for Czech
Question answering (QA) systems have become popular nowadays; however, the majority of them focus on the English language, and most of them are restricted to a specific, limited problem domain.
In this paper, we present a new question answering system called AQA (Automatic Question Answering). AQA is an open-domain QA system which allows users to ask all common questions related to a selected text collection. The first version of the AQA system has been developed and tested for the Czech language, but we also plan to include more languages in future versions.
The AQA strategy consists of three main parts: question processing, answer selection, and answer extraction. All modules are syntax-based, with advanced scoring obtained by a combination of TF-IDF, the tree distance between the question and candidate answers, and other selected criteria. The answer extraction module utilizes a named entity recognizer, which allows the system to catch the entities that are most likely to answer the question.
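The abstract names the scoring ingredients but not how they are computed or weighted. As a rough illustration only, the Python sketch below scores candidate answer sentences by combining a TF-IDF cosine similarity with a normalized tree distance between parse trees. Everything concrete here is an assumption rather than AQA's actual method: the helpers `tfidf_vectors`, `cosine`, `size`, `tree_distance` and `score_candidates` are hypothetical; the tree distance is a swapped-in top-down (Selkow-style) tree edit distance; trees are toy `(label, children)` tuples instead of real parser output; the weights `w_tfidf` and `w_tree` are invented; and the "other selected criteria" are omitted.

```python
import math
from collections import Counter

# Hypothetical toy representation: a parse tree is (label, [child trees]).

def tfidf_vectors(docs):
    """One sparse TF-IDF vector (dict) per tokenized document: raw tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def size(tree):
    """Number of nodes in a (label, children) tree."""
    return 1 + sum(size(child) for child in tree[1])

def tree_distance(a, b):
    """Top-down (Selkow-style) tree edit distance: relabeling a matched node costs 1,
    and inserting or deleting a child costs the size of its whole subtree."""
    cost = 0 if a[0] == b[0] else 1
    xs, ys = a[1], b[1]
    # dp[i][j] = cheapest alignment of the first i children of a with the first j of b
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),  # delete a child subtree of a
                dp[i][j - 1] + size(ys[j - 1]),  # insert a child subtree of b
                dp[i - 1][j - 1] + tree_distance(xs[i - 1], ys[j - 1]),  # match children
            )
    return cost + dp[len(xs)][len(ys)]

def score_candidates(q_tokens, q_tree, candidates, w_tfidf=1.0, w_tree=0.5):
    """Score (tokens, tree) candidates; higher is better. Weights are hypothetical."""
    vectors = tfidf_vectors([q_tokens] + [tokens for tokens, _ in candidates])
    scores = []
    for vec, (_, tree) in zip(vectors[1:], candidates):
        # The distance is at most size(a) + size(b) - 1, so this lies in [0, 1).
        norm_dist = tree_distance(q_tree, tree) / (size(q_tree) + size(tree))
        scores.append(w_tfidf * cosine(vectors[0], vec) - w_tree * norm_dist)
    return scores

# Toy usage: hand-lemmatized Czech tokens and invented POS-labelled trees.
question = (["kdo", "napsat", "babička"],
            ("S", [("PRON", []), ("VERB", []), ("NOUN", [])]))
candidates = [
    (["babička", "napsat", "božena", "němcová"],
     ("S", [("NOUN", []), ("VERB", []), ("PROPN", [])])),
    (["praha", "být", "hlavní", "město"],
     ("S", [("NOUN", []), ("VERB", []), ("ADJ", []), ("NOUN", [])])),
]
scores = score_candidates(question[0], question[1], candidates)
best = max(range(len(candidates)), key=scores.__getitem__)
print(scores, "best candidate:", best)
```

The top-down variant was picked here only because it fits in a few lines: it aligns just the children of nodes that are already matched, so it is coarser than a general ordered tree edit distance such as Zhang-Shasha. In this toy run the first candidate wins because it shares weighted lemmas with the question, even though its tree is not closer.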
Evaluation of the AQA system is performed on the previously published Simple Question Answering Database (SQAD), which contains more than 3,000 question-answer pairs.
Keywords: Question Answering, AQA, Simple Question Answering Database, SQAD, Named entity recognition
This work has been partly supported by the Grant Agency of CR within the project 15-13277S. The research leading to these results has received funding from the Norwegian Financial Mechanism 2009–2014 and the Ministry of Education, Youth and Sports under Project Contract no. MSMT-28477/2014 within the HaBiT Project 7F14047.