Russian-Language Question Classification: A New Typology and First Results

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10716)


This paper addresses automatic classification of Russian-language questions, a natural early step in building a question answering system. We developed a typology of Russian questions that uses interrogative particles, pronouns and word order as its main features. A corpus of 2,008 questions was manually compiled and annotated according to this typology, using both a fine-grained and a coarse-grained class set (23 and 14 classes, respectively). The questions were represented as character bi-/trigrams and word uni-/bi-/trigrams. We tested several widely used machine-learning methods (logistic regression, support vector machines, naïve Bayes) against a regular expression baseline on a held-out test corpus annotated by an external expert. The best results were obtained with an SVM classifier (linear kernel), which reached an accuracy of 65.3% (fine-grained) and 68.7% (coarse-grained), while the regular expression baseline reached 52.7%.
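To illustrate the representation and the baseline described above, here is a minimal sketch in Python. It is not the authors' implementation: the class labels (`LOCATION`, `TIME`, `PERSON`, `YES_NO`, `OTHER`), the regular expressions, and the function names are hypothetical, chosen only to show how character bi-/trigrams, word uni-/bi-/trigrams, and a rule-based baseline might look. The paper's actual 23- and 14-class typology is not reproduced here.

```python
import re
from collections import Counter


def char_ngrams(text, sizes=(2, 3)):
    """Count character n-grams (bigrams and trigrams by default)."""
    text = text.lower()
    counts = Counter()
    for n in sizes:
        for i in range(len(text) - n + 1):
            counts["c:" + text[i:i + n]] += 1
    return counts


def word_ngrams(text, sizes=(1, 2, 3)):
    """Count word n-grams (uni-, bi- and trigrams by default)."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            counts["w:" + " ".join(tokens[i:i + n])] += 1
    return counts


def features(text):
    """Merge both n-gram views into one sparse feature dictionary."""
    return char_ngrams(text) + word_ngrams(text)


# Hypothetical rules: labels and patterns are illustrative assumptions,
# not the typology or the baseline used in the paper.
BASELINE_RULES = [
    ("LOCATION", re.compile(r"^(где|куда|откуда)\b")),  # where / where to / where from
    ("TIME", re.compile(r"^когда\b")),                  # when
    ("PERSON", re.compile(r"^кто\b")),                  # who
    ("YES_NO", re.compile(r"\bли\b")),                  # interrogative particle "li"
]


def regex_baseline(question, default="OTHER"):
    """Return the label of the first matching rule, else the default label."""
    q = question.lower().strip()
    for label, pattern in BASELINE_RULES:
        if pattern.search(q):
            return label
    return default
```

In a setup like the one the abstract describes, feature dictionaries of this kind would be vectorized and passed to a classifier such as a linear-kernel SVM, with the rule-based function serving as the point of comparison.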


Question answering; Question answering systems; QA systems; Russian-language questions; Question classification; Question tagging; Russian question typology



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. National Research University Higher School of Economics, Nizhny Novgorod, Russia
