
A Close Look at Russian Morphological Parsers: Which One Is the Best?

  • Evgeny Kotelnikov
  • Elena Razova
  • Irina Fishcheva
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 789)

Abstract

This article presents a comparative study of four morphological parsers for Russian – mystem, pymorphy2, TreeTagger, and FreeLing – on the two main tasks of morphological analysis: lemmatization and POS tagging. The experiments were conducted on three currently available Russian corpora with high-quality morphological annotation – the Russian National Corpus, OpenCorpora, and RU-EVAL (a small corpus created in 2010 to evaluate parsers). As evaluation measures, the authors use accuracy for lemmatization and the F1-measure for POS tagging. The authors analyze the errors, identify the parts of speech that are most difficult for the parsers, and compare parser performance on dictionary words and on predicted words (words absent from a parser's dictionary).
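The abstract names the two measures but not how they are computed. The Python sketch below shows one common way to compute them, assuming gold and predicted annotations are already aligned token by token. The function names, the case-insensitive lemma match, the macro-averaging over tags, and the `in_dictionary` flags (which a parser would have to report for each token) are illustrative assumptions, not necessarily the paper's exact evaluation protocol.

```python
from collections import Counter


def lemma_accuracy(gold_lemmas, pred_lemmas):
    """Share of tokens whose predicted lemma equals the gold lemma.

    Assumption: lemmas are compared case-insensitively; an empty input yields 0.0.
    """
    if len(gold_lemmas) != len(pred_lemmas):
        raise ValueError("gold and predicted sequences must be token-aligned")
    if not gold_lemmas:
        return 0.0
    hits = sum(g.lower() == p.lower() for g, p in zip(gold_lemmas, pred_lemmas))
    return hits / len(gold_lemmas)


def pos_f1(gold_tags, pred_tags):
    """Per-tag F1 and its macro average over all tags seen in gold or prediction."""
    if len(gold_tags) != len(pred_tags):
        raise ValueError("gold and predicted sequences must be token-aligned")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold_tags, pred_tags):
        if g == p:
            tp[g] += 1          # correct tag: true positive for that tag
        else:
            fp[p] += 1          # predicted tag was wrong: false positive for it
            fn[g] += 1          # gold tag was missed: false negative for it
    per_tag = {}
    for tag in set(gold_tags) | set(pred_tags):
        precision = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        recall = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        denom = precision + recall
        per_tag[tag] = 2 * precision * recall / denom if denom else 0.0
    macro = sum(per_tag.values()) / len(per_tag) if per_tag else 0.0
    return per_tag, macro


def accuracy_by_vocab(gold_lemmas, pred_lemmas, in_dictionary):
    """Lemma accuracy split into dictionary words and predicted (out-of-dictionary) words.

    `in_dictionary` is a hypothetical per-token flag supplied by the parser.
    """
    groups = {"dictionary": ([], []), "predicted": ([], [])}
    for g, p, known in zip(gold_lemmas, pred_lemmas, in_dictionary):
        key = "dictionary" if known else "predicted"
        groups[key][0].append(g)
        groups[key][1].append(p)
    return {key: lemma_accuracy(gs, ps) for key, (gs, ps) in groups.items()}


if __name__ == "__main__":
    gold_l, pred_l = ["мама", "мыть", "рама"], ["мама", "мыть", "рам"]
    gold_t, pred_t = ["NOUN", "VERB", "NOUN"], ["NOUN", "VERB", "ADJ"]
    print(lemma_accuracy(gold_l, pred_l))
    print(pos_f1(gold_t, pred_t))
    print(accuracy_by_vocab(gold_l, pred_l, [True, True, False]))
```

On the toy input above, lemma accuracy is 2/3 and the macro-averaged F1 is 5/9 (NOUN 2/3, VERB 1, ADJ 0). Per-tag F1 is what makes it possible to single out the hardest parts of speech; a micro-averaged F1 over tokens would reduce to plain tagging accuracy, which is one reason a per-tag view is informative.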

Keywords

Morphological analysis, POS tagging, Comparison of parsers, Text corpora

Notes

Acknowledgments

The reported study was funded by the RFBR (Russian Foundation for Basic Research) under research project No. 16-07-00342a.


Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Evgeny Kotelnikov (1), corresponding author
  • Elena Razova (1)
  • Irina Fishcheva (1)
  1. Vyatka State University, Kirov, Russia
