Morphological Analysis for Russian: Integration and Comparison of Taggers

  • Elizaveta Kuzmenko
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 661)

Abstract

In this paper we compare three morphological taggers for Russian with regard to the quality of the morphological disambiguation they perform. We evaluate the analysis at three levels: lemmatization, POS-tagging, and assignment of full morphological tags. We analyze the mistakes made by the taggers, outline their strengths and weaknesses, and present a possible way to improve the quality of morphological analysis for Russian.
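
The three evaluation levels named above can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the Analysis structure, the accuracy_by_level function, the tag labels, and the toy sample are all hypothetical and serve only to show how a tagger's output might be scored against a gold standard by lemma, by part of speech, and by full morphological tag.

```python
from typing import Dict, FrozenSet, NamedTuple, Sequence


class Analysis(NamedTuple):
    """One disambiguated token analysis (hypothetical structure)."""
    lemma: str
    pos: str
    features: FrozenSet[str]  # grammatical features, order-independent


def accuracy_by_level(
    gold: Sequence[Analysis], predicted: Sequence[Analysis]
) -> Dict[str, float]:
    """Score aligned analyses at three levels of strictness."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    if not gold:
        raise ValueError("cannot score an empty sample")

    pairs = list(zip(gold, predicted))
    total = len(pairs)

    # Level 1: the lemma matches (case-insensitive comparison is an assumption).
    lemma_hits = sum(g.lemma.lower() == p.lemma.lower() for g, p in pairs)
    # Level 2: the part of speech matches.
    pos_hits = sum(g.pos == p.pos for g, p in pairs)
    # Level 3: the part of speech and every grammatical feature match.
    full_hits = sum(
        g.pos == p.pos and g.features == p.features for g, p in pairs
    )

    return {
        "lemma": lemma_hits / total,
        "pos": pos_hits / total,
        "full_tag": full_hits / total,
    }


# Toy sample (invented for illustration): "мама мыла раму".
gold = [
    Analysis("мама", "NOUN", frozenset({"nom", "sg", "fem"})),
    Analysis("мыть", "VERB", frozenset({"past", "sg", "fem"})),
    Analysis("рама", "NOUN", frozenset({"acc", "sg", "fem"})),
]
predicted = [
    Analysis("мама", "NOUN", frozenset({"nom", "sg", "fem"})),
    # Homonymy error: "мыла" read as a form of the noun "мыло".
    Analysis("мыло", "NOUN", frozenset({"gen", "sg", "neut"})),
    # Correct lemma and part of speech, wrong case.
    Analysis("рама", "NOUN", frozenset({"nom", "sg", "fem"})),
]

if __name__ == "__main__":
    for level, score in accuracy_by_level(gold, predicted).items():
        print(f"{level}: {score:.2f}")
```

On this toy sample two of three lemmas and two of three parts of speech are correct, but only one token has a fully correct tag, which shows why the three levels can rank taggers differently. A real comparison would also need to map each tagger's tagset onto the gold-standard tagset before scoring, a step this sketch leaves out.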

Keywords

Morphological analysis, Russian, POS-tagging, Gold standard

Notes

Acknowledgments

I would like to thank Elmira Mustakimova, Svetlana Toldova and Timofey Arkhangelskiy for their participation in the project. I am also grateful to Mikhail Korobov for his valuable remarks on pymorphy performance and explanations of error causes (and for developing pymorphy, of course).

This article is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE).

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. National Research University Higher School of Economics, Moscow, Russia