A Morphological Processor for Russian with Extended Functionality

  • Elena I. Bolshakova
  • Alexander S. Sapin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10716)

Abstract

The paper presents CrossMorphy, a recently developed open-source morphological processor for Russian texts. The processor performs lemmatization, morphological tagging of both dictionary and non-dictionary words, contextual and non-contextual morphological disambiguation, generation of word forms, and morphemic parsing of words. Besides extended functionality, emphasis is placed on the linguistic quality of word processing and on easy integration into programming projects. CrossMorphy is implemented entirely in C++ and is based on OpenCorpora vocabulary data. To clarify the reasons for its development, several freely available morphological processors for Russian are compared in terms of their linguistic and some technological properties. The experimental evaluation shows that CrossMorphy achieves a rather high quality of word processing.

Keywords

Morphological tagging; Morphological parsers for Russian; Functionality of morphological processors; Morphological disambiguation; Morphemic parsing

Notes

Acknowledgements

We would like to thank the reviewers of our paper for their helpful comments.

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Lomonosov Moscow State University, Moscow, Russia
  2. National Research University Higher School of Economics, Moscow, Russia
