A Morphological Processor for Russian with Extended Functionality

  • Conference paper
Analysis of Images, Social Networks and Texts (AIST 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10716)

Abstract

The paper presents CrossMorphy, a recently developed open-source morphological processor for Russian texts. The processor performs lemmatization, morphological tagging of both dictionary and non-dictionary words, contextual and non-contextual morphological disambiguation, word form generation, and morphemic parsing. Beyond this extended functionality, emphasis is placed on the linguistic quality of word processing and on easy integration into software projects. CrossMorphy is implemented entirely in C++ on the basis of OpenCorpora vocabulary data. To motivate its development, several freely available morphological processors for Russian are compared in terms of their linguistic and some technological properties. The experimental evaluation shows that CrossMorphy achieves rather high quality of word processing.
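
To make the distinction between dictionary and non-dictionary words concrete, the following is a minimal illustrative sketch in Python. It is not CrossMorphy's algorithm or API: the tiny lexicon, the suffix rules, and the names `Parse`, `LEXICON`, `SUFFIX_RULES`, and `analyze` are all hypothetical, and the tag strings only imitate the OpenCorpora grammeme style. Dictionary words are looked up directly; unknown words receive guessed readings from the longest matching suffix rule. All ambiguous readings are returned, since choosing among them is the job of a separate disambiguation step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parse:
    """One morphological reading of a word form (hypothetical structure)."""
    lemma: str
    tag: str
    from_dictionary: bool


# Toy lexicon (assumption): word form -> list of (lemma, tag) readings.
LEXICON = {
    "кошка": [("кошка", "NOUN,sing,nomn")],
    "кошки": [("кошка", "NOUN,sing,gent"), ("кошка", "NOUN,plur,nomn")],
    "ложка": [("ложка", "NOUN,sing,nomn")],
    "ложки": [("ложка", "NOUN,sing,gent"), ("ложка", "NOUN,plur,nomn")],
}

# Toy guessing rules (assumption): form suffix -> list of (lemma suffix, tag).
SUFFIX_RULES = {
    "ки": [("ка", "NOUN,sing,gent"), ("ка", "NOUN,plur,nomn")],
    "ка": [("ка", "NOUN,sing,nomn")],
}


def analyze(word: str) -> list[Parse]:
    """Return all readings of a word; guess by suffix if it is not in the lexicon."""
    form = word.lower()
    if form in LEXICON:
        return [Parse(lemma, tag, True) for lemma, tag in LEXICON[form]]
    # Non-dictionary word: try the longest suffix rule first.
    for suffix in sorted(SUFFIX_RULES, key=len, reverse=True):
        if form.endswith(suffix) and len(form) > len(suffix):
            stem = form[: -len(suffix)]
            return [
                Parse(stem + lemma_suffix, tag, False)
                for lemma_suffix, tag in SUFFIX_RULES[suffix]
            ]
    # No rule applies: return the form itself with an "unknown" tag.
    return [Parse(form, "UNKN", False)]
```

Calling `analyze("кошки")` returns the two dictionary readings of the form, while `analyze("мышки")`, which is absent from the toy lexicon, yields guessed readings with the lemma "мышка" and `from_dictionary=False`.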

Notes

  1. http://aot.ru/docs/rusmorph.html
  2. https://tech.yandex.ru/mystem/doc/
  3. http://corpus.leeds.ac.uk/mocky/
  4. http://pymorphy2.readthedocs.io/en/latest/index.html
  5. https://github.com/alesapin/XMorphy
  6. http://opencorpora.org
  7. http://lib.rus.ec/
  8. https://ru.wiktionary.org/wiki/
  9. http://ruscorpora.ru/
  10. http://ruscorpora.ru/
  11. http://universaldependencies.org/u/overview/morphology.html
  12. https://ru.wiktionary.org/wiki/

Acknowledgements

We would like to thank the reviewers of our paper for their helpful comments.

Author information

Corresponding author

Correspondence to Elena I. Bolshakova.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Bolshakova, E.I., Sapin, A.S. (2018). A Morphological Processor for Russian with Extended Functionality. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2017. Lecture Notes in Computer Science, vol 10716. Springer, Cham. https://doi.org/10.1007/978-3-319-73013-4_3

  • DOI: https://doi.org/10.1007/978-3-319-73013-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73012-7

  • Online ISBN: 978-3-319-73013-4

  • eBook Packages: Computer Science, Computer Science (R0)
