Morphological Analyzer and Generator for Russian and Ukrainian Languages

  • Mikhail Korobov
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 542)

Abstract

pymorphy2 is a morphological analyzer and generator for the Russian and Ukrainian languages. It uses large, efficiently encoded lexicons built from OpenCorpora and LanguageTool data. A set of linguistically motivated rules has been developed to enable morphological analysis and generation of out-of-vocabulary words observed in real-world documents. For Russian, pymorphy2 provides state-of-the-art morphological analysis quality. The analyzer is implemented in the Python programming language, with optional C++ extensions. Emphasis is placed on ease of use, documentation, and extensibility. The package is distributed under a permissive open-source license, encouraging its use in both academic and commercial settings.
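The abstract names three ingredients: a compactly encoded lexicon, analysis paired with generation, and rules for out-of-vocabulary words. The Python sketch below illustrates those ideas on a toy scale. It is a hypothetical illustration, not pymorphy2's actual implementation or API: the paradigm table, the tag strings, the `parse`/`inflect` helpers and the "longest known ending" guessing rule are all assumptions made for this example. Each lemma is stored only as a stem plus a shared paradigm id, so every inflection pattern is written down once and reused by all lemmas that follow it; a plain `dict` stands in for a more compact structure such as a minimal acyclic automaton.

```python
from collections import namedtuple

# Hypothetical toy data: NOT pymorphy2's real dictionary format or tag set.
# A paradigm is a tuple of (suffix, tag) pairs shared by many lemmas;
# the first pair gives the normal (dictionary) form.
PARADIGMS = (
    # 0: feminine nouns like "лампа" (a few singular forms only)
    (("а", "NOUN,femn,sing,nomn"), ("ы", "NOUN,femn,sing,gent"),
     ("е", "NOUN,femn,sing,datv"), ("у", "NOUN,femn,sing,accs")),
    # 1: masculine inanimate nouns like "стол"
    (("", "NOUN,masc,sing,nomn"), ("а", "NOUN,masc,sing,gent"),
     ("у", "NOUN,masc,sing,datv"), ("", "NOUN,masc,sing,accs")),
)
# Each lemma is stored as (stem, paradigm id); word forms are derived on demand.
LEMMAS = (("ламп", 0), ("карт", 0), ("стол", 1), ("дом", 1))
MAX_ENDING = 3  # longest word ending consulted when guessing unknown words

Parse = namedtuple("Parse", "word tag normal_form stem paradigm method")


def build_index(lemmas):
    """Map every known word form to the (paradigm id, form index) pairs producing it."""
    index = {}
    for stem, pid in lemmas:
        for i, (suffix, _tag) in enumerate(PARADIGMS[pid]):
            index.setdefault(stem + suffix, []).append((pid, i))
    return index


def build_endings(index):
    """Map endings (1..MAX_ENDING chars) of known forms to their (pid, i) pairs."""
    endings = {}
    for word, entries in index.items():
        for k in range(1, min(MAX_ENDING, len(word)) + 1):
            endings.setdefault(word[-k:], set()).update(entries)
    return endings


INDEX = build_index(LEMMAS)
ENDINGS = build_endings(INDEX)


def _make_parse(word, pid, i, method):
    suffix, tag = PARADIGMS[pid][i]
    stem = word[:len(word) - len(suffix)]  # safe even when suffix == ""
    return Parse(word, tag, stem + PARADIGMS[pid][0][0], stem, pid, method)


def parse(word):
    """Dictionary lookup first; otherwise guess from the longest known ending."""
    if word in INDEX:
        return [_make_parse(word, pid, i, "dictionary") for pid, i in INDEX[word]]
    for k in range(min(MAX_ENDING, len(word)), 0, -1):
        guesses = [
            _make_parse(word, pid, i, "guess")
            for pid, i in sorted(ENDINGS.get(word[-k:], ()))
            # the paradigm suffix must match and leave a non-empty stem
            if word.endswith(PARADIGMS[pid][i][0])
            and len(word) > len(PARADIGMS[pid][i][0])
        ]
        if guesses:
            return guesses
    return []


def inflect(p, grammemes):
    """Generate the form of p's lemma whose tag contains all `grammemes`,
    preferring the form that keeps most of p's other grammemes."""
    current = set(p.tag.split(","))
    best = None
    for suffix, tag in PARADIGMS[p.paradigm]:
        tag_set = set(tag.split(","))
        if grammemes <= tag_set:
            score = len(tag_set & current)
            if best is None or score > best[0]:
                best = (score, p.stem + suffix)
    return None if best is None else best[1]


if __name__ == "__main__":
    for p in parse("стол"):  # ambiguous: nominative or accusative
        print(p.word, p.tag, p.normal_form, p.method)
    guess = parse("парта")[0]  # "парта" is not in the toy lexicon
    print(guess.tag, guess.method)  # NOUN,femn,sing,nomn guess
    print(inflect(guess, {"gent"}))  # парты
```

Guessing from the longest matching ending resembles classic suffix-based unknown-word guessers. When an ending is ambiguous (in this toy lexicon a final "а" fits both paradigms), several candidate parses come back, so a practical analyzer also needs some way to rank them.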

Keywords

Morphological analyzer, Russian, Ukrainian, Morphological generator, Open source, OpenCorpora, LanguageTool, pymorphy2, pymorphy

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. ScrapingHub, Inc., Yekaterinburg, Russia