Morpheme Level Word Embedding

  • Ruslan Galinsky (Email author)
  • Tatiana Kovalenko
  • Julia Yakovleva
  • Andrey Filchenkov
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 789)


Abstract

Modern NLP tasks such as sentiment analysis, semantic analysis, and text entity extraction depend on the quality of the underlying language model. Language structure influences this quality: a model that fits analytic languages well for some NLP tasks does not fit synthetic languages equally well for the same tasks. For example, the well-known Word2Vec [27] model shows good results for English, which is an analytic rather than a synthetic language, but it has problems with synthetic languages on some NLP tasks because of their high inflection. Since every morpheme in a synthetic language carries information, we propose a morpheme-level model for solving various NLP tasks. Our experiments consider the Russian language. First, we describe how to build a morpheme extractor from prepared vocabularies; the extractor reaches 91% accuracy on vocabularies with known morpheme segmentations. Second, we show how it can be applied to NLP tasks, and then we discuss our results, their pros and cons, and future work.
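The abstract describes building a morpheme extractor from vocabularies of known segmentations and then feeding morpheme tokens into an embedding model. As a minimal illustration of the general idea (not the authors' actual extractor, whose algorithm is not given here), a dictionary-based greedy longest-match segmenter over an assumed morpheme vocabulary could look like:

```python
# Hypothetical sketch: greedy longest-prefix morpheme segmentation against a
# vocabulary of known morphemes. This is NOT the paper's trained extractor;
# it only illustrates replacing word tokens with morpheme tokens, which a
# Word2Vec-style model could then embed at the morpheme level.

def segment(word, morphemes):
    """Split `word` into morphemes, preferring the longest known prefix."""
    result = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest match first
            if word[i:j] in morphemes:
                result.append(word[i:j])
                i = j
                break
        else:
            # unknown symbol: emit it as a single-character fallback token
            result.append(word[i])
            i += 1
    return result

# Toy example with an assumed tiny Russian morpheme vocabulary.
vocab = {"пере", "ход", "ить", "за", "вод"}
print(segment("переходить", vocab))  # → ['пере', 'ход', 'ить']
print(segment("завод", vocab))       # → ['за', 'вод']
```

Greedy matching is only a baseline; the paper's extractor is built from dictionaries of known segmentations and evaluated against them, which is how the reported 91% accuracy figure is obtained.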


Keywords: NLP · Word2Vec · Word embedding · Morphemes · Synthetic language · Semantic analysis · Sentiment analysis · Text entity extraction · Naive Bayesian classifier


References

  1. Dictionary of financial terms and economic concepts. Accessed 28 Aug 2017
  2. Great automotive dictionary. Accessed 28 Aug 2017
  3. Medical dictionary. Accessed 28 Aug 2017
  4. Words segmentation to morphemes. Accessed 28 Aug 2017
  5. Words segmentation to morphemes. Accessed 28 Aug 2017
  6. Kamchatov, A., et al.: Russian “drevoslov”. Accessed 28 Aug 2017
  7. Kutuzov, A., Kuzmenko, E.: RusVectores: distributional semantic models for the Russian language. Accessed 28 Aug 2017
  8. Kuznetsova, A., Efremova, T.: Dictionary of the Morphemes of Russian
  9. Panchenko, A., Loukachevitch, N.V., Ustalov, D., Paperno, D., Meyer, C.M., Konstantinova, N.: RUSSE: the first workshop on Russian semantic similarity. In: Proceedings of the International Conference on Computational Linguistics DIALOGUE, pp. 89–105 (2015)
  10. Safyanova, A.: The meaning of Latin morphemes. Accessed 28 Aug 2017
  11. Safyanova, A.: The meanings of prefixes in Russian. Accessed 28 Aug 2017
  12. Safyanova, A.: The meanings of suffixes in Russian. Accessed 28 Aug 2017
  13. Tikhonov, A.: Morphology and Spelling Dictionary of the Russian Language. Astrel, AST (2002)
  14. Do, C.B., Batzoglou, S.: What is the expectation maximization algorithm? Nature Biotechnol. 26, 897–899 (2008)
  15. Garshin, I.: Slavic roots of the Russian language. Accessed 28 Aug 2017
  16. Leviant, I., Reichart, R.: Separated by an un-common language: towards judgment language informed vector space modeling. Accessed 28 Aug 2017
  17. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation (2014)
  18. Van Der Maaten, L., Hinton, G.: Visualizing data using t-SNE (2008). Accessed 28 Aug 2017
  19. Creutz, M., Lagus, K.: Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0. Publications in Computer and Information Science, Report A81, Helsinki University of Technology, March 2005
  20. Grunwald, P.: The Minimum Description Length Principle. MIT Press, Cambridge (2007)
  21. Cotterell, R., Schütze, H.: Morphological word-embeddings. Accessed 28 Aug 2017
  22. Galinsky, R., Alekseev, A., Nikolenko, S.: Improving neural network models for natural language processing in Russian with synonyms (2016)
  23. Aivazyan, S., Enukov, I., Meshalkin, L.: Applied Statistics: Basics of Modeling and Primary Data Processing. Finance and Statistics, Moscow (1983)
  24. Bordag, S.: Unsupervised knowledge-free morpheme boundary detection. Accessed 28 Aug 2017
  25. Qiu, S., Cui, Q., Bian, J., Gao, B., Liu, T.-Y.: Co-learning of word representations and morpheme representations. Accessed 28 Aug 2017
  26. Virpioja, S., Smit, P., Grönroos, S.-A., Kurimo, M.: Morfessor 2.0: Python implementation and extensions for Morfessor Baseline. Aalto University publication series SCIENCE + TECHNOLOGY (2013)
  27. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. Accessed 28 Aug 2017
  28. Ling, W., Luís, T., Marujo, L., Fernandez Astudillo, R., Amir, S., Dyer, C., Black, A.W., Trancoso, I.: Finding function in form: compositional character models for open vocabulary word representation. Accessed 28 Aug 2017
  29. Xu, Y., Liu, J.: Implicitly incorporating morphological information into word embedding. Accessed 28 Aug 2017
  30. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. Accessed 28 Aug 2017

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Ruslan Galinsky (1) (Email author)
  • Tatiana Kovalenko (2)
  • Julia Yakovleva (2)
  • Andrey Filchenkov (1)
  1. ITMO University, St. Petersburg, Russia
  2. Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
