Morpheme Level Word Embedding

  • Conference paper
  • In: Artificial Intelligence and Natural Language (AINL 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 789)

Abstract

Modern NLP tasks such as sentiment analysis, semantic analysis, and text entity extraction depend on the quality of the underlying language model. Language structure influences this quality: a model that fits analytic languages well for some NLP tasks does not fit synthetic languages equally well for the same tasks. For example, the well-known Word2Vec model [27] shows good results for English, which is an analytic rather than a synthetic language, but it has problems with synthetic languages on some NLP tasks because of their high degree of inflection. Since every morpheme in a synthetic language carries information, we propose a morpheme-level model for solving various NLP tasks. Our experiments consider the Russian language. First, we describe how to build a morpheme extractor from prepared vocabularies; it reaches 91% accuracy on vocabularies with known morpheme segmentation. Second, we show how the extractor can be applied to NLP tasks, and then we discuss our results, their pros and cons, and our future work.
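As a rough illustration of the morpheme-level idea described in the abstract (this is not the authors' implementation): a word can be represented by composing the vectors of its morphemes, so that inflected forms sharing a root end up with similar representations. The morpheme vocabulary, its toy 3-dimensional vectors, and the greedy longest-match segmenter below are all hypothetical stand-ins for the paper's trained extractor and embeddings.

```python
# Sketch: compose a word vector from morpheme vectors, so that inflected
# forms sharing morphemes get similar representations.
from math import sqrt

# Toy morpheme vocabulary with made-up 3-dimensional "embeddings".
MORPHEME_VECS = {
    "run": [1.0, 0.0, 0.0],
    "ing": [0.0, 1.0, 0.0],
    "er":  [0.0, 0.0, 1.0],
    "s":   [0.1, 0.1, 0.1],
}

def segment(word, vocab):
    """Greedy longest-match segmentation against a morpheme vocabulary."""
    morphemes, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest substring first
            if word[i:j] in vocab:
                morphemes.append(word[i:j])
                i = j
                break
        else:
            i += 1                          # skip a character with no match
    return morphemes

def word_vector(word, vocab=MORPHEME_VECS):
    """Average the vectors of the word's morphemes."""
    parts = segment(word, vocab)
    dims = len(next(iter(vocab.values())))
    if not parts:
        return [0.0] * dims
    return [sum(vocab[m][d] for m in parts) / len(parts) for d in range(dims)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# "running" and "runs" share the root morpheme, so their composed
# vectors stay close even though the surface forms differ.
print(segment("running", MORPHEME_VECS))                  # ['run', 'ing']
print(cosine(word_vector("running"), word_vector("runs")) > 0.5)
```

A composed representation like this is one way morpheme information can help with highly inflected languages such as Russian, where whole-word models see many rare surface forms of the same lemma.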


Notes

  1. https://github.com/TanyaKovalenko/Morpheme.

References

  1. Dictionary of financial terms and economic concepts. http://www.fingramota.org/servisy/slovar. Accessed 28 Aug 2017

  2. Great automotive dictionary. http://www.perfekt.ru/dictionaries/avto/s_rus.html. Accessed 28 Aug 2017

  3. Medical dictionary. http://www.medslv.ru. Accessed 28 Aug 2017

  4. Words segmentation to morphemes. http://www.sostavslova.ru. Accessed 28 Aug 2017

  5. Words segmentation to morphemes. http://www.morphemeonline.ru. Accessed 28 Aug 2017

  6. Kamchatov, A., et al.: Russian “drevoslov”. http://www.drevoslov.ru/wordcreation/morphem. Accessed 28 Aug 2017

  7. Kutuzov, A., Kuzmenko, E.: RusVectores: distributional semantic models for the Russian language. http://www.rusvectores.org. Accessed 28 Aug 2017

  8. Kuznetsova, A., Efremova, T.: Dictionary of the Morphemes of Russian

  9. Panchenko, A., Loukachevitch, N.V., Ustalov, D., Paperno, D., Meyer, C.M., Konstantinova, N.: RUSSE: the first workshop on Russian semantic similarity. In: Proceedings of the International Conference on Computational Linguistics DIALOGUE, pp. 89–105 (2015)

  10. Safyanova, A.: The meaning of Latin morphemes. http://www.grammatika-rus.ru/znachenie-latinskih-morfem. Accessed 28 Aug 2017

  11. Safyanova, A.: The meanings of prefixes in Russian. http://www.spelling.siteedit.ru/page51. Accessed 28 Aug 2017

  12. Safyanova, A.: The meanings of suffixes in the Russian language. http://www.spelling.siteedit.ru/page50. Accessed 28 Aug 2017

  13. Tikhonov, A.: Morphology and spelling dictionary of the Russian language. Astrel, AST (2002)

  14. Do, C.B., Batzoglou, S.: What is the expectation maximization algorithm? Nature Biotechnol. 26, 897–899 (2008)

  15. Garshin, I.: Slavic roots of the Russian language. http://www.slovorod.ru/slavic-roots. Accessed 28 Aug 2017

  16. Leviant, I., Reichart, R.: Separated by an un-common language: towards judgment language informed vector space modeling. https://arxiv.org/abs/1508.00106. Accessed 28 Aug 2017

  17. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation (2014)

  18. Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE (2008). http://jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf. Accessed 28 Aug 2017

  19. Creutz, M., Lagus, K.: Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0. Publications in Computer and Information Science, Report A81, Helsinki University of Technology, March 2005

  20. Grunwald, P.: The Minimum Description Length Principle. MIT Press, Cambridge (2007)

  21. Cotterell, R., Schutze, H.: Morphological word-embeddings. https://ryancotterell.github.io/papers/cotterell+schuetze.naacl15.pdf. Accessed 28 Aug 2017

  22. Galinsky, R., Alekseev, A., Nikolenko, S.: Improving neural networks models for natural language processing in Russian with synonyms (2016)

  23. Aivazyan, S., Enukov, I., Meshalkin, L.: Applied Statistics: Basics of Modeling and Primary Data Processing. Finance and Statistics, Moscow (1983)

  24. Bordag, S.: Unsupervised knowledge-free morpheme boundary detection. http://nlp.cs.swarthmore.edu/~richardw/papers/bordag2005-unsupervised.pdf. Accessed 28 Aug 2017

  25. Qiu, S., Cui, Q., Bian, J., Gao, B., Liu, T.-Y.: Co-learning of word representations and morpheme representations. https://www.aclweb.org/anthology/C14-1015. Accessed 28 Aug 2017

  26. Virpioja, S., Smit, P., Grönroos, S.-A., Kurimo, M.: Morfessor 2.0: Python implementation and extensions for Morfessor Baseline. Aalto University publication series SCIENCE + TECHNOLOGY (2013)

  27. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. https://arxiv.org/pdf/1301.3781.pdf. Accessed 28 Aug 2017

  28. Ling, W., Tiago, L., Marujo, L., Fernandez Astudillo, R., Amir, S., Dyer, C., Black, A.W., Trancoso, I.: Finding function in form: compositional character models for open vocabulary word representation. https://arxiv.org/abs/1508.02096. Accessed 28 Aug 2017

  29. Xu, Y., Liu, J.: Implicitly incorporating morphological information into word embedding. https://arxiv.org/abs/1701.02481. Accessed 28 Aug 2017

  30. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. http://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf. Accessed 28 Aug 2017

Author information

Corresponding author

Correspondence to Ruslan Galinsky.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Galinsky, R., Kovalenko, T., Yakovleva, J., Filchenkov, A. (2018). Morpheme Level Word Embedding. In: Filchenkov, A., Pivovarova, L., Žižka, J. (eds) Artificial Intelligence and Natural Language. AINL 2017. Communications in Computer and Information Science, vol 789. Springer, Cham. https://doi.org/10.1007/978-3-319-71746-3_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-71746-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-71745-6

  • Online ISBN: 978-3-319-71746-3

  • eBook Packages: Computer Science (R0)
