Abstract
Modern NLP tasks such as sentiment analysis, semantic analysis, text entity extraction and others depend on the quality of the language model. Language structure influences this quality: a model that fits analytic languages well for some NLP tasks may not fit synthetic languages well enough for the same tasks. For example, the well-known Word2Vec [27] model shows good results for English, which is an analytic rather than a synthetic language, but Word2Vec has problems with synthetic languages on some NLP tasks due to their high inflection. Since every morpheme in a synthetic language carries information, we propose a morpheme-level model for solving different NLP tasks. We consider the Russian language in our experiments. First, we describe how to build a morpheme extractor from prepared vocabularies. Our extractor reached 91% accuracy on vocabularies with known morpheme segmentation. Second, we show how it can be applied to NLP tasks, and then we discuss our results, their pros and cons, and our future work.
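The core idea of a morpheme-level model can be illustrated with a minimal sketch: if each morpheme has its own vector, a word vector can be composed from the vectors of its morphemes, so inflected forms of the same lemma share most of their representation. The sketch below uses random placeholder vectors and transliterated morphemes of a hypothetical Russian verb; the composition by averaging is one simple choice, not the exact method of the paper.

```python
import numpy as np

# Hypothetical morpheme vocabulary with placeholder vectors.
# In a real morpheme-level model these would be trained embeddings.
rng = np.random.default_rng(0)
morpheme_vectors = {
    m: rng.normal(size=4) for m in ["pere", "pis", "a", "t", "l"]
}

def word_embedding(morphemes):
    """Compose a word vector as the mean of its morpheme vectors."""
    return np.mean([morpheme_vectors[m] for m in morphemes], axis=0)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two inflected forms of one lemma, e.g. "perepisat'" vs "perepisal":
# they differ only in the final morpheme, so their composed vectors
# remain close even if neither full form was seen during training.
v_inf = word_embedding(["pere", "pis", "a", "t"])
v_past = word_embedding(["pere", "pis", "a", "l"])
print(cosine(v_inf, v_past))
```

A word-level model such as Word2Vec would treat these two forms as unrelated tokens; the morpheme-level composition ties them together by construction, which is the advantage for highly inflected languages.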
References
Dictionary of financial terms and economic concepts. http://www.fingramota.org/servisy/slovar. Accessed 28 Aug 2017
Great automotive dictionary. http://www.perfekt.ru/dictionaries/avto/s_rus.html. Accessed 28 Aug 2017
Medical dictionary. http://www.medslv.ru. Accessed 28 Aug 2017
Words segmentation to morphemes. http://www.sostavslova.ru. Accessed 28 Aug 2017
Words segmentation to morphemes. http://www.morphemeonline.ru. Accessed 28 Aug 2017
Kamchatov, A., et al.: Russian “drevoslov”. http://www.drevoslov.ru/wordcreation/morphem. Accessed 28 Aug 2017
Kutuzov, A., Kuzmenko, E.: RusVectores: distributional semantic models for the Russian language. http://www.rusvectores.org. Accessed 28 Aug 2017
Kuznetsova, A., Efremova, T.: Dictionary of the Morphemes of Russian
Panchenko, A., Loukachevitch, N.V., Ustalov, D., Paperno, D., Meyer, C.M., Konstantinova, N.: RUSSE: the first workshop on Russian semantic similarity. In: Proceedings of the International Conference on Computational Linguistics DIALOGUE, pp. 89–105 (2015)
Safyanova, A.: The meaning of Latin morphemes. http://www.grammatika-rus.ru/znachenie-latinskih-morfem. Accessed 28 Aug 2017
Safyanova, A.: The meanings of prefixes in Russian. http://www.spelling.siteedit.ru/page51. Accessed 28 Aug 2017
Safyanova, A.: The meanings of suffixes in Russian. http://www.spelling.siteedit.ru/page50. Accessed 28 Aug 2017
Tikhonov, A.: Morphology and spelling dictionary of the Russian language. Astrel, AST (2002)
Do, C.B., Batzoglou, S.: What is the expectation maximization algorithm? Nature Biotechnol. 26, 897–899 (2008)
Garshin, I.: Slavic roots of the Russian language. http://www.slovorod.ru/slavic-roots. Accessed 28 Aug 2017
Leviant, I., Reichart, R.: Separated by an un-common language: towards judgment language informed vector space modeling. https://arxiv.org/abs/1508.00106. Accessed 28 Aug 2017
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation (2014)
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE (2008). http://jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf. Accessed 28 Aug 2017
Creutz, M., Lagus, K.: Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0. Publications in Computer and Information Science, Report A81, Helsinki University of Technology, March 2005
Grunwald, P.: The Minimum Description Length Principle. MIT Press, Cambridge (2007)
Cotterell, R., Schütze, H.: Morphological word-embeddings. https://ryancotterell.github.io/papers/cotterell+schuetze.naacl15.pdf. Accessed 28 Aug 2017
Galinsky, R., Alekseev, A., Nikolenko, S.: Improving neural networks models for natural language processing in Russian with synonyms (2016)
Aivazyan, S., Enukov, I., Meshalkin, L.: Applied Statistics: Basics of Modeling and Primary Data Processing. Finance and Statistics, Moscow (1983)
Bordag, S.: Unsupervised knowledge-free morpheme boundary detection. http://nlp.cs.swarthmore.edu/~richardw/papers/bordag2005-unsupervised.pdf. Accessed 28 Aug 2017
Qiu, S., Cui, Q., Bian, J., Gao, B., Liu, T.-Y.: Co-learning of word representations and morpheme representations. https://www.aclweb.org/anthology/C14-1015. Accessed 28 Aug 2017
Virpioja, S., Smit, P., Grönroos, S.-A., Kurimo, M.: Morfessor 2.0: Python implementation and extensions for Morfessor baseline. Aalto University publication series SCIENCE + TECHNOLOGY (2013)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. https://arxiv.org/pdf/1301.3781.pdf. Accessed 28 Aug 2017
Ling, W., Tiago, L., Marujo, L., Fernandez Astudillo, R., Amir, S., Dyer, C., Black, A.W., Trancoso, I.: Finding function in form: compositional character models for open vocabulary word representation. https://arxiv.org/abs/1508.02096. Accessed 28 Aug 2017
Xu, Y., Liu, J.: Implicitly incorporating morphological information into word embedding. https://arxiv.org/abs/1701.02481. Accessed 28 Aug 2017
Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. http://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf. Accessed 28 Aug 2017
© 2018 Springer International Publishing AG
Galinsky, R., Kovalenko, T., Yakovleva, J., Filchenkov, A. (2018). Morpheme Level Word Embedding. In: Filchenkov, A., Pivovarova, L., Žižka, J. (eds) Artificial Intelligence and Natural Language. AINL 2017. Communications in Computer and Information Science, vol 789. Springer, Cham. https://doi.org/10.1007/978-3-319-71746-3_13
Print ISBN: 978-3-319-71745-6
Online ISBN: 978-3-319-71746-3