Abstract
Modern NLP tasks such as sentiment analysis, semantic analysis, text entity extraction and others depend on the quality of the language model. Language structure influences this quality: a model that fits analytic languages well for some NLP tasks may not fit synthetic languages well enough for the same tasks. For example, the well-known Word2Vec [27] model shows good results for English, which is an analytic rather than a synthetic language, but Word2Vec has problems with synthetic languages on some NLP tasks due to their high inflection. Since every morpheme in a synthetic language carries information, we propose a morpheme-level model for solving different NLP tasks. We consider the Russian language in our experiments. First, we describe how to build a morpheme extractor from prepared vocabularies. Our extractor reached 91% accuracy on vocabularies with known morpheme segmentation. Second, we show how it can be applied to NLP tasks, and then we discuss our results, their pros and cons, and our future work.
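The core idea of a morpheme-level model can be illustrated with a minimal sketch: if each morpheme has its own vector, a word vector can be composed from the vectors of its morphemes, so inflected forms of the same lemma share most of their representation. The sketch below uses random placeholder vectors and transliterated morphemes of a hypothetical Russian verb; the composition by averaging is one simple choice, not the exact method of the paper.

```python
import numpy as np

# Hypothetical morpheme vocabulary with placeholder vectors.
# In a real morpheme-level model these would be trained embeddings.
rng = np.random.default_rng(0)
morpheme_vectors = {
    m: rng.normal(size=4) for m in ["pere", "pis", "a", "t", "l"]
}

def word_embedding(morphemes):
    """Compose a word vector as the mean of its morpheme vectors."""
    return np.mean([morpheme_vectors[m] for m in morphemes], axis=0)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two inflected forms of one lemma, e.g. "perepisat'" vs "perepisal":
# they differ only in the final morpheme, so their composed vectors
# remain close even if neither full form was seen during training.
v_inf = word_embedding(["pere", "pis", "a", "t"])
v_past = word_embedding(["pere", "pis", "a", "l"])
print(cosine(v_inf, v_past))
```

A word-level model such as Word2Vec would treat these two forms as unrelated tokens; the morpheme-level composition ties them together by construction, which is the advantage for highly inflected languages.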
References
Dictionary of financial terms and economic concepts. http://www.fingramota.org/servisy/slovar. Accessed 28 Aug 2017
Great automotive dictionary. http://www.perfekt.ru/dictionaries/avto/s_rus.html. Accessed 28 Aug 2017
Medical dictionary. http://www.medslv.ru. Accessed 28 Aug 2017
Words segmentation to morphemes. http://www.sostavslova.ru. Accessed 28 Aug 2017
Words segmentation to morphemes. http://www.morphemeonline.ru. Accessed 28 Aug 2017
Kamchatov, A., et al.: Russian “drevoslov”. http://www.drevoslov.ru/wordcreation/morphem. Accessed 28 Aug 2017
Kutuzov, A., Kuzmenko, E.: RusVectores: distributional semantic models for the Russian language. http://www.rusvectores.org. Accessed 28 Aug 2017
Kuznetsova, A., Efremova, T.: Dictionary of the Morphemes of Russian
Panchenko, A., Loukachevitch, N.V., Ustalov, D., Paperno, D., Meyer, C.M., Konstantinova, N.: RUSSE: the first workshop on Russian semantic similarity. In: Proceedings of the International Conference on Computational Linguistics DIALOGUE, pp. 89–105 (2015)
Safyanova, A.: The meaning of Latin morphemes. http://www.grammatika-rus.ru/znachenie-latinskih-morfem. Accessed 28 Aug 2017
Safyanova, A.: The meanings of prefixes in Russian. http://www.spelling.siteedit.ru/page51. Accessed 28 Aug 2017
Safyanova, A.: The meanings of suffixes in Russian. http://www.spelling.siteedit.ru/page50. Accessed 28 Aug 2017
Tikhonov, A.: Morphology and spelling dictionary of the Russian language. Astrel, AST (2002)
Do, C.B., Batzoglou, S.: What is the expectation maximization algorithm? Nature Biotechnol. 26, 897–899 (2008)
Garshin, I.: Slavic roots of the Russian language. http://www.slovorod.ru/slavic-roots. Accessed 28 Aug 2017
Leviant, I., Reichart, R.: Separated by an un-common language: towards judgment language informed vector space modeling. https://arxiv.org/abs/1508.00106. Accessed 28 Aug 2017
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation (2014)
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE (2008). http://jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf. Accessed 28 Aug 2017
Creutz, M., Lagus, K.: Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0. Publications in Computer and Information Science, Report A81, Helsinki University of Technology, March 2005
Grunwald, P.: The Minimum Description Length Principle. MIT Press, Cambridge (2007)
Cotterell, R., Schütze, H.: Morphological word-embeddings. https://ryancotterell.github.io/papers/cotterell+schuetze.naacl15.pdf. Accessed 28 Aug 2017
Galinsky, R., Alekseev, A., Nikolenko, S.: Improving neural networks models for natural language processing in Russian with synonyms (2016)
Aivazyan, S., Enukov, I., Meshalkin, L.: Applied Statistics: Basics of Modeling and Primary Data Processing. Finance and Statistics, Moscow (1983)
Bordag, S.: Unsupervised knowledge-free morpheme boundary detection. http://nlp.cs.swarthmore.edu/~richardw/papers/bordag2005-unsupervised.pdf. Accessed 28 Aug 2017
Qiu, S., Cui, Q., Bian, J., Gao, B., Liu, T.-Y.: Co-learning of word representations and morpheme representations. https://www.aclweb.org/anthology/C14-1015. Accessed 28 Aug 2017
Virpioja, S., Smit, P., Grönroos, S.-A., Kurimo, M.: Morfessor 2.0: Python implementation and extensions for Morfessor baseline. Aalto University publication series SCIENCE + TECHNOLOGY (2013)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. https://arxiv.org/pdf/1301.3781.pdf. Accessed 28 Aug 2017
Ling, W., Tiago, L., Marujo, L., Fernandez Astudillo, R., Amir, S., Dyer, C., Black, A.W., Trancoso, I.: Finding function in form: compositional character models for open vocabulary word representation. https://arxiv.org/abs/1508.02096. Accessed 28 Aug 2017
Xu, Y., Liu, J.: Implicitly incorporating morphological information into word embedding. https://arxiv.org/abs/1701.02481. Accessed 28 Aug 2017
Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. http://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf. Accessed 28 Aug 2017
© 2018 Springer International Publishing AG
Galinsky, R., Kovalenko, T., Yakovleva, J., Filchenkov, A. (2018). Morpheme Level Word Embedding. In: Filchenkov, A., Pivovarova, L., Žižka, J. (eds) Artificial Intelligence and Natural Language. AINL 2017. Communications in Computer and Information Science, vol 789. Springer, Cham. https://doi.org/10.1007/978-3-319-71746-3_13
Print ISBN: 978-3-319-71745-6
Online ISBN: 978-3-319-71746-3