Abstract
The paper addresses the task of automatic morpheme segmentation, which involves both splitting words into morphs and classifying the resulting morphs. For segmentation of Russian words, a new model based on a Bi-LSTM neural network is proposed and experimentally evaluated on several training data sets that differ in labeling. The proposed model is comparable in quality to the best supervised machine learning models for morpheme segmentation with classification and slightly outperforms them in word-level classification accuracy, with a score of 89%.
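To make the task and the metric concrete, here is a minimal sketch of how segmentation with classification can be cast as per-letter labeling and how a word-level accuracy score can be computed. The B-/I- label scheme, the morph type names (PREF, ROOT, SUFF, END) and the function names are illustrative assumptions, not the paper's actual encoding or code. A word counts as correct only if every morph boundary and every morph type matches the gold analysis.

```python
from typing import List, Tuple

# A morph is (text, type); the type names used here are assumed for illustration.
Morph = Tuple[str, str]


def morphs_to_labels(morphs: List[Morph]) -> List[str]:
    """Encode a segmented word as one label per letter.

    The first letter of each morph gets "B-<type>", the rest "I-<type>".
    This BIO-like scheme is an assumption, not the paper's stated encoding.
    """
    labels = []
    for text, kind in morphs:
        for i in range(len(text)):
            labels.append(("B-" if i == 0 else "I-") + kind)
    return labels


def labels_to_morphs(word: str, labels: List[str]) -> List[Morph]:
    """Decode per-letter labels back into a list of (morph, type) pairs."""
    if len(word) != len(labels):
        raise ValueError("word and labels must have the same length")
    morphs: List[Morph] = []
    for ch, label in zip(word, labels):
        prefix, kind = label.split("-", 1)
        # Start a new morph on "B-", or when the type changes unexpectedly.
        if prefix == "B" or not morphs or morphs[-1][1] != kind:
            morphs.append((ch, kind))
        else:
            morphs[-1] = (morphs[-1][0] + ch, kind)
    return morphs


def word_level_accuracy(gold: List[List[Morph]], pred: List[List[Morph]]) -> float:
    """Share of words whose whole predicted analysis equals the gold one."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


# Example: "подводный" segmented as под (prefix) + вод (root) + н (suffix) + ый (ending).
example = [("под", "PREF"), ("вод", "ROOT"), ("н", "SUFF"), ("ый", "END")]
example_labels = morphs_to_labels(example)
```

Under this framing, a sequence model such as a Bi-LSTM would predict one label per letter, and `labels_to_morphs` would turn its output back into classified morphs before scoring.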
Acknowledgements
We would like to thank the anonymous reviewers of our paper for their helpful and constructive comments.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Bolshakova, E., Sapin, A. (2019). Bi-LSTM Model for Morpheme Segmentation of Russian Words. In: Ustalov, D., Filchenkov, A., Pivovarova, L. (eds) Artificial Intelligence and Natural Language. AINL 2019. Communications in Computer and Information Science, vol 1119. Springer, Cham. https://doi.org/10.1007/978-3-030-34518-1_11
DOI: https://doi.org/10.1007/978-3-030-34518-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34517-4
Online ISBN: 978-3-030-34518-1
eBook Packages: Computer Science, Computer Science (R0)