Bi-LSTM Model for Morpheme Segmentation of Russian Words

  • Conference paper
Artificial Intelligence and Natural Language (AINL 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1119)

Abstract

The paper addresses the task of automatic morpheme segmentation, which involves both splitting words into morphs and classifying the resulting morphs. For segmentation of Russian words, a new model based on a Bi-LSTM neural network is proposed and experimentally evaluated on several training data sets that differ in labeling. The proposed model is comparable in quality to the best supervised machine learning models for morpheme segmentation with classification, and slightly outperforms them in word-level classification accuracy, with a score of 89%.
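As an illustration of the task only, segmentation with classification can be framed as labeling every letter of a word. The sketch below is not the authors' code: the `B-`/`I-` label scheme, the morph type names (`PREF`, `ROOT`, `SUFF`, `END`), and all function names are assumptions made for this example, and `word_level_accuracy` reflects one plausible reading of the word-level metric (a word counts as correct only if every letter label matches).

```python
from typing import List, Tuple

# (morph text, morph type); the type names are illustrative assumptions.
Morph = Tuple[str, str]


def to_letter_labels(morphs: List[Morph]) -> List[str]:
    """Encode a segmented word as one label per letter.

    "B-" marks the first letter of a morph, "I-" any later letter.
    """
    labels = []
    for text, kind in morphs:
        for i, _ in enumerate(text):
            labels.append(("B-" if i == 0 else "I-") + kind)
    return labels


def from_letter_labels(word: str, labels: List[str]) -> List[Morph]:
    """Decode per-letter labels back into a list of typed morphs."""
    morphs: List[Morph] = []
    for letter, label in zip(word, labels):
        position, kind = label.split("-", 1)
        # Start a new morph on "B-", at the first letter, or on a type change.
        if position == "B" or not morphs or morphs[-1][1] != kind:
            morphs.append((letter, kind))
        else:
            text, _ = morphs[-1]
            morphs[-1] = (text + letter, kind)
    return morphs


def word_level_accuracy(gold: List[List[str]], predicted: List[List[str]]) -> float:
    """Share of words whose whole label sequence is predicted correctly."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Hypothetical gold segmentation of one Russian word.
example = [("при", "PREF"), ("город", "ROOT"), ("н", "SUFF"), ("ый", "END")]
word = "".join(text for text, _ in example)
labels = to_letter_labels(example)
```

Under this framing, a sequence model such as a Bi-LSTM would take the letters of `word` as input and predict `labels`; decoding with `from_letter_labels` then recovers both the morph boundaries and the morph types.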

Notes

  1. https://github.com/AlexeySorokin/NeuralMorphemeSegmentation

  2. https://github.com/alesapin/GBDTMorphParsing

  3. https://github.com/alesapin/RussianMorphParsing

  4. https://github.com/fchollet/keras

  5. https://github.com/alesapin/XMorphy

Acknowledgements

We would like to thank the anonymous reviewers of our paper for their helpful and constructive comments.

Author information

Corresponding author

Correspondence to Alexander Sapin.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Bolshakova, E., Sapin, A. (2019). Bi-LSTM Model for Morpheme Segmentation of Russian Words. In: Ustalov, D., Filchenkov, A., Pivovarova, L. (eds) Artificial Intelligence and Natural Language. AINL 2019. Communications in Computer and Information Science, vol 1119. Springer, Cham. https://doi.org/10.1007/978-3-030-34518-1_11

  • DOI: https://doi.org/10.1007/978-3-030-34518-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34517-4

  • Online ISBN: 978-3-030-34518-1

  • eBook Packages: Computer Science (R0)