Improving Neural Models for Natural Language Processing in Russian with Synonyms

Journal of Mathematical Sciences

Large-scale neural network models, including models for natural language processing (NLP), require large training datasets, which may be unavailable for low-resource languages or specialized domains. We address the problem of small size and poor variability of the data available for training NLP models by augmenting it with synonyms. We design a novel augmentation scheme based on replacing words with their synonyms, apply it to the Russian language, and report improved results on the sentiment analysis task.
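As a rough illustration of the general idea (a minimal sketch, not the exact scheme described in the paper), the code below augments a tokenized sentence by replacing words with synonyms drawn from a dictionary. The synonym mapping, the replacement probability p, and the helper name augment_with_synonyms are hypothetical placeholders introduced here for illustration.

    import random

    def augment_with_synonyms(tokens, synonyms, p=0.2, rng=random):
        """Return a copy of `tokens` with some words replaced by synonyms.

        `synonyms` maps a lowercased word to a list of candidate replacements;
        each word that has candidates is swapped with probability `p`.
        """
        augmented = []
        for token in tokens:
            candidates = synonyms.get(token.lower(), [])
            if candidates and rng.random() < p:
                augmented.append(rng.choice(candidates))
            else:
                augmented.append(token)
        return augmented

    # Hypothetical usage: generate extra training examples for a sentiment classifier.
    synonyms = {"хороший": ["отличный", "прекрасный"], "фильм": ["кинофильм", "кинолента"]}
    sentence = ["очень", "хороший", "фильм"]
    extra_examples = [augment_with_synonyms(sentence, synonyms, p=0.5) for _ in range(3)]

In a setup like this, the augmented copies would be added to the training set alongside the original sentences with the same labels, increasing the size and lexical variability of the data.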



Author information


Corresponding author

Correspondence to A. M. Alekseev.

Additional information

Published in Zapiski Nauchnykh Seminarov POMI, Vol. 499, 2021, pp. 206–221.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Galinsky, R.B., Alekseev, A.M. & Nikolenko, S.I. Improving Neural Models for Natural Language Processing in Russian with Synonyms. J Math Sci 273, 583–594 (2023). https://doi.org/10.1007/s10958-023-06520-z