Large-scale neural network models, including models for natural language processing (NLP), require large training datasets that may be unavailable for low-resource languages or specialized domains. We address the problem of limited size and low variability of the data available for training NLP models by augmenting the data with synonyms. We design a novel augmentation scheme based on replacing words with their synonyms, apply it to the Russian language, and report improved results on the sentiment analysis task.
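The core idea of synonym-based augmentation can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the toy English dictionary, the function name `augment`, and the replacement probability `p` are all assumptions made for the example (the paper works with Russian and dictionary-based synonym resources).

```python
import random

# Hypothetical toy synonym dictionary for illustration only; the paper
# draws on Russian synonym dictionaries, not this English stand-in.
SYNONYMS = {
    "good": ["great", "fine"],
    "movie": ["film", "picture"],
    "bad": ["poor", "awful"],
}

def augment(tokens, synonyms, p=0.5, rng=None):
    """Return a copy of `tokens` in which each word that has known
    synonyms is replaced by a randomly chosen synonym with probability `p`."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    out = []
    for tok in tokens:
        if tok in synonyms and rng.random() < p:
            out.append(rng.choice(synonyms[tok]))
        else:
            out.append(tok)
    return out

sentence = "a good movie with a bad ending".split()
augmented = augment(sentence, SYNONYMS, p=1.0)
print(" ".join(augmented))
```

Each augmented copy preserves sentence length and structure, so the sentiment label can be carried over unchanged, which is what makes this scheme usable for enlarging a labeled training set.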
Published in Zapiski Nauchnykh Seminarov POMI, Vol. 499, 2021, pp. 206–221.
Galinsky, R.B., Alekseev, A.M. & Nikolenko, S.I. Improving Neural Models for Natural Language Processing in Russian with Synonyms. J Math Sci 273, 583–594 (2023). https://doi.org/10.1007/s10958-023-06520-z