Abstract
The performance of sentiment analysis methods has increased greatly in recent years, owing to models based on the Transformer architecture, in particular BERT. However, deep neural network models are difficult to train and hard to interpret. An alternative is rule-based methods that use sentiment lexicons: they are fast, require no training, and are easy to interpret. With the widespread adoption of deep learning, however, lexicon-based methods have receded into the background. The purpose of this article is to study the performance of two lexicon-based methods, SO-CAL and SentiStrength, adapted for the Russian language. We tested these methods, along with the RuBERT neural network model, on 16 text corpora and analyzed the results. RuBERT outperforms both lexicon-based methods on average, but SO-CAL surpasses RuBERT on four of the 16 corpora.
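To make the idea of a lexicon-based method concrete, the following is a minimal sketch of a generic lexicon scorer. It is not the SO-CAL or SentiStrength implementation: the toy English lexicon, the weights, the names `LEXICON`, `INTENSIFIERS`, `NEGATIONS`, `score_text`, and `classify`, and the simple sign-flip treatment of negation are all assumptions made for illustration.

```python
import re

# Toy resources: hypothetical words and weights, for illustration only.
LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.0, "awful": -3.0}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}
NEGATIONS = {"not", "never", "no"}


def score_text(text):
    """Return (total score, list of (word, contribution)) for a text."""
    tokens = re.findall(r"\w+", text.lower())
    contributions = []
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        value = LEXICON[token]
        negated = False
        # Walk back over modifiers that directly precede the sentiment word.
        j = i - 1
        while j >= 0 and (tokens[j] in INTENSIFIERS or tokens[j] in NEGATIONS):
            if tokens[j] in INTENSIFIERS:
                value *= INTENSIFIERS[tokens[j]]
            else:
                negated = not negated
            j -= 1
        # Assumption: negation simply flips the sign of the contribution.
        if negated:
            value = -value
        contributions.append((token, value))
    total = sum(value for _, value in contributions)
    return total, contributions


def classify(text):
    """Map the total score to a three-way sentiment label."""
    total, _ = score_text(text)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(score_text("The film was not very good"))
    print(classify("The film was not very good"))
```

The per-word contributions returned by `score_text` show which lexicon entries drove the decision, which is the sense in which such methods are easy to interpret, and no training step is involved.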
Notes
- 1. See for example: https://paperswithcode.com/task/sentiment-analysis.
- 8. Given on the website: http://sentistrength.wlv.ac.uk.
Acknowledgement
This research is financially supported by the Russian Science Foundation, Agreement № 17-71-30029, with co-financing from Bank Saint Petersburg.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kotelnikova, A., Paschenko, D., Bochenina, K., Kotelnikov, E. (2022). Lexicon-Based Methods vs. BERT for Text Sentiment Analysis. In: Burnaev, E., et al. Analysis of Images, Social Networks and Texts. AIST 2021. Lecture Notes in Computer Science, vol 13217. Springer, Cham. https://doi.org/10.1007/978-3-031-16500-9_7
DOI: https://doi.org/10.1007/978-3-031-16500-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16499-6
Online ISBN: 978-3-031-16500-9
eBook Packages: Computer Science, Computer Science (R0)