Lexicon-Based Methods vs. BERT for Text Sentiment Analysis

  • Conference paper
Analysis of Images, Social Networks and Texts (AIST 2021)

Abstract

The performance of sentiment analysis methods has increased greatly in recent years, largely due to models based on the Transformer architecture, in particular BERT. However, deep neural network models are difficult to train and poorly interpretable. An alternative is rule-based methods that use sentiment lexicons: they are fast, require no training, and are easy to interpret. Yet with the widespread adoption of deep learning, lexicon-based methods have receded into the background. The purpose of this article is to study the performance of the SO-CAL and SentiStrength lexicon-based methods, adapted for the Russian language. We tested these methods, along with the RuBERT neural network model, on 16 text corpora and analyzed the results. RuBERT outperforms both lexicon-based methods on average, but SO-CAL surpasses RuBERT on four of the 16 corpora.
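To illustrate why lexicon-based methods need no training and are easy to inspect, here is a minimal sketch of a rule-based scorer. It is not SO-CAL or SentiStrength: the word lists, weights, and the `score` and `classify` functions are hypothetical, and the tokenizer handles only lowercase Latin letters, so a Russian adaptation would need its own lexicon and morphological processing.

```python
# Toy lexicon-based sentiment scorer. The lexicon, negators, and
# intensifier weights are made-up illustrations, not the SO-CAL or
# SentiStrength resources.
import re

LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.0, "terrible": -3.0}
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}


def score(text: str) -> float:
    """Sum word polarities, applying modifiers that directly precede each word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        value = LEXICON[token]
        j = i - 1
        # An intensifier directly before the word scales its polarity.
        if j >= 0 and tokens[j] in INTENSIFIERS:
            value *= INTENSIFIERS[tokens[j]]
            j -= 1
        # A negator before the word (or before its intensifier) flips the sign.
        if j >= 0 and tokens[j] in NEGATORS:
            value = -value
        total += value
    return total


def classify(text: str) -> str:
    """Map the numeric score to a three-way label."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ("very good", "not very good", "nothing here"):
        print(sentence, "->", score(sentence), classify(sentence))
```

Every contribution to the final score can be traced to a specific lexicon entry and rule, which is the interpretability advantage the abstract refers to; the cost is that coverage depends entirely on the hand-built word lists.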

Notes

  1. See for example: https://paperswithcode.com/task/sentiment-analysis.

  2. https://github.com/sfu-discourse-lab/SO-CAL.

  3. https://github.com/cjhutto/vaderSentiment.

  4. https://github.com/clips/pattern.

  5. https://textblob.readthedocs.io.

  6. http://sentistrength.wlv.ac.uk.

  7. https://thomasschmidtur.pythonanywhere.com.

  8. Given on the website: http://sentistrength.wlv.ac.uk.

  9. https://github.com/IlyaGusev/rnnmorph.

Acknowledgement

This research is financially supported by the Russian Science Foundation, Agreement № 17-71-30029, with co-financing from Bank Saint Petersburg.

Author information

Corresponding author

Correspondence to Evgeny Kotelnikov.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kotelnikova, A., Paschenko, D., Bochenina, K., Kotelnikov, E. (2022). Lexicon-Based Methods vs. BERT for Text Sentiment Analysis. In: Burnaev, E., et al. Analysis of Images, Social Networks and Texts. AIST 2021. Lecture Notes in Computer Science, vol 13217. Springer, Cham. https://doi.org/10.1007/978-3-031-16500-9_7

  • DOI: https://doi.org/10.1007/978-3-031-16500-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16499-6

  • Online ISBN: 978-3-031-16500-9

  • eBook Packages: Computer Science (R0)
