SentiRusColl: Russian Collocation Lexicon for Sentiment Analysis

  • Anastasia Kotelnikova
  • Evgeny Kotelnikov
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1119)


Most sentiment lexicons include individual words rather than collocations. However, collocations can improve the performance of sentiment analysis, since the meaning of some collocations cannot be derived from the meanings of their constituents, for example, the Russian idioms meaning “kick the bucket” or “it is impossible to take one’s eyes off something”. In our study, we create sentiment collocation lexicons for ten domains: reviews of books, movies, music, cars, computers, household appliances, phones, banks, hotels and restaurants. The lexicons are built with a semi-automatic approach using corpora of reviews. We then form a universal lexicon, SentiRusColl, as the union of the domain-oriented lexicons. We demonstrate that the resulting lexicon can be used across various domains. In addition, we show that sentiment analysis performance improves when SentiRusColl is combined with an existing lexicon, RuSentiLex.
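
As a rough illustration of the two operations the abstract mentions (taking the union of domain lexicons and using collocations alongside a word-level lexicon), the following minimal sketch may help. It is not the authors' implementation: the toy English entries, the names `union_lexicons` and `score`, the rule of dropping entries whose polarity conflicts between domains, and the longest-match scoring are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Collocation = Tuple[str, ...]     # a sequence of lemmas
Lexicon = Dict[Collocation, int]  # +1 = positive, -1 = negative


def union_lexicons(lexicons: Iterable[Lexicon]) -> Lexicon:
    """Merge lexicons; entries with conflicting polarity are dropped (assumption)."""
    merged: Lexicon = {}
    conflicts = set()
    for lexicon in lexicons:
        for collocation, polarity in lexicon.items():
            if collocation in merged and merged[collocation] != polarity:
                conflicts.add(collocation)
            merged.setdefault(collocation, polarity)
    for collocation in conflicts:
        del merged[collocation]
    return merged


def score(tokens: List[str], lexicon: Lexicon) -> int:
    """Sum polarities of lexicon entries found in a lemmatized token list.

    At each position the longest matching entry wins, so a collocation
    takes precedence over the single words it contains.
    """
    if not lexicon:
        return 0
    max_len = max(len(entry) for entry in lexicon)
    total = 0
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            polarity = lexicon.get(tuple(tokens[i:i + n]))
            if polarity is not None:
                total += polarity
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return total


# Hypothetical toy domain lexicons (not taken from SentiRusColl).
books = {("hard", "to", "put", "down"): 1, ("waste", "of", "time"): -1}
movies = {("waste", "of", "time"): -1, ("easy", "to", "predict"): -1}
cars = {("easy", "to", "predict"): 1, ("smooth", "ride"): 1}

universal = union_lexicons([books, movies, cars])

# Hypothetical single-word lexicon standing in for an existing resource.
words = {("great",): 1, ("boring",): -1}
combined = union_lexicons([universal, words])

print(score("this great book be hard to put down".split(), combined))  # prints 2
```

In this sketch, "easy to predict" is excluded from the universal lexicon because its polarity differs between the movie and car domains; a real lexicon might instead keep such entries with a domain label.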


Keywords: Sentiment lexicons, Sentiment analysis, Opinion mining, Collocations



The reported study was funded by the Ministry of Education and Science of the Russian Federation according to the research project No. 34.2092.2017/4.6.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Vyatka State University, Kirov, Russia
