Sentiment Classification in Multiple Languages: Fifty Shades of Customer Opinions

  • Tomáš Kincl
  • Michal Novák
  • Jiří Přibil
Conference paper
Part of the Eurasian Studies in Business and Economics book series (EBES, volume 2/2)


Sentiment analysis is a natural language processing task whose goal is to classify the polarity of expressed opinions. However, aiming for the highest classification accuracy in one particular language does not truly reflect the needs of business. Sentiment analysis is often used by multinational companies operating in multiple markets. Such companies are interested in consumer opinions about their products and services in different countries (and thus in different languages). Yet most research on multi-language sentiment classification simply relies on automated translation from minor languages into English and then conducts sentiment analysis on the English text. This paper aims to contribute to the multi-language sentiment classification problem and proposes a language-independent approach that could provide a good level of classification accuracy in multiple languages without using automated translation or language-dependent components (i.e., lexicons). The results indicate that the proposed approach could provide a high level of sentiment classification accuracy, even for multiple languages and without language-dependent components.


Sentiment analysis, Language-independent, Customer reviews, Opinion mining, Marketing



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Faculty of Management, Department of Exact Methods, University of Economics, Prague, Czech Republic
