Abstract
Sentiment analysis is a natural language processing task whose goal is to classify the sentiment polarity of expressed opinions. Yet aiming for the highest sentiment classification accuracy in one particular language does not truly reflect the needs of business. Sentiment analysis is often used by multinational companies operating in multiple markets. Such companies are interested in consumer opinions about their products and services in different countries (and thus in different languages). However, most research on multi-language sentiment classification simply relies on automated translation from minor languages into English, followed by sentiment analysis in English. This paper aims to contribute to the multi-language sentiment classification problem and proposes a language-independent approach that could provide a good level of classification accuracy in multiple languages without using automated translation or language-dependent components such as lexicons. The results indicate that the proposed approach could provide a high level of sentiment classification accuracy, even across multiple languages and without language-dependent components.
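The abstract does not say which features or classifier the proposed approach uses. Purely as an illustration of what "language-independent, without translation or lexicons" can mean in practice, here is a minimal sketch that assumes character n-gram counts as features and a multinomial Naive Bayes classifier. Both choices are assumptions of this example and not necessarily the authors' method. The names `char_ngrams` and `NaiveBayesSentiment` are hypothetical, and the toy reviews are invented.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of length n_min..n_max.

    No tokenizer, stemmer, lexicon or translation step is involved, so the
    same function applies unchanged to English, German, French, Czech, etc.
    """
    # Lowercase, collapse runs of whitespace, and pad with spaces so that
    # word beginnings and endings show up inside the n-grams.
    padded = " " + " ".join(text.lower().split()) + " "
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over character n-gram counts, Laplace-smoothed."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)  # documents per class, used for priors
        self.gram_counts = {c: Counter() for c in self.doc_counts}
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(char_ngrams(text))
        self.vocab = set()
        for counts in self.gram_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(g.values()) for c, g in self.gram_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in sorted(self.doc_counts):
            # log P(c) + sum over n-grams of count * log P(gram | c)
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * len(self.vocab)
            for gram, k in char_ngrams(text).items():
                if gram in self.vocab:  # n-grams unseen in training are skipped
                    score += k * math.log((self.gram_counts[c][gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


if __name__ == "__main__":
    # Invented toy reviews in three languages, pooled into one training set.
    train = [
        ("great product, I love it", "pos"),
        ("terrible service, I hate it", "neg"),
        ("super Produkt, ich liebe es", "pos"),
        ("schrecklicher Service, ich hasse es", "neg"),
        ("produit super, je l'adore", "pos"),
        ("service horrible, je le déteste", "neg"),
    ]
    clf = NaiveBayesSentiment().fit([t for t, _ in train], [y for _, y in train])
    for review in ["I love this product", "ich hasse diesen Service"]:
        print(review, "->", clf.predict(review))
```

The design point is that one feature extractor and one model serve all languages at once: nothing in the pipeline depends on word lists, grammar, or a translation service, which is the property the abstract emphasizes. In a real setting, a stronger linear classifier and a much larger labeled corpus per language would be expected.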
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Kincl, T., Novák, M., Přibil, J. (2016). Sentiment Classification in Multiple Languages: Fifty Shades of Customer Opinions. In: Bilgin, M., Danis, H., Demir, E., Can, U. (eds) Business Challenges in the Changing Economic Landscape - Vol. 2. Eurasian Studies in Business and Economics, vol 2/2. Springer, Cham. https://doi.org/10.1007/978-3-319-22593-7_19
DOI: https://doi.org/10.1007/978-3-319-22593-7_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22592-0
Online ISBN: 978-3-319-22593-7
eBook Packages: Economics and Finance (R0)