Skip to main content

A comparative empirical study on social media sentiment analysis over various genres and languages

Abstract

People express their opinions about things like products, celebrities and services using social media channels. The analysis of these textual contents for sentiments is a gold mine for marketing experts as well as for research in humanities, thus automatic sentiment analysis is a popular area of applied artificial intelligence. The chief objective of this paper is to investigate automatic sentiment analysis on social media contents over various text sources and languages. The comparative findings of the investigation may give useful insights to artificial intelligence researchers who develop sentiment analyzers for a new textual source. To achieve this, we describe supervised machine learning based systems which perform sentiment analysis and we comparatively evaluate them on seven publicly available English and Hungarian databases, which contain text documents taken from Twitter and product review sites. We discuss the differences among these text genres and languages in terms of document- and target-level sentiment analysis.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2

Notes

  1. www.internetslang.com.

  2. http://www.arukereso.hu.

References

  • Amigó E, Carrillo de Albornoz J, Chugur I, Corujo A, Gonzalo J, Martín T, Meij E, de Rijke M, Spina D, Amigo E, de Albornoz JC, Martin T, de Rijke M (2013) Overview of replab 2013: evaluating online reputation monitoring systems. In: Information access evaluation. multilinguality, multimodality, and visualization, pp 333–352

  • Baccianella S, Esuli A, Sebastiani F (2010) SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: Proceedings of the seventh international conference on language resources and evaluation (LREC’10)

  • Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022

    MATH  Google Scholar 

  • Bohnet B (2010) Top accuracy and fast dependency parsing is not a contradiction. In: Proceedings of the 23rd international conference on computational linguistics (Coling 2010), Beijing, China, pp 89–97

  • Ceylan H, Mihalcea R (2011) An efficient indexer for large N-gram corpora. In: ACL (system demonstrations), pp 103–108

  • Cossu JV, Bigot B, Bonnefoy L, Morchid M, Bost X, Senay G, Dufour R, Bouvier V, Torres-Moreno JM, El-Beze M (2013) LIA@RepLab 2013. In: Working notes of CLEF 2013 evaluation labs and workshop

  • Farkas R, Bohnet B (2012) Stacking of dependency and phrase structure parsers. In: Proceedings of COLING 2012, the COLING 2012 Organizing Committee, Mumbai, pp 849–866

  • Feldman R (2013) Techniques and applications for sentiment analysis. Commun ACM 56(4):82. doi:10.1145/2436256.2436274

    Article  Google Scholar 

  • Foster J, Çetinoglu Ö, Wagner J, Le Roux J, Hogan S, Nivre J, Hogan D, Van Genabith J (2011) # hardtoparse: POS tagging and parsing the twitterverse. In: AAAI 2011 workshop on analyzing microtext, pp 20–25

  • Hangya V, Farkas R (2013) Filtering and polarity detection for reputation management on tweets. In: Working notes of CLEF 2013 evaluation labs and workshop

  • Hangya V, Berend G, Farkas R (2013) SZTE-NLP: sentiment detection on twitter messages. In: Second joint conference on lexical and computational semantics (*SEM), volume 2: proceedings of the seventh international workshop on semantic evaluation (SemEval 2013), pp 549–553

  • Hangya V, Berend G, Varga I, Farkas R (2014) SZTE-NLP: aspect level opinion mining exploiting syntactic cues. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014). Dublin, Ireland, pp 610–614

  • Jansen BJ, Zhang M, Sobel K, Chowdury A (2009) Twitter power: tweets as electronic word of mouth. J Am Soc Inf Sci Technol 60(11):2169–2188

  • Jiang L, Yu M, Zhou M, Liu X, Zhao T (2011) Target-dependent Twitter sentiment classification. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics, pp 151–160

  • Jindal N, Liu B, Street SM (2008) Opinion spam and analysis. In: Proceedings of the 2008 international conference on web search and data mining

  • Kessler JS, Eckert M, Clark L, Nicolov N (2010) The 2010 ICWSM JDPA sentiment corpus for the automotive domain. In: 4th international AAAI conference on weblogs and social media data workshop challenge (ICWSM-DWC 2010)

  • Kiritchenko S, Zhu X, Cherry C, Mohammad S (2014) NRC-Canada-2014: detecting aspects and sentiment in customer reviews. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014), SemEval, p 437

  • Klein D, Manning CD (2003) Accurate unlexicalized parsing. In: Proceedings of the 41st ACL, pp 423–430. doi:10.3115/1075096.1075150

  • Kong L, Schneider N, Swayamdipta S, Bhatia A, Dyer C, Smith NA (2014) A dependency parser for tweets. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). Association for Computational Linguistics, Doha, pp 1001–1012

  • Lazaridou A, Titov II, Sporleder CC (2013) A Bayesian model for joint unsupervised induction of sentiment, aspect and discourse representations. In: 51st annual meeting of the Association for Computational Linguistics, ACL 2013, pp 1630–1639

  • Li S, Zhou L, Li Y (2015) Improving aspect extraction by augmenting a frequency-based method with web-based similarity measures. Inf Process Manag 51(1):58–67. doi:10.1016/j.ipm.2014.08.005

    Article  Google Scholar 

  • Liu B (2012) Sentiment analysis and opinion mining. Synth Lect Hum Lang Technol 5(1):1–167

    Article  Google Scholar 

  • Martínez-Cámara E, Martín-Valadivia MT, Urena-López LA, Montejo-Ráez AR (2012) Sentiment analysis in Twitter. Nat Lang Eng 20(01):1–28. doi:10.1017/S1351324912000332

    Article  Google Scholar 

  • McCallum AK (2002) MALLET: a machine learning for language toolkit. http://mallet.cs.umass.edu

  • Miháltz M (2013) OpinHuBank: szabadon hozzáférhető annotált korpusz magyar nyelvű véleményelemzéshez. In: IX. Magyar Számítógépes Nyelvészeti Konferencia, pp 343–345

  • Montejo-Ráez A, Martínez-Cámara E, Martín-Valdivia MT, Ureña-López LA (2014) A knowledge-based approach for polarity classification in Twitter. J Assoc Inf Sci Technol 65(2):414–425. doi:10.1002/asi.22984

    Article  Google Scholar 

  • O’Connor B, Balasubramanyan R (2010) From tweets to polls: linking text sentiment to public opinion time series. In: ICWSM

  • Pontiki M, Galanis D, Pavlopoulos J, Papageorgiou H, Androutsopoulos I, Manandhar S (2014) Semeval-2014 task 4: aspect based sentiment analysis. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014), SemEval ’14, pp 27–35

  • Poria S, Cambria E, Ku LW, Gui C, Gelbukh A (2014) A rule-based approach to aspect extraction from product reviews. In: Proceedings of the second workshop on natural language processing for social media (SocialNLP), Association for Computational Linguistics and Dublin City University, Dublin, pp 28–37

  • Ravi K, Ravi V (2015) A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl-Based Syst 89:14–46. doi:10.1016/j.knosys.2015.06.015

    Article  Google Scholar 

  • Reyes A, Rosso P (2013) On the difficulty of automatically detecting irony: beyond a simple case of negation. Knowl Inf Syst 40(3):595–614. doi:10.1007/s10115-013-0652-8

    Article  Google Scholar 

  • Rosenthal S, Nakov P, Ritter A, Stoyanov V (2014) Semeval-2014 task 9: Sentiment analysis in twitter. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014), SemEval, pp 73–80

  • Sang ETK, Bos J (2012) Predicting the 2011 Dutch Senate Election results with Twitter. In: Proceedings of the 13th conference of the European Chapter of the Association for Computational Linguistics, pp 53–60

  • Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng AY, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 1631–1642

  • Szántó Zs, Farkas R (2014) Special techniques for constituent parsing of morphologically rich languages. In: Proceedings of the 14th conference of the European Chapter of the Association for Computational Linguistics, pp 135–144

  • Varga I, Sano M, Torisawa K, Hashimoto C, Ohtake K, Kawai T, Oh JH, De Saeger S (2013) Aid is out there: looking for help from tweets during a large scale disaster. In: Proceedings of the 51st annual meeting of the ACL, pp 1619–1629

  • Vilares D, Alonso MA, Gómez-Rodriguez C (2015a) A syntactic approach for opinion mining on Spanish reviews. Nat Lang Eng 21(01):139–163. doi:10.1017/S1351324913000181

    Article  Google Scholar 

  • Vilares D, Alonso MA, Gómez-Rodríguez C (2015b) On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages. J Assoc Inf Sci Technol 66(9):1799–1816. doi:10.1002/asi.23284

    Article  Google Scholar 

  • Vinodhini G, Chandrasekaran RM (2012) Sentiment analysis and opinion mining: a survey. Int J Adv Res Comput Sci Softw Eng 2(6):282–292

  • Wagner J, Arora P, Cortes S (2014) DCU: aspect-based polarity classification for semeval task 4. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014), pp 223–229

  • Wiegand M, Balahur A, Roth B, Klakow D, Montoyo A (2010) A survey on the role of negation in sentiment analysis. In: Proceedings of the workshop on negation and speculation in natural language processing, pp 60–68

  • Wilson T, Wiebe J, Hoffmann P (2005) Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the conference on human language technology and empirical methods in natural language processing, Association for Computational Linguistics, pp 347–354

  • Wilson T, Kozareva Z, Nakov P, Rosenthal S, Stoyanov V, Ritter A (2013) SemEval-2013 Task 2: sentiment analysis in Twitter. In: Proceedings of the international workshop on semantic evaluation, SemEval‘3

  • Zhang C, Zeng D, Li J, Wang FY, Zuo W (2009) Sentiment analysis of Chinese documents: from sentence to document level. J Am Soc Inf Sci Technol 60(12):2474–2487. doi:10.1002/asi.21206

    Article  Google Scholar 

  • Zhu X, Kiritchenko S, Mohammad S (2014) NRC-Canada-2014: recent improvements in the sentiment analysis of tweets. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014), pp 443–447

  • Zsibrita J, Vincze V, Farkas R (2013) Magyarlanc: a toolkit for morphological and dependency parsing of Hungarian. In: Proceedings of RANLP, pp 763–771

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Viktor Hangya.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Hangya, V., Farkas, R. A comparative empirical study on social media sentiment analysis over various genres and languages. Artif Intell Rev 47, 485–505 (2017). https://doi.org/10.1007/s10462-016-9489-3

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10462-016-9489-3

Keywords

  • Natural language processing
  • Sentiment analysis
  • Data mining