A survey of multilingual human-tagged short message datasets for sentiment analysis tasks

  • Methodologies and Application
Soft Computing

Abstract

Today, electronic word-of-mouth (eWOM) statements expressed on blogs, social media and shopping platforms are increasingly frequent, and they enable customers to share their points of view about the products or services they have acquired. Industry can use these eWOM statements to improve its products and services, and customers can use them to make better purchase decisions. Sentiment analysis (SA) techniques can be used to extract and analyze these statements. Research on SA has advanced considerably in recent years, and its applications in business management have grown rapidly. Automatic techniques (such as machine learning, deep learning and statistical approaches) have been used for this purpose. However, training a machine to process or analyze sentiments is a hard task, mainly because of the complexity of natural language, and the task is even more complicated in multilingual environments. Training datasets, one of the key resources for achieving better results, remain scarce. Training datasets are, in fact, a reservoir of information that serves to teach and refine the skills of automatic techniques; hence, the higher the quality of the training datasets, the better the predictive power of sentiment analysis models. English datasets are relatively easy to find in the literature, but datasets in other languages are very scarce. This paper therefore describes and compiles information on 25 datasets of short messages (statements expressed on social media and shopping platforms) in seven different languages, most of them from Twitter. To ensure quality, all the resources are human-tagged, and all are currently available to the scientific community. A new English sentiment dataset extracted from Twitter has also been built, with each message evaluated subjectively.
The aim of this survey is to provide essential quality information for future research on automatic sentiment analysis in monolingual and multilingual scenarios.
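The abstract stresses that every compiled dataset is human-tagged and that each message in the new English dataset was evaluated subjectively. A common way to check the quality of such human tagging is inter-annotator agreement. As a minimal sketch (not necessarily the procedure used by the authors), Fleiss' kappa for messages that were each labelled by the same number of annotators can be computed as follows; the function name, the three-class `pos`/`neg`/`neu` tag set and the toy ratings are all hypothetical.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for items that were each labelled by the same number of raters.

    ratings[i] holds the labels that the raters assigned to message i.
    """
    if not ratings:
        raise ValueError("need at least one rated item")
    n = len(ratings[0])  # number of raters per item
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    num_items = len(ratings)
    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]  # n_ij: raters putting item i in category j

    # Observed agreement for each item: share of rater pairs that agree.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n) / (n * (n - 1))
        for c in counts
    ]
    p_bar = sum(per_item) / num_items

    # Expected (chance) agreement from the overall proportion of each label.
    proportions = [sum(c[cat] for c in counts) / (num_items * n) for cat in categories]
    p_e = sum(p ** 2 for p in proportions)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating uses the same label")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy data: four short messages, three annotators each.
    toy_ratings = [
        ["pos", "pos", "pos"],
        ["neg", "neg", "neu"],
        ["neu", "neu", "neu"],
        ["pos", "neg", "pos"],
    ]
    print(round(fleiss_kappa(toy_ratings), 3))  # prints 0.489
```

Values near 1 indicate agreement well beyond chance, while values near 0 indicate agreement no better than chance. Krippendorff's alpha is an alternative when some messages were not rated by every annotator.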

Acknowledgements

This study was funded by Coordination of Improvement of Higher Education, CAPES-Brazil (Grant Number BEX 2230/15-1), the Andalusian Excellence Projects (Grant Number P10-SEJ-6768) and the Spanish National Project (Grant Number TIN2013-40658-P).

Author information

Corresponding author

Correspondence to F. Steiner-Correa.

Ethics declarations

Conflicts of interest

The author Steiner-Correa, A.F. declares that he has no conflict of interest. The author Viedma-del-Jesus, M.I. declares that she has no conflict of interest. The author López-Herrera, A.G. declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Steiner-Correa, F., Viedma-del-Jesus, M.I. & Lopez-Herrera, A.G. A survey of multilingual human-tagged short message datasets for sentiment analysis tasks. Soft Comput 22, 8227–8242 (2018). https://doi.org/10.1007/s00500-017-2766-5

  • DOI: https://doi.org/10.1007/s00500-017-2766-5
