
Developing a successful SemEval task in sentiment analysis of Twitter and other social media texts


Abstract

We present the development and evaluation of a semantic analysis task that lies at the intersection of two highly active lines of research in contemporary computational linguistics: (1) sentiment analysis, and (2) natural language processing of social media text. The task was part of SemEval, the International Workshop on Semantic Evaluation, a semantic evaluation forum previously known as SensEval. The task ran in 2013 and 2014, attracting the highest number of participating teams at SemEval in both years, and a third edition is ongoing in 2015. The task included the creation of a large corpus with contextual- and message-level polarity annotations, consisting of tweets, SMS messages, LiveJournal messages, and a special test set of sarcastic tweets. The evaluation attracted 44 teams in 2013 and 46 in 2014, who used a variety of approaches. The best teams were able to outperform several baselines by sizable margins, with improvements across the two years the task has been run. We hope that the long-lasting role of this task and the accompanying datasets will be to serve as a test bed for comparing different approaches, thus facilitating research.


Notes

  1. Hashtags are a type of tagging for Twitter messages.

  2. We should note that the distinction between constrained and unconstrained systems is quite subtle. For example, the creation of a dedicated lexicon from other annotated data could be regarded as a form of supervision beyond the dataset provided in the task. A similar argument could be made about various NLP tools for Twitter processing such as Noah’s ARK Tweet NLP, Alan Ritter’s twitter_nlp, or GATE’s TwitIE, which are commonly used for tweet tokenization, normalization, POS tagging (Gimpel et al. 2011), chunking, syntactic parsing (Kong et al. 2014), named entity recognition (Ritter et al. 2011), information extraction (Bontcheva et al. 2013), and event extraction (Ritter et al. 2012); all these tools are trained on additional tweets (see the tokenization sketch after these notes for an illustration of such preprocessing). Indeed, some participants in 2013 and 2014 did not fully understand the constrained versus unconstrained distinction, and we had to check the system descriptions and reclassify some submissions as constrained or unconstrained. This was a hard and tedious job, and thus for the 2015 edition of the task we did not distinguish between constrained and unconstrained systems, allowing participants to use any additional data, resources, and tools they wished. In any case, our constrained/unconstrained definitions for the 2013 and 2014 editions of the task are clear, and the system descriptions for the individual systems are also available. Thus, researchers are free to view the final system ranking any way they like, e.g., as two separate constrained and unconstrained rankings or as one common ranking.

  3. Filtering based on an existing lexicon does bias the dataset to some degree; however, note that the text still contains sentiment expressions outside those in the lexicon. (A sketch of this kind of lexicon-based pre-filtering appears after these notes.)

  4. We pre-filtered the SMS messages and the sarcastic tweets with SentiWordNet, but we did not apply this filtering to the LiveJournal sentences.

  5. http://wing.comp.nus.edu.sg/SMSCorpus/.

  6. The use of Amazon’s Mechanical Turk has been criticised from an ethical perspective (e.g., human exploitation) and a legal one (e.g., tax evasion, minimum legal wage in some countries, absence of a work contract); see Fort et al. (2011) for a broader discussion. We have tried our best to stay fair, adjusting the pay per HIT so that the resulting hourly rate was on par with what is currently considered good pay on Mechanical Turk (the pricing arithmetic is sketched after these notes). Indeed, Turkers were eager to work on our HITs, and the annotations were completed quickly.

  7. Note that this discarding happened only if a single Turker had created contradictory annotations; it was not done at the adjudication stage. (A sketch of this filtering rule appears after these notes.)

  8. https://github.com/aritter/twitter_download.

  9. However, this did not have a major impact on the results; see Sect. 6.3 for details.

  10. In the ongoing third year of the task (SemEval-2015), there were submissions by 41 teams: 11 teams participated in subtask A, and 40 in subtask B (Rosenthal et al. 2015).

  11. Neural networks and deep learning were also used by top-performing teams in 2015, e.g., by UNITN, a team from the University of Trento and the Qatar Computing Research Institute (Severyn and Moschitti 2015).

  12. http://www.purl.com/net/lexicons.

  13. https://code.google.com/p/word2vec/.

  14. http://alt.qcri.org/semeval2014/task4.

  15. http://alt.qcri.org/semeval2015/task11.

  16. http://alt.qcri.org/semeval2015/task9.

  17. https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews.

  18. http://alt.qcri.org/semeval2015/task10/.

  19. http://alt.qcri.org/semeval2016/task4/.

  20. Note that a good classifier is not necessarily a good quantifier, and vice versa (Forman 2008); a sketch of count adjustment for quantification appears after these notes. See Esuli and Sebastiani (2015) for pointers to the literature on text quantification.

  21. Available at https://www.cs.york.ac.uk/semeval-2013/task2/, http://alt.qcri.org/semeval2014/task9/, and http://alt.qcri.org/semeval2015/task10/.

  22. http://creativecommons.org/licenses/by/3.0/.
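
The following minimal Python sketch illustrates the kind of Twitter-specific preprocessing mentioned in note 2. It uses NLTK's TweetTokenizer rather than any of the cited tools, and the example tweet is invented; it illustrates the preprocessing step, not any participating system.

```python
# Twitter-aware tokenization (cf. note 2), shown with NLTK's
# TweetTokenizer; the cited tools (Tweet NLP, twitter_nlp, TwitIE)
# perform similar normalization but are not used here.
from nltk.tokenize import TweetTokenizer

tokenizer = TweetTokenizer(
    preserve_case=False,  # lowercase all tokens
    reduce_len=True,      # cap character repetitions: "sooooo" -> "sooo"
    strip_handles=True,   # remove @-mentions
)

tweet = "@user Sooooo excited about #SemEval!!! :-) http://t.co/xyz"
print(tokenizer.tokenize(tweet))
# e.g., ['sooo', 'excited', 'about', '#semeval', '!', '!', '!',
#        ':-)', 'http://t.co/xyz']
```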
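
Notes 3 and 4 mention pre-filtering candidate messages with a sentiment lexicon. A minimal sketch of such a filter follows; the toy lexicon and messages are invented, whereas the actual filtering used SentiWordNet.

```python
# Lexicon-based pre-filtering (cf. notes 3-4): keep only messages that
# contain at least one term from a sentiment lexicon. The toy lexicon
# below is an invented stand-in for SentiWordNet.
def has_sentiment_term(message, lexicon):
    return any(tok.strip("#@!?.,") in lexicon
               for tok in message.lower().split())

lexicon = {"love", "great", "awful"}           # invented stand-in
messages = [
    "I love this new phone!",                  # kept: contains "love"
    "Meeting moved to 3pm tomorrow.",          # dropped: no lexicon term
]
kept = [m for m in messages if has_sentiment_term(m, lexicon)]
print(kept)  # ['I love this new phone!']
```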
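
The HIT-pricing arithmetic alluded to in note 6 is simple: pick a target hourly rate, estimate how long a HIT takes, and derive the pay per HIT. The numbers below are illustrative assumptions, not the rates actually paid in the task.

```python
# Back-of-the-envelope HIT pricing (cf. note 6). Both numbers are
# illustrative assumptions, not the actual figures used in the task.
target_hourly_rate_usd = 10.0   # assumed fair hourly rate
seconds_per_hit = 45            # assumed median time per annotation HIT

pay_per_hit = target_hourly_rate_usd * seconds_per_hit / 3600
print(f"pay per HIT: ${pay_per_hit:.3f}")  # pay per HIT: $0.125
```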
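
Note 7 describes discarding annotations when a single Turker labeled the same message in a contradictory way. A hedged sketch of that rule follows; the (worker, message, label) record format is invented for illustration.

```python
# Drop a worker's annotations of a message when that worker gave the
# message conflicting labels (cf. note 7). The record format is invented.
from collections import defaultdict

annotations = [
    ("w1", "msg1", "positive"),
    ("w1", "msg1", "negative"),   # w1 contradicts themselves on msg1
    ("w2", "msg1", "positive"),
]

labels_per_worker = defaultdict(set)
for worker, msg, label in annotations:
    labels_per_worker[(worker, msg)].add(label)

clean = [a for a in annotations if len(labels_per_worker[(a[0], a[1])]) == 1]
print(clean)  # [('w2', 'msg1', 'positive')]
```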
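
Note 20 contrasts classification with quantification. A standard illustration is the "adjusted classify and count" correction of Forman (2008): the raw proportion of positive predictions is biased, but it can be corrected using the classifier's true- and false-positive rates. A minimal sketch with illustrative numbers:

```python
# Adjusted classify-and-count (Forman 2008): correct the observed
# positive-prediction rate p_cc using the classifier's tpr and fpr
# (estimated, e.g., by cross-validation). Numbers are illustrative.
def adjusted_count(p_cc, tpr, fpr):
    p_hat = (p_cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p_hat))   # clip to a valid proportion

# A classifier with tpr=0.8 and fpr=0.1 flags 40% of messages positive:
print(adjusted_count(p_cc=0.40, tpr=0.80, fpr=0.10))  # ~0.4286
```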

References

  • Abdul-Mageed, M., Diab, M. T., & Korayem, M. (2011). Subjectivity and sentiment analysis of Modern Standard Arabic. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies: Short papers, ACL-HLT ’11 (Vol. 2, pp. 587–591). Portland, Oregon.

  • Baccianella, S., Esuli, A., & Sebastiani, F. (2010). SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of the seventh international conference on language resources and evaluation, LREC ’10. Valletta, Malta.

  • Barbosa, L., & Feng, J. (2010). Robust sentiment detection on Twitter from biased and noisy data. In Proceedings of the 23rd international conference on computational linguistics: Posters, COLING ’10, (pp. 36–44). Beijing, China.

  • Becker, L., Erhart, G., Skiba, D., & Matula, V. (2013). AVAYA: Sentiment analysis on Twitter with self-training and polarity lexicon expansion. In Proceedings of the second joint conference on lexical and computational semantics (*SEM), Vol. 2: Proceedings of the seventh international workshop on semantic evaluation, SemEval ’13, (pp. 333–340). Atlanta, Georgia.

  • Bifet, A., Holmes, G., Pfahringer, B., & Gavaldà, R. (2011). Detecting sentiment change in Twitter streaming data. Journal of Machine Learning Research, Proceedings Track, 17, 5–11.


  • Bontcheva, K., Derczynski, L., Funk, A., Greenwood, M., Maynard, D., & Aswani, N. (2013). TwitIE: An open-source information extraction pipeline for microblog text. In Proceedings of the international conference recent advances in natural language processing, RANLP ’13, (pp. 83–90). Hissar, Bulgaria.

  • Borgholt, L., Simonsen, P., & Hovy, D. (2015). The rating game: Sentiment rating reproducibility from text. In Proceedings of the 2015 conference on empirical methods in natural language processing, EMNLP ’15, (pp. 2527–2532). Lisbon, Portugal.

  • Brown, P. F., Desouza, P. V., Mercer, R. L., Pietra, V. J. D., & Lai, J. C. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18(4), 467–479.


  • Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27.


  • Chen, T., & Kan, M. Y. (2013). Creating a live, public short message service corpus: The NUS SMS corpus. Language Resources and Evaluation, 47(2), 299–335.


  • Chetviorkin, I., & Loukachevitch, N. (2013). Evaluating sentiment analysis systems in Russian. In Proceedings of the 4th biennial international workshop on Balto-Slavic natural language processing, (pp. 12–17). Sofia, Bulgaria.

  • Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12, 2493–2537.


  • Das, S. R., & Chen, M. Y. (2007). Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science, 53(9), 1375–1388.


  • Davidov, D., Tsur, O., & Rappoport, A. (2010). Semi-supervised recognition of sarcasm in Twitter and Amazon. In Proceedings of the fourteenth conference on computational natural language learning, CoNLL ’10, (pp. 107–116). Uppsala, Sweden.

  • dos Santos, C. (2014). Think Positive: Towards Twitter sentiment analysis from scratch. In Proceedings of the 8th international workshop on semantic evaluation, SemEval ’14, (pp. 647–651). Dublin, Ireland.

  • Esuli, A., & Sebastiani, F. (2010). Sentiment quantification. IEEE Intelligent Systems, 25, 72–75.


  • Esuli, A., & Sebastiani, F. (2015). Optimizing text quantifiers for multivariate loss functions. ACM Transactions on Knowledge Discovery from Data, 9(4), 27:1–27:27.


  • Evert, S., Proisl, T., Greiner, P., & Kabashi, B. (2014). SentiKLUE: Updating a polarity classifier in 48 hours. In Proceedings of the 8th international workshop on semantic evaluation, SemEval ’14, (pp. 551–555). Dublin, Ireland.

  • Forman, G. (2008). Quantifying counts and costs via classification. Data Mining and Knowledge Discovery, 17(2), 164–206.


  • Fort, K., Adda, G., & Cohen, K. B. (2011). Amazon Mechanical Turk: Gold mine or coal mine? Computational Linguistics, 37(2), 413–420.


  • Ghosh, A., Li, G., Veale, T., Rosso, P., Shutova, E., Barnden, J., & Reyes, A. (2015). SemEval-2015 task 11: Sentiment analysis of figurative language in Twitter. In Proceedings of the 9th international workshop on semantic evaluation, SemEval ’15, (pp. 470–478). Denver, Colorado.

  • Gimpel, K., Schneider, N., O’Connor, B., Das, D., Mills, D., Eisenstein, J., Heilman, M., Yogatama, D., Flanigan, J., & Smith, N. A. (2011). Part-of-speech tagging for Twitter: Annotation, features, and experiments. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies, ACL-HLT ’11, (pp. 42–47). Portland, Oregon.

  • Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(12).

  • Günther, T., & Furrer, L. (2013). GU-MLT-LT: Sentiment analysis of short messages using linguistic features and stochastic gradient descent. In Proceedings of the second joint conference on lexical and computational semantics (*SEM), Vol. 2: Proceedings of the seventh international workshop on semantic evaluation, SemEval ’13, (pp. 328–332). Atlanta, Georgia.

  • Günther, T., Vancoppenolle, J., & Johansson, R. (2014). RTRGO: Enhancing the GU-MLT-LT system for sentiment analysis of short messages. In Proceedings of the 8th international workshop on semantic evaluation, SemEval ’14, (pp. 497–502). Dublin, Ireland.

  • Hovy, D., Berg-Kirkpatrick, T., Vaswani, A., & Hovy, E. (2013). Learning whom to trust with MACE. In Proceedings of the 2013 conference of the north american chapter of the association for computational linguistics: Human language technologies, NAACL-HLT ’13, (pp. 1120–1130). Atlanta, Georgia.

  • Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’04, (pp. 168–177). Seattle, Washington.

  • Huberman, B. A., Romero, D. M., & Wu, F. (2008). Social networks that matter: Twitter under the microscope. CoRR abs/0812.1045.

  • Jansen, B., Zhang, M., Sobel, K., & Chowdury, A. (2009). Twitter power: Tweets as electronic word of mouth. Journal of the American Society for Information Science and Technology, 60(11), 2169–2188.


  • Java, A., Song, X., Finin, T., & Tseng, B. (2007). Why we twitter: Understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on web mining and social network analysis, (pp. 56–65).

  • Jovanoski, D., Pachovski, V., & Nakov, P. (2015). Sentiment analysis in Twitter for Macedonian. In Proceedings of the international conference on recent advances in natural language processing, RANLP ’15, (pp. 249–257). Hissar, Bulgaria.

  • Kapukaranov, B., & Nakov, P. (2015). Fine-grained sentiment analysis for movie reviews in Bulgarian. In Proceedings of the international conference on recent advances in natural language processing, RANLP ’15, (pp. 266–274). Hissar, Bulgaria.

  • Kiritchenko, S., Zhu, X., Cherry, C., & Mohammad, S. M. (2014). NRC-Canada-2014: Detecting aspects and sentiment in customer reviews. In Proceedings of the international workshop on semantic evaluation, SemEval ’14, (pp. 437–442). Dublin, Ireland.

  • Kiritchenko, S., Zhu, X., & Mohammad, S. M. (2014). Sentiment analysis of short informal texts. Journal of Artificial Intelligence Research, 50, 723–762.


  • Kökciyan, N., Çelebi, A., Özgür, A., & Üsküdarlı, S. (2013). BOUNCE: Sentiment classification in Twitter using rich feature sets. In Proceedings of the second joint conference on lexical and computational semantics (*SEM), Vol. 2: Proceedings of the seventh international workshop on semantic evaluation, SemEval ’13, (pp. 554–561). Atlanta, Georgia.

  • Kong, L., Schneider, N., Swayamdipta, S., Bhatia, A., Dyer, C., & Smith, N. A. (2014). A dependency parser for tweets. In Proceedings of the 2014 conference on empirical methods in natural language processing, EMNLP ’14, (pp. 1001–1012). Doha, Qatar.

  • Kouloumpis, E., Wilson, T., & Moore, J. (2011). Twitter sentiment analysis: The good the bad and the OMG! In Proceedings of the fifth international conference on weblogs and social media, ICWSM ’11, (pp. 538–541). Barcelona, Catalonia, Spain.

  • Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a social network or a news media? In Proceedings of the 19th international conference on world wide web, WWW ’10, (pp. 591–600). Raleigh, North Carolina.

  • Lewinsohn, J., & Amenson, C. (1978). Some relations between pleasant and unpleasant events and depression. Journal of Abnormal Psychology, 87(6), 644–654.


  • Liu, B., & Zhang, L. (2012). A survey of opinion mining and sentiment analysis. In C. C. Aggarwal & C. Zhai (Eds.), Mining text data (pp. 415–463). US: Springer.


  • MacPhillamy, D., & Lewinsohn, P. M. (1982). The Pleasant Events Schedule: Studies on reliability, validity, and scale intercorrelation. Journal of Consulting and Clinical Psychology, 50(3), 363–380.


  • Manandhar, S., & Yuret, D. (eds.) (2013). Proceedings of the second joint conference on lexical and computational semantics (*SEM), Vol. 2: Proceedings of the seventh international workshop on semantic evaluation. SemEval ’13. Association for Computational Linguistics, Atlanta, Georgia.

  • Marchand, M., Ginsca, A., Besançon, R., & Mesnard, O. (2013). LVIC-LIMSI: Using syntactic features and multi-polarity words for sentiment analysis in Twitter. In Proceedings of the second joint conference on lexical and computational semantics (*SEM), Vol. 2: Proceedings of the seventh international workshop on semantic evaluation, SemEval ’13, (pp. 418–424). Atlanta, Georgia.

  • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of a workshop at ICLR.

  • Miura, Y., Sakaki, S., Hattori, K., & Ohkuma, T. (2014). TeamX: A sentiment analyzer with enhanced lexicon mapping and weighting scheme for unbalanced data. In Proceedings of the 8th international workshop on semantic evaluation, SemEval ’14, (pp. 628–632). Dublin, Ireland.

  • Mohammad, S. (2012). #Emotional tweets. In Proceedings of *SEM 2012: The first joint conference on lexical and computational semantics: Vol. 1: Proceedings of the main conference and the shared task, *SEM ’12, (pp. 246–255). Montreal, Canada.

  • Mohammad, S., Kiritchenko, S., & Zhu, X. (2013). NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. In Proceedings of the second joint conference on lexical and computational semantics (*SEM), Vol. 2: Proceedings of the seventh international workshop on semantic evaluation, SemEval ’13, (pp. 321–327). Atlanta, Georgia.

  • Mohammad, S. M., & Turney, P. D. (2010). Emotions evoked by common words and phrases: Using Mechanical Turk to create an emotion lexicon. In Proceedings of the NAACL HLT 2010 workshop on computational approaches to analysis and generation of emotion in text, CAAGET ’10, (pp. 26–34). Los Angeles, California.

  • Mohammad, S. M., & Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. Computational Intelligence, 29(3), 436–465.


  • Nakov, P., Rosenthal, S., Kozareva, Z., Stoyanov, V., Ritter, A., & Wilson, T. (2013). SemEval-2013 task 2: Sentiment analysis in Twitter. In Proceedings of the second joint conference on lexical and computational semantics (*SEM), Vol. 2: Proceedings of the seventh international workshop on semantic evaluation, SemEval ’13, (pp. 312–320). Atlanta, Georgia.

  • Nakov, P., & Zesch, T. (eds.) (2014). Proceedings of the 8th international workshop on semantic evaluation. SemEval ’14. Association for Computational Linguistics and Dublin City University, Dublin, Ireland.

  • Nakov, P., Zesch, T., Cer, D., & Jurgens, D. (eds.) (2015). Proceedings of the 9th international workshop on semantic evaluation. SemEval ’15. Association for Computational Linguistics, Denver, Colorado.

  • O’Connor, B., Balasubramanyan, R., Routledge, B., & Smith, N. (2010). From tweets to polls: Linking text sentiment to public opinion time series. In Proceedings of the fourth international conference on weblogs and social media, ICWSM ’10, (pp. 122–129). Washington, DC.

  • Pak, A., & Paroubek, P. (2010). Twitter based system: Using Twitter for disambiguating sentiment ambiguous adjectives. In Proceedings of the 5th international workshop on semantic evaluation, SemEval ’10, (pp. 436–439). Uppsala, Sweden.

  • Pang, B., & Lee, L. (2005). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the annual meeting of the association for computational linguistics, ACL ’05, (pp. 115–124). Ann Arbor, Michigan.

  • Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2), 1–135.


  • Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the conference on empirical methods in natural language processing, EMNLP ’02, (pp. 79–86). Philadelphia, Pennsylvania.

  • Perez-Rosas, V., Banea, C., & Mihalcea, R. (2012). Learning sentiment lexicons in Spanish. In Proceedings of the eighth international conference on language resources and evaluation, LREC ’12. Istanbul, Turkey.

  • Pontiki, M., Galanis, D., Papageorgiou, H., Manandhar, S., & Androutsopoulos, I. (2015). SemEval-2015 task 12: Aspect based sentiment analysis. In Proceedings of the 9th international workshop on semantic evaluation, SemEval ’15, (pp. 486–495). Denver, Colorado.

  • Pontiki, M., Papageorgiou, H., Galanis, D., Androutsopoulos, I., Pavlopoulos, J., & Manandhar, S. (2014). SemEval-2014 task 4: Aspect based sentiment analysis. In Proceedings of the 8th international workshop on semantic evaluation, SemEval ’14, (pp. 27–35). Dublin, Ireland.

  • Raychev, V., & Nakov, P. (2009). Language-independent sentiment analysis using subjectivity and positional information. In Proceedings of the international conference on recent advances in natural language processing, RANLP ’09, (pp. 360–364). Borovets, Bulgaria.

  • Reckman, H., Baird, C., Crawford, J., Crowell, R., Micciulla, L., Sethi, S., & Veress, F. (2013). teragram: Rule-based detection of sentiment phrases using SAS sentiment analysis. In Proceedings of the second joint conference on lexical and computational semantics (*SEM), Vol. 2: Proceedings of the seventh international workshop on semantic evaluation, SemEval ’13, (pp. 513–519). Atlanta, Georgia.

  • Ritter, A., Clark, S., Mausam, & Etzioni, O. (2011). Named entity recognition in tweets: An experimental study. In Proceedings of the conference on empirical methods in natural language processing, EMNLP ’11, (pp. 1524–1534). Edinburgh, Scotland, UK.

  • Ritter, A., Mausam, Etzioni, O., & Clark, S. (2012). Open domain event extraction from Twitter. In Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’12, (pp. 1104–1112). Beijing, China.

  • Rosenthal, S., & McKeown, K. (2012). Detecting opinionated claims in online discussions. In Proceedings of the 2012 IEEE sixth international conference on semantic computing, ICSC ’12, (pp. 30–37). Washington, DC.

  • Rosenthal, S., Nakov, P., Kiritchenko, S., Mohammad, S., Ritter, A., & Stoyanov, V. (2015). SemEval-2015 task 10: Sentiment analysis in Twitter. In Proceedings of the 9th international workshop on semantic evaluation, SemEval ’15, (pp. 450–462). Denver, Colorado.

  • Rosenthal, S., Ritter, A., Nakov, P., & Stoyanov, V. (2014). SemEval-2014 Task 9: Sentiment analysis in Twitter. In Proceedings of the 8th international workshop on semantic evaluation, SemEval ’14, (pp. 73–80). Dublin, Ireland.

  • Russo, I., Caselli, T., & Strapparava, C. (2015). SemEval-2015 task 9: CLIPEval implicit polarity of events. In Proceedings of the 9th international workshop on semantic evaluation, SemEval ’15, (pp. 442–449). Denver, Colorado.

  • Severyn, A., & Moschitti, A. (2015). UNITN: Training deep convolutional neural network for Twitter sentiment classification. In Proceedings of the 9th international workshop on semantic evaluation, SemEval ’15, (pp. 464–469). Denver, Colorado.

  • Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the conference on empirical methods in natural language processing, EMNLP ’13, (pp. 1631–1642). Seattle, Washington.

  • Stoyanov, V., & Cardie, C. (2008). Topic identification for fine-grained opinion analysis. In Proceedings of the 22nd international conference on computational linguistics, COLING ’08, (pp. 817–824). Manchester, United Kingdom.

  • Strapparava, C., & Mihalcea, R. (2007). SemEval-2007 task 14: Affective text. In Proceedings of the international workshop on semantic evaluation, SemEval ’07, (pp. 70–74). Prague, Czech Republic.

  • Tan, S., & Zhang, J. (2008). An empirical study of sentiment analysis for Chinese documents. Expert Systems with Applications, 34(4), 2622–2629.


  • Tang, D., Wei, F., Qin, B., Liu, T., & Zhou, M. (2014). Coooolll: A deep learning system for Twitter sentiment classification. In Proceedings of the 8th international workshop on semantic evaluation, SemEval ’14, (pp. 208–212). Dublin, Ireland.

  • Tiantian, Z., Fangxi, Z., & Lan, M. (2013). ECNUCS: A surface information based system description of sentiment analysis in Twitter in the SemEval-2013 (task 2). In Proceedings of the second joint conference on lexical and computational semantics (*SEM), Vol. 2: Proceedings of the seventh international workshop on semantic evaluation, SemEval ’13, (pp. 408–413). Atlanta, Georgia.

  • Tumasjan, A., Sprenger, T., Sandner, P., & Welpe, I. (2010). Predicting elections with Twitter: What 140 characters reveal about political sentiment. In Proceedings of the fourth international conference on weblogs and social media, ICWSM ’10, (pp. 178–185). Washington, DC.

  • Turney, P. D. (2002). Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the annual meeting of the association for computational linguistics, ACL ’02, (pp. 417–424). Philadelphia, Pennsylvania.

  • Villena-Román, J., Lana-Serrano, S., Martínez-Cámara, E., & Cristóbal, J. C. G. (2013). TASS—Workshop on sentiment analysis at SEPLN. Procesamiento del Lenguaje Natural, 50, 37–44.


  • Wiebe, J., Wilson, T., Bruce, R., Bell, M., & Martin, M. (2004). Learning subjective language. Computational Linguistics, 30(3), 277–308.


  • Wiebe, J., Wilson, T., & Cardie, C. (2005). Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2–3), 165–210.


  • Wilson, T., Wiebe, J., & Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the conference on human language technology and empirical methods in natural language processing, HLT-EMNLP ’05, (pp. 347–354). Vancouver, British Columbia, Canada.

  • Zhao, J., Lan, M., & Zhu, T. (2014). ECNU: Expression- and message-level sentiment orientation classification in Twitter using multiple effective features. In Proceedings of the 8th international workshop on semantic evaluation, SemEval ’14, (pp. 259–264). Dublin, Ireland.

  • Zhu, X., Guo, H., Mohammad, S. M., & Kiritchenko, S. (2014). An empirical study on the effect of negation words on sentiment. In Proceedings of the annual meeting of the association for computational linguistics, ACL ’14, (pp. 304–313). Baltimore, Maryland.

  • Zhu, X., Kiritchenko, S., & Mohammad, S. (2014). NRC-Canada-2014: Recent improvements in the sentiment analysis of tweets. In Proceedings of the 8th international workshop on semantic evaluation, SemEval ’14, (pp. 443–447). Dublin, Ireland.



Acknowledgments

We would like to thank Theresa Wilson, who was a co-organizer of SemEval-2013 Task 2 and contributed tremendously to the data collection and to the overall organization of the task. We would also like to thank Kathleen McKeown for her insight in creating the Amazon Mechanical Turk annotation task. For the 2013 Amazon Mechanical Turk annotations, we received funding from the JHU Human Language Technology Center of Excellence and the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), through the U.S. Army Research Lab. All statements of fact, opinion, or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of IARPA, the ODNI, or the U.S. Government. The 2014 Amazon Mechanical Turk annotations were funded by Kathleen McKeown and Smaranda Muresan. The 2015 Amazon Mechanical Turk annotations were partially funded by SIGLEX.


Corresponding author

Correspondence to Preslav Nakov.


Cite this article

Nakov, P., Rosenthal, S., Kiritchenko, S. et al. Developing a successful SemEval task in sentiment analysis of Twitter and other social media texts. Lang Resources & Evaluation 50, 35–65 (2016). https://doi.org/10.1007/s10579-015-9328-1
