Twitter Sentiment Detection via Ensemble Classification Using Averaged Confidence Scores

  • Matthias Hagen
  • Martin Potthast
  • Michel Büchner
  • Benno Stein
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9022)


We reproduce three classification approaches with diverse feature sets for the task of classifying the sentiment expressed in a given tweet as positive, neutral, or negative. The reproduced approaches are also combined in an ensemble that averages the individual classifiers’ confidence scores for the three classes and decides sentiment polarity based on these averages. Our experimental evaluation on SemEval data shows that our re-implementations slightly outperform their respective originals. Moreover, in the SemEval Twitter sentiment detection tasks of 2013 and 2014, the ensemble of reproduced approaches would have been ranked in the top 5 among 50 participants. An error analysis shows that the ensemble classifier makes few severe misclassifications, such as identifying a positive sentiment in a negative tweet or vice versa. Instead, it tends to misclassify non-neutral tweets as neutral, which can be viewed as the safest option.
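The averaging step described in the abstract can be illustrated with a minimal sketch. It assumes each classifier outputs one confidence score per class for a tweet. The function names, the example scores, and the tie-breaking rule (the first class in `CLASSES` wins) are hypothetical and not taken from the paper.

```python
from statistics import mean

# The three sentiment classes named in the abstract.
CLASSES = ("positive", "neutral", "negative")


def average_confidences(scores_per_classifier):
    """Average the per-class confidence scores across all classifiers."""
    return {c: mean(s[c] for s in scores_per_classifier) for c in CLASSES}


def ensemble_predict(scores_per_classifier):
    """Return the class with the highest averaged confidence.

    Ties are resolved in favour of the class listed first in CLASSES;
    this is an assumption made for the sketch.
    """
    averaged = average_confidences(scores_per_classifier)
    return max(CLASSES, key=lambda c: averaged[c])


# Hypothetical confidence scores from three classifiers for one tweet.
tweet_scores = [
    {"positive": 0.50, "neutral": 0.40, "negative": 0.10},
    {"positive": 0.30, "neutral": 0.60, "negative": 0.10},
    {"positive": 0.45, "neutral": 0.35, "negative": 0.20},
]

print(ensemble_predict(tweet_scores))
```

In this made-up example, two of the three classifiers individually favour "positive", yet the averaged scores favour "neutral". This shows how averaging confidences can differ from a simple majority vote over predicted labels.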


Keywords: Sentiment Analysis, Ensemble Method, Twitter Data, Pointwise Mutual Information, Emotion Lexicon





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Matthias Hagen (1)
  • Martin Potthast (1)
  • Michel Büchner (1)
  • Benno Stein (1)
  1. Bauhaus-Universität Weimar, Germany
