From classification to quantification in tweet sentiment analysis

  • Wei Gao
  • Fabrizio Sebastiani
Original Article


Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many other fields. In this paper, we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. “prevalence”) of the different classes in the dataset. The latter task is called quantification, and recent research has convincingly shown that it should be tackled as a task of its own, using learning algorithms and evaluation measures different from those used for classification. In this paper, we show (by carrying out experiments using two learners, seven quantification-specific algorithms, and 11 TSC datasets) that using quantification-specific algorithms produces substantially better class frequency estimates than a state-of-the-art classification-oriented algorithm routinely used in TSC. We thus argue that researchers interested in tweet sentiment prevalence should switch to quantification-specific (instead of classification-specific) learning algorithms and evaluation measures.
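To make the distinction between classification and quantification concrete, the following is a minimal Python sketch that is not taken from the paper. It contrasts the naive "classify and count" estimate with the adjusted count correction known from the quantification literature, in a binary Positive/Negative setting. The classifier's true-positive rate (0.8), false-positive rate (0.1), and the synthetic test set are all hypothetical values chosen for illustration, and the adjusted count is shown only as a familiar baseline, not necessarily as one of the seven algorithms evaluated here.

```python
def classify_and_count(predictions):
    """Estimate Positive prevalence as the fraction of items predicted Positive."""
    return sum(predictions) / len(predictions)


def adjusted_count(predictions, tpr, fpr):
    """Correct the classify-and-count estimate using the classifier's
    true-positive rate (tpr) and false-positive rate (fpr), which in
    practice would be estimated on held-out labelled data."""
    cc = classify_and_count(predictions)
    if tpr == fpr:
        # The correction is undefined for an uninformative classifier.
        return cc
    estimate = (cc - fpr) / (tpr - fpr)
    # Clip to the valid range of a prevalence value.
    return min(1.0, max(0.0, estimate))


# Hypothetical test set of 1000 tweets with true Positive prevalence 0.6.
# A classifier with tpr = 0.8 and fpr = 0.1 labels 480 of the 600 Positive
# tweets and 40 of the 400 Negative tweets as Positive (1 = Positive).
predictions = [1] * 480 + [0] * 120 + [1] * 40 + [0] * 360

cc_estimate = classify_and_count(predictions)
acc_estimate = adjusted_count(predictions, tpr=0.8, fpr=0.1)

print(f"classify and count: {cc_estimate:.2f}")   # prints 0.52
print(f"adjusted count:     {acc_estimate:.2f}")  # prints 0.60
```

In this toy setting the classifier is reasonably accurate on individual tweets, yet its raw counts underestimate the Positive prevalence (0.52 against a true value of 0.60), while the corrected estimate recovers it. This is the kind of gap that motivates treating prevalence estimation as a task of its own.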


Keywords (machine-generated): Sentiment analysis; Word sense disambiguation; Sentiment classification; Sentiment lexicon; Class prevalence



Acknowledgments

We are grateful to Chih-Chung Chang and Chih-Jen Lin for making LIBSVM available, to Rong-En Fan and colleagues for making LIBLINEAR available, to Thorsten Joachims for making SVM-perf available, to Andrea Esuli for making available the code for obtaining SVM(KLD) from SVM-perf, to José Barranquero for making available the code for obtaining SVM(Q) from SVM-perf, to Shuai Li for pointing out a small mistake in a previous version, and to Carlos Castillo for several pointers to the literature.



Copyright information

© Springer-Verlag Wien 2016

Authors and Affiliations

  1. Qatar Computing Research Institute, Hamad bin Khalifa University, Doha, Qatar
