Progress in Artificial Intelligence, Volume 6, Issue 1, pp 53–58

Why is quantification an interesting learning problem?

  • Pablo González
  • Jorge Díez
  • Nitesh Chawla
  • Juan José del Coz
Regular Paper

Abstract

Some real applications do not require classifying or making predictions about individual objects; they require estimating some magnitude for a group of them. Sentiment analysis and opinion mining offer one example. Some applications need to classify each opinion as positive or negative, but others, sometimes even more useful, only need an estimate of the proportion of each class during a given period of time: “How many tweets about our new product were positive yesterday?” Practitioners should apply quantification algorithms to this kind of problem rather than off-the-shelf classification methods, because classifiers are suboptimal for quantification tasks. Unfortunately, quantification learning is still a relatively underexplored area of machine learning. The goal of this paper is to show that quantification learning is an interesting open problem. To illustrate its benefits, we present an application that analyzes Twitter comments, in which even the simplest quantification methods outperform classification approaches.
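
The abstract does not name the methods compared, so the following is only an illustrative sketch under stated assumptions. It contrasts two baselines that are well known in the quantification literature: Classify & Count, which reports the fraction of items a classifier labels positive, and Adjusted Count, which corrects that fraction using the classifier's true and false positive rates estimated on labeled validation data. The helper `simulate`, the rates 0.8 and 0.3, and the set sizes are hypothetical values chosen for the example, not figures from the paper.

```python
def simulate(n_pos, n_neg, tpr, fpr):
    """Hypothetical helper: labels and predictions of an imperfect
    classifier that hits exactly the given true/false positive rates."""
    hits = round(n_pos * tpr)
    false_alarms = round(n_neg * fpr)
    y_true = [1] * n_pos + [0] * n_neg
    y_pred = ([1] * hits + [0] * (n_pos - hits)
              + [1] * false_alarms + [0] * (n_neg - false_alarms))
    return y_true, y_pred


def estimate_rates(y_true, y_pred):
    """Estimate (tpr, fpr) from labeled validation data."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / n_pos, fp / n_neg


def classify_and_count(y_pred):
    """Prevalence estimate: fraction of items predicted positive."""
    return sum(y_pred) / len(y_pred)


def adjusted_count(y_pred, tpr, fpr):
    """Correct the Classify & Count estimate for classifier errors,
    then clip the result to the valid range [0, 1]."""
    cc = classify_and_count(y_pred)
    if tpr <= fpr:
        # The correction is undefined for a classifier no better than chance.
        return cc
    return min(1.0, max(0.0, (cc - fpr) / (tpr - fpr)))


# Validation data is balanced (assumed), like a typical training sample.
val_true, val_pred = simulate(n_pos=500, n_neg=500, tpr=0.8, fpr=0.3)
tpr, fpr = estimate_rates(val_true, val_pred)

# The unlabeled test batch has shifted: only 10% of the items are positive.
test_true, test_pred = simulate(n_pos=100, n_neg=900, tpr=0.8, fpr=0.3)
true_prevalence = sum(test_true) / len(test_true)

cc_estimate = classify_and_count(test_pred)
ac_estimate = adjusted_count(test_pred, tpr, fpr)

print("true prevalence: ", true_prevalence)
print("classify & count:", cc_estimate)
print("adjusted count:  ", ac_estimate)
```

In this synthetic setting the classifier's false positives on the large negative class inflate the Classify & Count estimate well above the true prevalence, while the adjusted estimate recovers it. The example is idealized, because the error rates on the test batch match the validation estimates exactly; with real data the correction is only approximate.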

Keywords

Sentiment analysis, Opinion mining, Quantification, Prevalence estimation, Population shift

Notes

Acknowledgments

This research was funded by MINECO (the Spanish Ministerio de Economía y Competitividad) and FEDER (Fondo Europeo de Desarrollo Regional), Grant TIN2015-65069-C2-2-R. Juan José del Coz is also supported by the Fulbright Commission and the Salvador de Madariaga Program, Grant PRX15/00607. This paper was written during Juan José del Coz's stay at the University of Notre Dame.

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Pablo González (1)
  • Jorge Díez (1)
  • Nitesh Chawla (2, 3)
  • Juan José del Coz (1, 2)
  1. Artificial Intelligence Center, University of Oviedo at Gijón, Oviedo, Spain
  2. Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, USA
  3. Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, USA