Text Quantification

  • Fabrizio Sebastiani
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8416)

Abstract

In recent years it has been pointed out that, in a number of applications involving classification, the final goal is not determining which class (or classes) individual unlabelled data items belong to, but determining the prevalence (or “relative frequency”) of each class in the unlabelled data. The latter task has come to be known as quantification [1, 3, 5-10, 15, 18, 19].
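The distinction can be made concrete with a minimal Python sketch. It is an illustration only, not taken from the paper: the function names, the toy keyword classifier, and the sample documents are all hypothetical. It computes the prevalence of each class by classifying every item and counting the predictions, which is a naive baseline whose estimate is only as good as the underlying classifier.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Sequence


def prevalence(labels: Iterable[str], classes: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each class in a collection of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute prevalence of an empty collection")
    return {c: counts[c] / total for c in classes}


def classify_and_count(
    items: Iterable[str],
    classifier: Callable[[str], str],
    classes: Sequence[str],
) -> Dict[str, float]:
    """Estimate prevalence by classifying each item and counting predictions."""
    return prevalence((classifier(x) for x in items), classes)


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in for a trained classifier (keyword match only)."""
    return "positive" if "good" in text.lower() else "negative"


# Hypothetical unlabelled documents.
docs = ["Good service", "Terrible delay", "good value", "Not bad at all"]
estimate = classify_and_count(docs, toy_classifier, ["positive", "negative"])
print(estimate)
```

Here the quantity of interest is the dictionary of class prevalences, not the label assigned to any single document; errors on individual items matter only to the extent that they distort the counts.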

Keywords

Knowledge Discovery; Unlabelled Data; Concept Drift; 11th IEEE International Confer; Mobile Phone Service
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Baccianella, S., Esuli, A., Sebastiani, F.: Variable-constraint classification and quantification of radiology reports under the ACR Index. Expert Systems with Applications 40(9), 3441–3449 (2013)
  2. Bella, A., Ferri, C., Hernández-Orallo, J., Ramírez-Quintana, M.J.: Quantification via probability estimators. In: Proceedings of the 11th IEEE International Conference on Data Mining (ICDM 2010), pp. 737–742 (2010)
  3. Bella, A., Ferri, C., Hernández-Orallo, J., Ramírez-Quintana, M.J.: Aggregative quantification for regression. Data Mining and Knowledge Discovery 28(2), 475–518 (2014)
  4. Esuli, A., Sebastiani, F.: Machines that learn how to code open-ended survey data. International Journal of Market Research 52(6), 775–800 (2010)
  5. Esuli, A., Sebastiani, F.: Sentiment quantification. IEEE Intelligent Systems 25(4), 72–75 (2010)
  6. Esuli, A., Sebastiani, F.: Optimizing text quantifiers for multivariate loss functions. Technical Report 2013-TR-005, Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, IT (2013)
  7. Forman, G.: Counting positives accurately despite inaccurate classification. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 564–575. Springer, Heidelberg (2005)
  8. Forman, G.: Quantifying trends accurately despite classifier error and class imbalance. In: Proceedings of the 12th ACM International Conference on Knowledge Discovery and Data Mining (KDD 2006), Philadelphia, US, pp. 157–166 (2006)
  9. Forman, G.: Quantifying counts and costs via classification. Data Mining and Knowledge Discovery 17(2), 164–206 (2008)
  10. Forman, G., Kirshenbaum, E., Suermondt, J.: Pragmatic text mining: Minimizing human effort to quantify many issues in call logs. In: Proceedings of the 12th ACM International Conference on Knowledge Discovery and Data Mining (KDD 2006), Philadelphia, US, pp. 852–861 (2006)
  11. Gamon, M.: Sentiment classification on customer feedback data: Noisy data, large feature vectors, and the role of linguistic analysis. In: Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, CH, pp. 841–847 (2004)
  12. Giorgetti, D., Sebastiani, F.: Automating survey coding by multiclass text categorization techniques. Journal of the American Society for Information Science and Technology 54(14), 1269–1277 (2003)
  13. Hopkins, D.J., King, G.: A method of automated nonparametric content analysis for social science. American Journal of Political Science 54(1), 229–247 (2010)
  14. Kelly, M.G., Hand, D.J., Adams, N.M.: The impact of changing populations on classifier performance. In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 1999), San Diego, US, pp. 367–371 (1999)
  15. Milli, L., Monreale, A., Rossetti, G., Giannotti, F., Pedreschi, D., Sebastiani, F.: Quantification trees. In: Proceedings of the 13th IEEE International Conference on Data Mining (ICDM 2013), Dallas, US, pp. 528–536 (2013)
  16. Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., Lawrence, N.D.: Dataset shift in machine learning. The MIT Press, Cambridge (2009)
  17. Sammut, C., Harries, M.: Concept drift. In: Sammut, C., Webb, G.I. (eds.) Encyclopedia of Machine Learning, pp. 202–205. Springer, Heidelberg (2011)
  18. Tang, L., Gao, H., Liu, H.: Network quantification despite biased labels. In: Proceedings of the 8th Workshop on Mining and Learning with Graphs (MLG 2010), Washington, US, pp. 147–154 (2010)
  19. Xue, J.C., Weiss, G.M.: Quantification and semi-supervised classification methods for handling changes in class distribution. In: Proceedings of the 15th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2009), Paris, FR, pp. 897–906 (2009)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Fabrizio Sebastiani (1)
  1. Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy
