Learning from Imbalanced Data: Evaluation Matters

  • Troy Raeder
  • George Forman
  • Nitesh V. Chawla
Part of the Intelligent Systems Reference Library book series (ISRL, volume 23)


Datasets with a highly imbalanced class distribution present a fundamental challenge in machine learning, not only for training a classifier but also for evaluating it. The class imbalance literature uses several different evaluation measures, each with its own bias, and, compounding this, several different cross-validation strategies. However, the behavior of these evaluation measures and their relative sensitivities (not only to the classifier, but also to the sample size and the chosen cross-validation method) is not well understood. Papers generally choose a single evaluation measure and use it to show the dominance of one method over another. We posit that this common methodology is myopic, especially for imbalanced data. Another fundamental issue that is not sufficiently considered is the sensitivity of classifiers both to class imbalance and to having only a small number of samples of the minority class. We consider these questions in this paper.
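As a minimal illustration of the abstract's central point (this sketch is ours, not from the chapter): on a test set with a 95:5 class ratio, a degenerate classifier that always predicts the majority class achieves 95% accuracy while completely missing the minority class. Different evaluation measures thus tell very different stories about the same predictions.

```python
# Illustration: accuracy is a myopic measure on imbalanced data.
# A classifier that always predicts the majority (negative) class
# looks excellent by accuracy but has zero recall on the minority class.

def confusion(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) counts for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

y_true = [1] * 5 + [0] * 95   # 5% minority (positive) class
y_pred = [0] * 100            # always predict the majority class

tp, fp, fn, tn = confusion(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # 0.95: looks strong
recall = tp / (tp + fn)              # 0.0: every minority example missed

print(f"accuracy={accuracy:.2f}, minority recall={recall:.2f}")
```

With only five minority samples, a single misclassification also shifts recall by 20 percentage points, which hints at the sample-size sensitivity the abstract raises.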


Keywords: Evaluation Matters · Minority Class · Link Prediction · Class Imbalance · Positive Class





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Troy Raeder (1)
  • George Forman (2)
  • Nitesh V. Chawla (1)

  1. University of Notre Dame, Notre Dame, USA
  2. HP Labs, Palo Alto, USA
