Knowledge and Information Systems, Volume 41, Issue 1, pp 33–52

Improving class probability estimates for imbalanced data

  • Byron C. Wallace
  • Issa J. Dahabreh
Regular paper


Obtaining good probability estimates is imperative for many applications. The increased uncertainty and typically asymmetric costs surrounding rare events increase this need. Experts (and classification systems) often rely on probabilities to inform decisions. However, we demonstrate that class probability estimates obtained via supervised learning in imbalanced scenarios systematically underestimate the probabilities for minority class instances, despite ostensibly good overall calibration. To our knowledge, this problem has not previously been explored. We propose a new metric, the stratified Brier score, to capture class-specific calibration, analogous to the per-class metrics widely used to assess the discriminative performance of classifiers in imbalanced scenarios. We propose a simple, effective method to mitigate the bias of probability estimates for imbalanced data that bags estimators independently calibrated over balanced bootstrap samples. This approach drastically improves performance on the minority instances without greatly affecting overall calibration. We extend our previous work in this direction by providing ample additional empirical evidence for the utility of this strategy, using both support vector machines and boosted decision trees as base learners. Finally, we show that additional uncertainty can be exploited via a Bayesian approach by considering posterior distributions over bagged probability estimates.
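As a concrete illustration of the two ideas in the abstract, the following numpy-only sketch computes a per-class (stratified) Brier score and bags probability estimates from learners fit on balanced bootstrap samples. This is a hypothetical reconstruction, not the authors' code: a plain gradient-descent logistic learner stands in for the SVM (with Platt scaling) and boosted-tree base learners studied in the paper, and all function names (`stratified_brier`, `balanced_bag_probs`) are illustrative.

```python
import numpy as np

def stratified_brier(y_true, p_hat):
    """Brier score overall and separately per class.

    p_hat holds predicted probabilities of the positive (minority) class;
    the per-class components expose calibration errors on the minority
    class that the overall score averages away under heavy imbalance.
    """
    y_true = np.asarray(y_true, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    pos = y_true == 1
    return {
        "overall": float(np.mean((p_hat - y_true) ** 2)),
        "positive": float(np.mean((p_hat[pos] - 1.0) ** 2)),
        "negative": float(np.mean(p_hat[~pos] ** 2)),
    }

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def _fit_logistic(X, y, lr=0.1, n_iter=500):
    """Gradient-descent logistic regression: a stand-in base learner."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        w -= lr * Xb.T @ (_sigmoid(Xb @ w) - y) / len(y)
    return w

def _predict(X, w):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return _sigmoid(Xb @ w)

def balanced_bag_probs(X, y, n_bags=25, seed=0):
    """Bag probability estimates from balanced bootstrap samples.

    Each bag keeps every minority instance plus an equal-size bootstrap
    sample of the majority class, fits a learner, and predicts on all of
    X. Returns the bag mean (point estimate) plus 2.5/97.5 percentiles
    across bags, a simple stand-in for a posterior-style uncertainty
    summary over the bagged estimates.
    """
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    per_bag = np.empty((n_bags, len(X)))
    for b in range(n_bags):
        maj = rng.choice(majority, size=minority.size, replace=True)
        idx = np.concatenate([minority, maj])
        w = _fit_logistic(X[idx], np.asarray(y, dtype=float)[idx])
        per_bag[b] = _predict(X, w)
    return per_bag.mean(axis=0), np.percentile(per_bag, [2.5, 97.5], axis=0)
```

Because every bag trains on a class-balanced sample, the averaged estimates are not pulled toward zero on minority instances the way a single model fit on the raw imbalanced data tends to be; the spread across bags is what a Bayesian treatment would model more formally.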


Keywords: Classification · Imbalance · Unbalance · SVM · Boosted decision trees · Platt calibration · Brier score



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. Center for Evidence-Based Medicine, Brown University, Providence, USA
