Improving class probability estimates for imbalanced data

Knowledge and Information Systems

Obtaining good probability estimates is imperative for many applications. The increased uncertainty and typically asymmetric costs surrounding rare events increase this need. Experts (and classification systems) often rely on probabilities to inform decisions. However, we demonstrate that class probability estimates obtained via supervised learning in imbalanced scenarios systematically underestimate the probabilities for minority class instances, despite ostensibly good overall calibration. To our knowledge, this problem has not previously been explored. We propose a new metric, the stratified Brier score, to capture class-specific calibration, analogous to the per-class metrics widely used to assess the discriminative performance of classifiers in imbalanced scenarios. We propose a simple, effective method to mitigate the bias of probability estimates for imbalanced data that bags estimators independently calibrated over balanced bootstrap samples. This approach drastically improves performance on the minority instances without greatly affecting overall calibration. We extend our previous work in this direction by providing ample additional empirical evidence for the utility of this strategy, using both support vector machines and boosted decision trees as base learners. Finally, we show that additional uncertainty can be exploited via a Bayesian approach by considering posterior distributions over bagged probability estimates.
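
As a rough illustration of the two ideas summarized above, the following Python sketch computes a Brier score stratified by class and averages sigmoid calibrators fitted on balanced bootstrap samples. It is a simplified sketch, not the authors' implementation: the function names, the gradient-descent sigmoid fit, the number of bags, and the assumption that each instance already has a one-dimensional classifier score are all illustrative choices that do not come from the article.

```python
import math
import random


def brier_score(y_true, p_hat):
    """Overall Brier score: mean squared error against the 0/1 labels."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def stratified_brier_score(y_true, p_hat):
    """Brier score computed separately over positives (y=1) and negatives (y=0)."""
    pos = [(p - 1.0) ** 2 for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p ** 2 for y, p in zip(y_true, p_hat) if y == 0]
    return {"pos": sum(pos) / len(pos), "neg": sum(neg) / len(neg)}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_sigmoid(scores, labels, steps=500, lr=0.1):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss.

    Assumption: a plain gradient-descent fit stands in for a proper
    Platt-style calibration routine.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def balanced_bootstrap(scores, labels, rng):
    """Sample with replacement the same number of instances from each class."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    k = min(len(pos), len(neg))  # size of the rarer class
    sample = [rng.choice(pos) for _ in range(k)] + [rng.choice(neg) for _ in range(k)]
    return sample, [1] * k + [0] * k


def bagged_calibrated_estimates(scores, labels, new_scores, n_bags=11, seed=0):
    """Average the estimates of calibrators fitted on balanced bootstrap samples."""
    rng = random.Random(seed)
    models = [fit_sigmoid(*balanced_bootstrap(scores, labels, rng))
              for _ in range(n_bags)]
    return [sum(sigmoid(a * s + b) for a, b in models) / len(models)
            for s in new_scores]
```

For example, with labels `[1, 0, 0, 0]` and estimates `[0.2, 0.1, 0.1, 0.1]`, the overall Brier score is about 0.17, while the stratified scores are 0.64 for the positive class and 0.01 for the negative class. This is the pattern the abstract describes: overall calibration looks acceptable even though the minority instance is badly underestimated.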

  2. We note that Cieslak and Chawla [7] have investigated the specific case of probability estimation trees (PETs) for imbalanced data and that Foster and Stine [13] have considered the related task of variable selection for prediction under imbalance.

  3. Somewhat confusingly, the term ‘calibration’ is often used both to refer to the process of calibrating classifiers and to measure the accuracy of probability estimates.

  4. We used the somewhat arbitrary relative weight of 10.

  5. See Table 1.

  6. This limit is undefined in general; here \(\tilde{\pi}\) approaches its limiting value from the positive side.

  7. Recall that the training time of SVMs scales quadratically with the number of instances.



  1. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

  2. Breslow NE, Day NE (1980) Statistical methods in cancer research, vol 1: the analysis of case-control studies. Distributed for IARC by WHO, Geneva, Switzerland

  3. Brier GW (1950) Verification of forecasts expressed in terms of probability. Mon Weather Rev 78(1):1–3

  4. Chawla NV (2010) Data mining for imbalanced datasets: an overview. In: Data mining and knowledge discovery handbook, pp 875–886

  5. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

  6. Chawla NV, Cieslak DA, Hall LO, Joshi A (2008) Automatically countering imbalance and its empirical relationship to cost. Data Min Knowl Discov 17(2):225–252

  7. Cieslak D, Chawla N (2008) Analyzing PETs on imbalanced datasets when training and testing class distributions differ. Adv Knowl Discov Data Min 5012:519–526

  8. Cohen AM, Hersh WR, Peterson K, Yen PY (2006a) Reducing workload in systematic review preparation using automated citation classification. J Am Med Inform Assoc 13(2):206–219

  9. Cohen G, Hilario M, Sax H, Hugonnet S, Geissbuhler A (2006b) Learning from imbalanced data in surveillance of nosocomial infection. Artif Intell Med 37(1):7–18

  10. Cohen I, Goldszmidt M (2004) Properties and benefits of calibrated classifiers. In: 15th European conference on machine learning (ECML04) and 8th European conference on principles and practice of knowledge discovery in databases (PKDD04), pp 125–136

  11. Elkan C (2001) The foundations of cost-sensitive learning. In: Proceedings of the 17th international joint conference on artificial intelligence (IJCAI01), vol 17, pp 973–978

  12. Firth D (1993) Bias reduction of maximum likelihood estimates. Biometrika 80(1):27–38

  13. Foster DP, Stine RA (2004) Variable selection in data mining: building a predictive model for bankruptcy. J Am Stat Assoc 99(466):303–313

  14. Guo X, Yin Y, Dong C, Yang G, Zhou G (2008) On the class imbalance problem. In: 4th International conference on natural computation (ICNC08), pp 192–201

  15. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284

  16. Hastie T, Tibshirani R (1998) Classification by pairwise coupling. Ann Stat 26(2):451–471

  17. Japkowicz N, Stephen S (2002) The class imbalance problem: a systematic study. Intell Data Anal 6(5):429–449

  18. King G, Zeng L (2001) Logistic regression in rare events data. Political Anal 9(2):137–163

  19. Lin HT, Lin CJ, Weng RC (2007) A note on Platt’s probabilistic outputs for support vector machines. Mach Learn 68(3):267–276

  20. McCullagh P, Nelder JA (1987) Generalized linear models. Chapman and Hall, London

  21. Niculescu-Mizil A, Caruana R (2005a) Obtaining calibrated probabilities from boosting. In: Proceedings of the 21st conference on uncertainty in artificial intelligence (UAI05), pp 413–420

  22. Niculescu-Mizil A, Caruana R (2005b) Predicting good probabilities with supervised learning. In: Proceedings of the 22nd international conference on machine learning (ICML05). ACM, pp 625–632

  23. Platt J (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classif 10(3):61–74

  24. Provost F (2000) Machine learning from imbalanced data sets 101 (invited talk). In: Proceedings of the AAAI workshop on learning from imbalanced data sets (AAAI00)

  25. Rabe-Hesketh S, Skrondal A (2008) Multilevel and longitudinal modeling using Stata. Stata Corp, College Station, Texas

  26. Van Hulse J, Khoshgoftaar TM, Napolitano A (2007) Experimental perspectives on learning from imbalanced data. In: Proceedings of the 24th international conference on machine learning (ICML07), pp 935–942

  27. Wallace BC, Dahabreh IJ (2012) Class probability estimates are unreliable for imbalanced data (and how to fix them). In: Proceedings of the 12th international conference on data mining (ICDM12). IEEE, pp 695–704

  28. Wallace BC, Small K, Brodley CE, Trikalinos TA (2011) Class imbalance, redux. In: Proceedings of the 11th international conference on data mining (ICDM11). IEEE, pp 754–763

  29. Wallace BC, Trikalinos TA, Lau J, Brodley C, Schmid CH (2010) Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinform 11(1):55

  30. Walter SD (1985) Small sample estimation of log odds ratios from logistic regression and fourfold tables. Stat Med 4(4):437–444

  31. Yang Q, Wu X (2006) 10 challenging problems in data mining research. Int J Inf Technol Dec Making 5(4):597–604

  32. Zadrozny B, Elkan C (2002) Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining (KDD02). ACM, pp 694–699

  33. Zhu J, Hovy E (2007) Active learning for word sense disambiguation with methods for addressing the class imbalance problem. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning, pp 783–790

Author information

Corresponding author

Correspondence to Byron C. Wallace.

Additional information

This article is an extended version of our ICDM 2012 paper [27]. This work was supported in part by a grant from the Agency for Healthcare Research and Quality (AHRQ, grant HS018494-01). The findings and conclusions in this paper are those of the authors, who are responsible for its content, and do not necessarily represent the views of the AHRQ. No statement in this report should be construed as an official position of the AHRQ or of the U.S. Department of Health and Human Services.

About this article

Cite this article

Wallace, B.C., Dahabreh, I.J. Improving class probability estimates for imbalanced data. Knowl Inf Syst 41, 33–52 (2014).
