Obtaining calibrated probability using ROC Binning

Sun, Meesun; Cho, Sungzoon

doi:10.1007/s10044-016-0578-3

Theoretical Advances
Published: 29 September 2016

Volume 21, pages 307–322, (2018)

Pattern Analysis and Applications

Meesun Sun¹,² & Sungzoon Cho¹

Abstract

Obtaining calibrated probabilities, that is, estimates that match the actual frequency of occurrence, is crucial in many real-world problems because it supports decision making with a sound assessment of cost and effect. Estimating calibrated probabilities matters even more in class imbalance and class overlap problems, where direct application of classification algorithms may result in substantial errors. Consequently, several post-processing calibration techniques have been developed that aim to improve the probability estimates or the error distribution of existing classification models. In this context, we propose Receiver Operating Characteristics (ROC) Binning, a method that provides accurate calibrated probabilities that remain robust to changes in the prevalence of the positive class by combining the True Positive Rate, the False Positive Rate, and the prevalence of the positive class. Experiments on real-world data from the UCI repository indicate that, when the positive class proportion of the training set is noticeably different from that of the test set, the proposed ROC Binning method outperforms conventional calibration methods.
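
The abstract does not give the binning procedure itself, so the following is only a minimal sketch of the general idea of combining the True Positive Rate, the False Positive Rate, and the prevalence through Bayes' rule. It is not the authors' algorithm. The function names `fit_roc_bins` and `calibrated_probability`, the equal-width score bins, and the toy data are all assumptions made for illustration.

```python
from bisect import bisect_right


def fit_roc_bins(scores, labels, n_bins=2):
    """Estimate, per score bin, the share of positives and of negatives.

    The share of positives in a bin is the TPR increment across that bin,
    and the share of negatives is the FPR increment. Equal-width bins over
    [0, 1] are an assumption of this sketch, not taken from the paper.
    """
    edges = [i / n_bins for i in range(1, n_bins)]  # interior bin edges
    pos = [0] * n_bins
    neg = [0] * n_bins
    for s, y in zip(scores, labels):
        b = bisect_right(edges, s)  # index of the bin containing score s
        if y == 1:
            pos[b] += 1
        else:
            neg[b] += 1
    n_pos, n_neg = sum(pos), sum(neg)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    tpr_mass = [c / n_pos for c in pos]
    fpr_mass = [c / n_neg for c in neg]
    return edges, tpr_mass, fpr_mass


def calibrated_probability(score, edges, tpr_mass, fpr_mass, prevalence):
    """Bayes' rule per bin: pi * dTPR / (pi * dTPR + (1 - pi) * dFPR)."""
    b = bisect_right(edges, score)
    num = prevalence * tpr_mass[b]
    den = num + (1.0 - prevalence) * fpr_mass[b]
    # Fall back to the prevalence when the bin saw no training examples.
    return num / den if den > 0 else prevalence


# Hypothetical toy data: classifier scores and true labels from a balanced set.
train_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
train_labels = [0, 0, 0, 1, 0, 1, 1, 1]
edges, tpr_mass, fpr_mass = fit_roc_bins(train_scores, train_labels, n_bins=2)

# Same score, two different assumed test-set prevalences.
p_balanced = calibrated_probability(0.8, edges, tpr_mass, fpr_mass, prevalence=0.5)
p_rare = calibrated_probability(0.8, edges, tpr_mass, fpr_mass, prevalence=0.1)
print(p_balanced, p_rare)
```

With these toy numbers, a score of 0.8 maps to about 0.75 when the prevalence is 0.5 and to about 0.25 when it is 0.1. The per-bin TPR and FPR shares do not depend on the class ratio of the training set, which is why only the prevalence term needs to change when the test distribution shifts.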

Author information

Authors and Affiliations

Department of Industrial Engineering, Seoul National University, San 56-1, Shillim-dong, Kwanak-gu, Seoul, 151-744, Republic of Korea
Meesun Sun & Sungzoon Cho
Center for Defense Management, Korea Institute for Defense Analyses, 37 Hoegi-ro, Dongdaemun-gu, Seoul, 130-871, Republic of Korea
Meesun Sun

Corresponding author

Correspondence to Meesun Sun.

About this article

Cite this article

Sun, M., Cho, S. Obtaining calibrated probability using ROC Binning. Pattern Anal Applic 21, 307–322 (2018). https://doi.org/10.1007/s10044-016-0578-3

Received: 05 June 2015
Accepted: 08 September 2016
Published: 29 September 2016
Issue Date: May 2018
DOI: https://doi.org/10.1007/s10044-016-0578-3
