# Classifier Calibration

**DOI:** https://doi.org/10.1007/978-1-4899-7502-7_900-1

## Abstract

Classifier calibration is concerned with the scale on which a classifier’s scores are expressed. While a classifier ultimately maps instances to discrete classes, it is often beneficial to decompose this mapping into a *scoring classifier*, which outputs one or more real-valued numbers, and a *decision rule*, which converts these numbers into predicted classes. For example, a linear classifier might output a positive or negative score whose magnitude is proportional to the distance between the instance and the *decision boundary*, in which case the decision rule would be a simple threshold on that score. The advantage of calibrating these scores to a known, domain-independent scale is that the decision rule then also takes a domain-independent form and does not have to be learned. The best-known example occurs when the classifier’s scores approximate, in a precise sense, the posterior probability over the classes; the main advantage is that the optimal decision rule is then simply to predict the class that minimizes expected cost, averaged over all possible true classes. The main methods for obtaining calibrated scores are *logistic calibration*, a parametric method that assumes the distances on either side of the decision boundary are normally distributed, and a nonparametric alternative variously known as *isotonic regression*, the *pool adjacent violators* (PAV) method, or the *ROC convex hull* (ROCCH) method.
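The nonparametric (PAV) approach mentioned above can be sketched in a few lines. The following function is an illustrative implementation, not taken from the original text: given raw scores and binary labels, it sorts the examples by score and repeatedly pools adjacent groups whose empirical positive rates violate monotonicity, yielding calibrated probabilities that are non-decreasing in the score.

```python
def pav_calibrate(scores, labels):
    """Isotonic (PAV) calibration: map raw classifier scores to probabilities.

    Pools adjacent score groups whose empirical positive rate violates
    monotonicity, producing a non-decreasing step function of the score.
    """
    # Sort example indices by score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # Each block holds [sum of positive labels, number of examples].
    merged = []
    for i in order:
        merged.append([labels[i], 1])
        # Pool while the previous block's positive rate exceeds the last one's
        # (cross-multiplication avoids division).
        while len(merged) > 1 and merged[-2][0] * merged[-1][1] > merged[-1][0] * merged[-2][1]:
            s, n = merged.pop()
            merged[-1][0] += s
            merged[-1][1] += n
    # Expand block rates back to per-example calibrated probabilities.
    calibrated = [0.0] * len(scores)
    pos = 0
    for s, n in merged:
        for _ in range(n):
            calibrated[order[pos]] = s / n
            pos += 1
    return calibrated
```

With calibrated probabilities, a cost-sensitive decision rule reduces to a fixed threshold: in binary classification with false-positive cost c_FP and false-negative cost c_FN, the expected cost is minimized by predicting positive whenever the calibrated probability exceeds c_FP / (c_FP + c_FN).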

## Keywords

Decision Boundary · Cost Curve · Decision Threshold · Refinement Loss · Brier Score