Encyclopedia of Machine Learning and Data Mining

Living Edition
| Editors: Claude Sammut, Geoffrey I. Webb

Classifier Calibration

  • Peter A. Flach
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7502-7_900-1


Classifier calibration is concerned with the scale on which a classifier's scores are expressed. While a classifier ultimately maps instances to discrete classes, it is often beneficial to decompose this mapping into a scoring classifier, which outputs one or more real-valued numbers, and a decision rule, which converts these numbers into predicted classes. For example, a linear classifier might output a positive or negative score whose magnitude is proportional to the distance between the instance and the decision boundary, in which case the decision rule would be a simple threshold on that score.

The advantage of calibrating these scores to a known, domain-independent scale is that the decision rule then also takes a domain-independent form and does not have to be learned. The best-known example occurs when the classifier's scores approximate, in a precise sense, the posterior probability over the classes; the main advantage is that the optimal decision rule is then to predict the class that minimizes expected cost, averaged over all possible true classes.

The main methods for obtaining calibrated scores are logistic calibration, a parametric method that assumes the distances on either side of the decision boundary are normally distributed, and a nonparametric alternative variously known as isotonic regression, the pool adjacent violators (PAV) method, or the ROC convex hull (ROCCH) method.
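To make the two families of methods concrete, the following is a minimal pure-Python sketch of both: an isotonic calibrator implemented via pool adjacent violators, and a two-parameter sigmoid fitted by gradient descent on log-loss as a stand-in for logistic (Platt-style) calibration. The function names, learning rate, and iteration count are illustrative choices, not part of the entry.

```python
import math

def pav_calibrate(scores, labels):
    """Isotonic calibration via pool adjacent violators (PAV).
    scores: real-valued classifier outputs; labels: 0/1 true classes.
    Returns calibrated probabilities aligned with the input order."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    # Each block is [label_sum, count]; its mean is the fitted probability.
    blocks = []
    for i in order:
        blocks.append([labels[i], 1])
        # Merge adjacent blocks while monotonicity is violated, i.e. while
        # an earlier block's mean is at least the following block's mean.
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] >= blocks[-1][0] * blocks[-2][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    # Expand block means back to per-example probabilities.
    calibrated = [0.0] * n
    pos = 0
    for s, c in blocks:
        for _ in range(c):
            calibrated[order[pos]] = s / c
            pos += 1
    return calibrated

def logistic_calibrate(scores, labels, lr=0.1, steps=2000):
    """Fit p = sigmoid(a*s + b) by gradient descent on log-loss,
    a minimal stand-in for parametric (Platt-style) calibration.
    Returns a function mapping a score to a calibrated probability."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            ga += (p - y) * s  # gradient of log-loss w.r.t. slope a
            gb += (p - y)      # gradient of log-loss w.r.t. intercept b
        a -= lr * ga / n
        b -= lr * gb / n
    return lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))
```

Note the characteristic difference: PAV produces a piecewise-constant, monotone mapping determined entirely by the ranking of the scores, whereas the logistic fit produces a smooth sigmoid governed by two parameters.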


Keywords: Decision Boundary · Cost Curve · Decision Threshold · Refinement Loss · Brier Score



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Computer Science, University of Bristol, Bristol, UK