Similarity-Binning Averaging: A Generalisation of Binning Calibration

  • Antonio Bella
  • Cèsar Ferri
  • José Hernández-Orallo
  • María José Ramírez-Quintana
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5788)


In this paper we revisit the problem of classifier calibration, motivated by the observation that existing calibration methods ignore the problem attributes (i.e., they are univariate). We propose a new calibration method, inspired by binning-based methods, in which the calibrated probability of an instance is obtained from the k most similar instances in a dataset. Bins are constructed from these k most similar instances, where similarity considers not only the estimated probabilities but also the original attributes. The method has been evaluated with respect to two calibration measures, including a comparison with other traditional calibration methods. The results show that the new method outperforms the most commonly used calibration methods.
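The core idea described above can be sketched in a few lines. This is a minimal, hypothetical illustration (not the authors' implementation): the classifier's estimated probability is appended to each instance's attribute vector, the bin for a new instance is formed by its k nearest neighbours in that augmented space, and the calibrated probability is the average of the neighbours' class labels. The function name, Euclidean distance, and unweighted averaging are assumptions for illustration.

```python
import numpy as np

def sba_calibrate(X_cal, p_cal, y_cal, x_new, p_new, k=10):
    """Similarity-Binning Averaging sketch (hypothetical helper).

    X_cal : (n, d) array of original attributes of calibration instances
    p_cal : (n,) array of the classifier's estimated probabilities for them
    y_cal : (n,) array of 0/1 class labels
    x_new, p_new : attributes and estimated probability of the new instance
    """
    # Augment the attribute space with the estimated probability,
    # so similarity accounts for both attributes and the model's score.
    Z_cal = np.column_stack([X_cal, p_cal])
    z_new = np.append(x_new, p_new)
    # Euclidean distance from the new instance to every calibration instance.
    d = np.linalg.norm(Z_cal - z_new, axis=1)
    # The "bin" is the k most similar instances; the calibrated
    # probability is the average of their actual labels.
    nearest = np.argsort(d)[:k]
    return y_cal[nearest].mean()
```

Unlike classical binning, where fixed bins are defined once over the probability axis, here a fresh bin is formed around each query instance, which is what makes the method multivariate.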


Keywords (machine-generated): Calibration Method · Calibration Technique · Original Attribute · Calibration Measure · Brier Score





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Antonio Bella (1)
  • Cèsar Ferri (1)
  • José Hernández-Orallo (1)
  • María José Ramírez-Quintana (1)

  1. Universidad Politécnica de Valencia, DSIC, Valencia, Spain
