Knowledge and Information Systems, Volume 54, Issue 1, pp 151–170

Binary classifier calibration using an ensemble of piecewise linear regression models

  • Mahdi Pakdaman Naeini
  • Gregory F. Cooper
Regular Paper


In this paper, we present a new nonparametric calibration method called ensemble of near-isotonic regression (ENIR). The method can be considered an extension of BBQ (Naeini et al., in: Proceedings of twenty-ninth AAAI conference on artificial intelligence, 2015b), a recently proposed calibration method, as well as of the commonly used calibration method based on isotonic regression (IsoRegC) (Zadrozny and Elkan, in: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining 2002). ENIR is designed to address the key limitation of IsoRegC, namely its monotonicity assumption on the predictions. Like BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities, so it can be used with many existing classification models to generate accurate probabilistic predictions. We demonstrate the performance of ENIR on synthetic and real datasets for commonly applied binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular, on the real datasets we evaluated, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers while retaining their discrimination power. The method is also computationally tractable for large-scale datasets, as it runs in \(O(N \log N)\) time, where \(N\) is the number of samples.
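
For orientation, the isotonic-regression baseline (IsoRegC) that ENIR is said to extend can be sketched with the pool-adjacent-violators (PAV) procedure. This is a minimal illustration of the baseline only, not of ENIR, and not the authors' implementation: the function names `fit_isotonic_calibrator` and `calibrate` and the toy data are hypothetical, and tied scores receive no special handling.

```python
from bisect import bisect_right


def fit_isotonic_calibrator(scores, labels):
    """Fit a monotone step calibration map with pool-adjacent-violators.

    scores: uncalibrated classifier outputs; labels: 0/1 outcomes.
    Returns (thresholds, values): the lowest score of each block and the
    calibrated probability assigned to that block.
    """
    # Sorting dominates the cost, O(N log N); the PAV pass itself is linear.
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block is [sum_of_labels, count, lowest_score]
    for score, label in pairs:
        blocks.append([float(label), 1, score])
        # Merge backwards while adjacent block means are not increasing.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]
        ):
            total, count, _ = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values


def calibrate(score, thresholds, values):
    """Map a new score to the probability of the block it falls in."""
    i = bisect_right(thresholds, score) - 1
    return values[max(i, 0)]  # scores below the first block use its value


if __name__ == "__main__":
    # Hypothetical toy data: six classifier scores with binary outcomes.
    toy_scores = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
    toy_labels = [0, 1, 0, 0, 1, 1]
    thresholds, values = fit_isotonic_calibrator(toy_scores, toy_labels)
    print(thresholds, values)
    print(calibrate(0.5, thresholds, values))
```

The fitted map is forced to be non-decreasing in the classifier score, which is the monotonicity assumption that the abstract identifies as the key limitation of IsoRegC.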


Classifier calibration; Accurate probability; Ensemble of near-isotonic regression; Ensemble of linear trend estimation; ENIR; ELiTE



We thank anonymous reviewers for their very useful comments and suggestions. Research reported in this publication was supported by Grant U54HG008540 awarded by the National Human Genome Research Institute through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative. It was also supported in part by NIH Grants R01GM088224 and R01LM012095. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. This research was also supported by Grant #4100070287 from the Pennsylvania Department of Health. The Department specifically disclaims responsibility for any analyses, interpretations, or conclusions.


  1. Bahnsen AC, Stojanovic A, Aouada D, Ottersten B (2014) Improving credit card fraud detection with calibrated probabilities. In: Proceedings of the 2014 SIAM international conference on data mining
  2. Barlow RE, Bartholomew DJ, Bremner J, Brunk HD (1972) Statistical inference under order restrictions: theory and application of isotonic regression. Wiley, New York
  3. Bella A, Ferri C, Hernández-Orallo J, Ramírez-Quintana MJ (2013) On the effect of calibration in classifier combination. Appl Intell 38(4):566–585
  4. Cavanaugh JE (1997) Unifying the derivations for the Akaike and corrected Akaike information criteria. Stat Probab Lett 33(2):201–208
  5. Chang C-C, Lin C-J (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol (TIST) 2(3):27
  6. Cohen I, Goldszmidt M (2004) Properties and benefits of calibrated classifiers. In: Proceedings of the European conference on principles of data mining and knowledge discovery. Springer, pp 125–136
  7. DeGroot M, Fienberg S (1983) The comparison and evaluation of forecasters. Statistician 32:12–22
  8. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
  9. Dong X, Gabrilovich E, Heitz G, Horn W, Lao N, Murphy K, Strohmann T, Sun S, Zhang W (2014) Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 601–610
  10. Fawcett T, Niculescu-Mizil A (2007) PAV and the ROC convex hull. Mach Learn 68(1):97–106
  11. Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701
  12. Gill PE, Murray W, Wright MH (1981) Practical optimization. Academic Press, London
  13. Gronat P, Obozinski G, Sivic J, Pajdla T (2013) Learning and calibrating per-location classifiers for visual place recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 907–914
  14. Hashemi HB, Yazdani N, Shakery A, Naeini MP (2010) Application of ensemble models in web ranking. In: Proceedings of 5th international symposium on telecommunications (IST). IEEE, pp 726–731
  15. Heckerman D, Geiger D, Chickering D (1995) Learning Bayesian networks: the combination of knowledge and statistical data. Mach Learn 20(3):197–243
  16. Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial. Stat Sci 14:382–401
  17. Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6:65–70
  18. Iman RL, Davenport JM (1980) Approximations of the critical region of the Friedman statistic. Commun Stat Theory Methods 9(6):571–595
  19. Jiang L, Zhang H, Su J (2005) Learning k-nearest neighbor naïve Bayes for ranking. In: Proceedings of the advanced data mining and applications. Springer, pp 175–185
  20. Jiang X, Osl M, Kim J, Ohno-Machado L (2012) Calibrating predictive model estimates to support personalized medicine. J Am Med Inform Assoc 19(2):263–274
  21. Kim S-J, Koh K, Boyd S, Gorinevsky D (2009) \(\ell _1\) trend filtering. SIAM Rev 51(2):339–360
  22. Lichman M (2013) UCI machine learning repository. Accessed 15 Nov 2015
  23. Menon A, Jiang X, Vembu S, Elkan C, Ohno-Machado L (2012) Predicting accurate probabilities with a ranking loss. In: Proceedings of the international conference on machine learning, pp 703–710
  24. Niculescu-Mizil A, Caruana R (2005) Predicting good probabilities with supervised learning. In: Proceedings of the international conference on machine learning, pp 625–632
  25. Naeini MP, Cooper GF (2016a) Binary classifier calibration using an ensemble of linear trend estimation. In: Proceedings of the 2016 SIAM international conference on data mining. SIAM, pp 261–269
  26. Naeini MP, Cooper GF (2016b) Binary classifier calibration using an ensemble of near isotonic regression models. In: 2016 IEEE 16th international conference on data mining (ICDM). IEEE, pp 360–369
  27. Naeini MP, Cooper GF, Hauskrecht M (2015a) Binary classifier calibration using a Bayesian non-parametric approach. In: Proceedings of the SIAM data mining (SDM) conference
  28. Naeini MP, Cooper G, Hauskrecht M (2015b) Obtaining well calibrated probabilities using Bayesian binning. In: Proceedings of twenty-ninth AAAI conference on artificial intelligence
  29. Platt JC (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classif 10(3):61–74
  30. Ramdas A, Tibshirani RJ (2016) Fast and flexible ADMM algorithms for trend filtering. J Comput Graph Stat 25(3):839–858
  31. Robnik-Šikonja M, Kononenko I (2008) Explaining classifications for individual instances. IEEE Trans Knowl Data Eng 20(5):589–600
  32. Russell S, Norvig P (2010) Artificial intelligence: a modern approach. Prentice Hall, Englewood Cliffs
  33. Schwarz G et al (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
  34. Takahashi K, Takamura H, Okumura M (2009) Direct estimation of class membership probabilities for multiclass classification using multiple scores. Knowl Inf Syst 19(2):185–210
  35. Tibshirani RJ, Hoefling H, Tibshirani R (2011) Nearly-isotonic regression. Technometrics 53(1):54–61
  36. Wallace BC, Dahabreh IJ (2014) Improving class probability estimates for imbalanced data. Knowl Inf Syst 41(1):33–52
  37. Whalen S, Pandey G (2013) A comparative analysis of ensemble classifiers: case studies in genomics. In: 2013 IEEE 13th international conference on data mining (ICDM). IEEE, pp 807–816
  38. Zadrozny B, Elkan C (2001a) Learning and making decisions when costs and probabilities are both unknown. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 204–213
  39. Zadrozny B, Elkan C (2001b) Obtaining calibrated probability estimates from decision trees and naïve Bayesian classifiers. In: Proceedings of the international conference on machine learning, pp 609–616
  40. Zadrozny B, Elkan C (2002) Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 694–699
  41. Zhang H, Su J (2004) Naïve Bayesian classifiers for ranking. In: Proceedings of the European conference on machine learning (ECML). Springer, pp 501–512
  42. Zhong LW, Kwok JT (2013) Accurate probability calibration for multiple classifiers. In: Proceedings of the twenty-third international joint conference on artificial intelligence. AAAI Press, pp 1939–1945

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2017

Authors and Affiliations

  1. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, USA
  2. Department of Biomedical Informatics, Harvard Medical School, Boston, USA
  3. Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
