Estimating Expected Calibration Errors

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2021 (ICANN 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12894)

Abstract

Uncertainty in the predictions of probabilistic classifiers is a key concern when models are used to support human decision making, are embedded in broader probabilistic pipelines, or when sensitive automatic decisions have to be made. Studies have shown that most models are not intrinsically well calibrated, meaning that their decision scores are not consistent with posterior probabilities. Calibrating these models, or enforcing calibration while learning them, has therefore regained interest in the recent literature. In this context, properly assessing calibration is paramount for quantifying new contributions that tackle calibration. However, commonly used metrics leave room for improvement, and the evaluation of calibration would benefit from deeper analysis. This paper therefore focuses on the empirical evaluation of calibration metrics in the context of classification. More specifically, it evaluates different estimators of the Expected Calibration Error (ECE), including legacy estimators and novel ones proposed in this paper. We build an empirical procedure to quantify the quality of these ECE estimators, and use it to decide which estimator should be used in practice in different settings.
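
For orientation, the quantity under study can be stated briefly: a classifier is calibrated when, among all predictions made with confidence c, the observed accuracy is indeed c, and the ECE is the expected absolute gap between confidence and accuracy, ECE = E_c[ |P(correct | confidence = c) − c| ]. The paper's own estimators are detailed in the full text; the sketch below only shows the classic equal-width binned estimator of this quantity (the style of "legacy" estimator popularised by Guo et al., 2017). It is background material, and all names are illustrative rather than taken from the paper's code.

```python
import numpy as np

def binned_ece(confidences, correct, n_bins=15):
    """Equal-width binned ECE estimator (illustrative sketch).

    confidences : top-label confidence scores in [0, 1]
    correct     : 1/True where the predicted label matched the ground truth
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n, ece = len(confidences), 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # Right-closed bins, with the first bin also closed on the left,
        # so every score in [0, 1] falls in exactly one bin.
        mask = (confidences > lo) & (confidences <= hi)
        if i == 0:
            mask |= confidences == lo
        if mask.any():
            # Gap between accuracy and mean confidence, weighted by bin mass.
            ece += (mask.sum() / n) * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Quick check on synthetic, calibrated-by-construction scores:
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=10_000)
correct = rng.uniform(size=10_000) < conf   # P(correct | conf) = conf
print(binned_ece(conf, correct))            # small, up to binning and sampling noise
```

The binning step is exactly what makes estimation non-trivial: the estimate depends on the number and placement of bins, which is one reason an empirical comparison of estimators is worthwhile.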

Notes

  1. The code ensuring the reproducibility of the experiments presented in this work is available at https://github.com/euranova/estimating_eces.

Author information

Correspondence to Nicolas Posocco.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Posocco, N., Bonnefoy, A. (2021). Estimating Expected Calibration Errors. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. Lecture Notes in Computer Science, vol 12894. Springer, Cham. https://doi.org/10.1007/978-3-030-86380-7_12

  • DOI: https://doi.org/10.1007/978-3-030-86380-7_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86379-1

  • Online ISBN: 978-3-030-86380-7

  • eBook Packages: Computer Science, Computer Science (R0)
