Abstract
Uncertainty in the predictions of probabilistic classifiers is a key concern when models are used to support human decision making, are embedded in broader probabilistic pipelines, or make sensitive automatic decisions. Studies have shown that most models are not intrinsically well calibrated, meaning that their decision scores are not consistent with posterior probabilities. Calibrating such models, or enforcing calibration while learning them, has therefore regained interest in the recent literature. In this context, properly assessing calibration is paramount for quantifying the benefit of new contributions tackling calibration. However, commonly used metrics leave room for improvement, and the evaluation of calibration would benefit from deeper analysis. This paper therefore focuses on the empirical evaluation of calibration metrics in the context of classification. More specifically, it evaluates different estimators of the Expected Calibration Error (ECE), including legacy estimators and novel ones proposed in this paper. We build an empirical procedure to quantify the quality of these ECE estimators, and use it to decide which estimator should be used in practice in different settings.
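For readers unfamiliar with how the ECE is estimated in practice, the most common legacy estimator bins predictions by confidence and takes the sample-weighted average of the per-bin gap between empirical accuracy and mean confidence. The sketch below is a minimal illustration of that standard equal-width binned estimator, not of the novel estimators proposed in the paper; the function name `binned_ece`, the bin count, and the synthetic sanity check are illustrative choices of ours (the authors' actual implementations live in the repository linked under Notes).

```python
import numpy as np

def binned_ece(conf, correct, n_bins=15):
    """Equal-width binned ECE: the weighted average, over confidence bins,
    of the gap between empirical accuracy and mean confidence."""
    conf = np.asarray(conf, dtype=float)        # confidence of the predicted class, in [0, 1]
    correct = np.asarray(correct, dtype=float)  # 1.0 if the prediction was right, else 0.0
    bin_ids = np.minimum((conf * n_bins).astype(int), n_bins - 1)  # conf == 1.0 goes to the last bin
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())  # |accuracy - confidence| in the bin
            ece += (mask.sum() / conf.size) * gap                # weight by the bin's share of samples
    return ece

# Sanity check: scores drawn so that P(correct | conf) == conf are perfectly
# calibrated, so the estimate should be close to 0 (up to binning/sampling noise).
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=10_000)
correct = (rng.uniform(size=10_000) < conf).astype(float)
print(f"binned ECE: {binned_ece(conf, correct):.4f}")
```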
Notes
1. The code ensuring the reproducibility of the experiments presented in this work is available at https://github.com/euranova/estimating_eces.
Cite this paper
Posocco, N., Bonnefoy, A. (2021). Estimating Expected Calibration Errors. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2021. Lecture Notes in Computer Science, vol. 12894. Springer, Cham. https://doi.org/10.1007/978-3-030-86380-7_12