Abstract
Uncertainty in the predictions of probabilistic classifiers is a key concern when models are used to support human decision making, are embedded in broader probabilistic pipelines, or make sensitive automatic decisions. Studies have shown that most models are not intrinsically well calibrated, meaning that their decision scores are not consistent with posterior probabilities. Calibrating such models, or enforcing calibration while learning them, has therefore regained interest in the recent literature. In this context, properly assessing calibration is paramount for quantifying the benefit of new contributions tackling calibration. However, commonly used metrics leave room for improvement, and the evaluation of calibration would benefit from deeper analysis. This paper therefore focuses on the empirical evaluation of calibration metrics in the context of classification. More specifically, it evaluates different estimators of the Expected Calibration Error (ECE), including legacy estimators and novel ones proposed in this paper. We build an empirical procedure to quantify the quality of these ECE estimators, and use it to decide which estimator should be used in practice in different settings.
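For readers unfamiliar with how the ECE is estimated in practice, the most common legacy estimator bins predictions by confidence and takes the sample-weighted average of the per-bin gap between empirical accuracy and mean confidence. The sketch below is a minimal illustration of that standard equal-width binned estimator, not of the novel estimators proposed in the paper; the function name `binned_ece`, the bin count, and the synthetic sanity check are illustrative choices of ours (the authors' actual implementations live in the repository linked under Notes).

```python
import numpy as np

def binned_ece(conf, correct, n_bins=15):
    """Equal-width binned ECE: the weighted average, over confidence bins,
    of the gap between empirical accuracy and mean confidence."""
    conf = np.asarray(conf, dtype=float)        # confidence of the predicted class, in [0, 1]
    correct = np.asarray(correct, dtype=float)  # 1.0 if the prediction was right, else 0.0
    bin_ids = np.minimum((conf * n_bins).astype(int), n_bins - 1)  # conf == 1.0 goes to the last bin
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())  # |accuracy - confidence| in the bin
            ece += (mask.sum() / conf.size) * gap                # weight by the bin's share of samples
    return ece

# Sanity check: scores drawn so that P(correct | conf) == conf are perfectly
# calibrated, so the estimate should be close to 0 (up to binning/sampling noise).
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=10_000)
correct = (rng.uniform(size=10_000) < conf).astype(float)
print(f"binned ECE: {binned_ece(conf, correct):.4f}")
```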
Notes
1. The code ensuring the reproducibility of the experiments presented in this work is available at https://github.com/euranova/estimating_eces.
Cite this paper
Posocco, N., Bonnefoy, A. (2021). Estimating Expected Calibration Errors. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2021. Lecture Notes in Computer Science, vol. 12894. Springer, Cham. https://doi.org/10.1007/978-3-030-86380-7_12