The area under the receiver operating characteristic curve (AUROC) of the test set is used throughout machine learning (ML) for assessing a model’s performance. However, when concordance is not the only ambition, this gives only a partial insight into performance, masking distribution shifts of model outputs and model instability.
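To make the point about concordance concrete, the following is a minimal sketch and not the authors' own analysis. It computes AUROC as a concordance probability on a small made-up set of scores, then applies a strictly increasing transform (cubing, chosen purely for illustration) to mimic a shift in the distribution of model outputs. The function names `auroc` and `sensitivity_specificity`, the toy data, and the fixed threshold of 0.5 are all assumptions introduced for this example.

```python
def auroc(scores, labels):
    """AUROC as concordance: the fraction of (positive, negative) pairs in
    which the positive case gets the higher score; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold=0.5):
    """Sensitivity and specificity when predicting positive if score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


# Hypothetical toy data: ten model outputs in (0, 1) with binary labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.45, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

# Illustrative "shift": cubing is strictly increasing on (0, 1), so the ranking
# of cases is unchanged while every output is pushed towards 0.
shifted = [s ** 3 for s in scores]

print("AUROC original:", auroc(scores, labels))
print("AUROC shifted: ", auroc(shifted, labels))
print("sens/spec original:", sensitivity_specificity(scores, labels))
print("sens/spec shifted: ", sensitivity_specificity(shifted, labels))
```

Because AUROC depends only on how cases are ranked, the two AUROC values agree, while sensitivity and specificity at the fixed threshold differ between the original and shifted outputs. This is one way a test set AUROC can stay the same while the behaviour of a deployed decision rule changes.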
Change history
12 April 2024
A Correction to this paper has been published: https://doi.org/10.1038/s42256-024-00834-6
Ethics declarations
Competing interests
The authors declare no competing interests.
Cite this article
Roberts, M., Hazan, A., Dittmer, S. et al. The curious case of the test set AUROC. Nat Mach Intell 6, 373–376 (2024). https://doi.org/10.1038/s42256-024-00817-7
DOI: https://doi.org/10.1038/s42256-024-00817-7
- Springer Nature Limited