Measuring classifier performance: a coherent alternative to the area under the ROC curve

Hand, David J.

doi:10.1007/s10994-009-5119-5

Published: 16 June 2009

Machine Learning, volume 77, pages 103–123 (2009)


Abstract

The area under the ROC curve (AUC) is a very widely used measure of performance for classification and diagnostic rules. It has the appealing property of being objective, requiring no subjective input from the user. On the other hand, the AUC has disadvantages, some of which are well known. For example, the AUC can give potentially misleading results if ROC curves cross. However, the AUC also has a much more serious deficiency, one which appears not to have been previously recognised: it is fundamentally incoherent in terms of misclassification costs, because it uses different misclassification cost distributions for different classifiers. This means that using the AUC is equivalent to using different metrics to evaluate different classification rules. It is equivalent to saying that, using one classifier, misclassifying a class 1 point is p times as serious as misclassifying a class 0 point, but, using another classifier, misclassifying a class 1 point is P times as serious, where p≠P. This is nonsensical because the relative severities of different kinds of misclassifications of individual points are a property of the problem, not of the classifiers which happen to have been chosen. This property is explored in detail, and a simple valid alternative to the AUC is proposed.
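
The well-known crossing-curves problem mentioned above can be illustrated with a small sketch. It is not the paper's derivation or its proposed alternative: the toy scores, the function names `auc` and `min_loss`, and the normalised cost `c` (the cost of misclassifying a class 1 point, with `1 - c` for a class 0 point) are all assumptions made for illustration, as is the convention that higher scores indicate class 1.

```python
# Illustrative sketch only: toy scores and helper names are hypothetical.

def auc(scores0, scores1):
    """Empirical AUC: fraction of (class 0, class 1) pairs in which the
    class 1 point scores higher; ties count as one half."""
    wins = 0.0
    for s1 in scores1:
        for s0 in scores0:
            if s1 > s0:
                wins += 1.0
            elif s1 == s0:
                wins += 0.5
    return wins / (len(scores0) * len(scores1))


def min_loss(scores0, scores1, c):
    """Smallest average misclassification loss over all thresholds t,
    predicting class 1 when score > t.
    c is the assumed cost of misclassifying a class 1 point, 1 - c the
    assumed cost of misclassifying a class 0 point."""
    n = len(scores0) + len(scores1)
    # Candidate thresholds: below every score, then at each observed score.
    thresholds = [float("-inf")] + sorted(set(scores0) | set(scores1))
    best = float("inf")
    for t in thresholds:
        missed1 = sum(1 for s in scores1 if s <= t)  # class 1 predicted as 0
        missed0 = sum(1 for s in scores0 if s > t)   # class 0 predicted as 1
        best = min(best, (c * missed1 + (1 - c) * missed0) / n)
    return best


# Two hypothetical classifiers scoring the same 4 + 4 test points.
a0, a1 = [0.1, 0.2, 0.3, 0.8], [0.4, 0.5, 0.6, 0.7]
b0, b1 = [0.2, 0.3, 0.4, 0.5], [0.1, 0.6, 0.7, 0.8]

print("AUC A:", auc(a0, a1), " AUC B:", auc(b0, b1))
for c in (0.2, 0.8):
    print(f"c={c}: loss A={min_loss(a0, a1, c):.3f}, "
          f"loss B={min_loss(b0, b1, c):.3f}")
```

With these toy scores both classifiers have the same AUC, yet classifier A gives the lower minimum loss at `c = 0.8` and classifier B at `c = 0.2`. Once the relative cost is fixed by the problem, the comparison has a definite answer that the AUC alone does not reveal.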



Author information

Authors and Affiliations

Department of Mathematics, Imperial College London, London, UK
David J. Hand
Institute for Mathematical Sciences, Imperial College London, London, UK
David J. Hand

Corresponding author

Correspondence to David J. Hand.

Additional information

Editor: Johannes Fürnkranz.

About this article

Cite this article

Hand, D.J. Measuring classifier performance: a coherent alternative to the area under the ROC curve. Mach Learn 77, 103–123 (2009). https://doi.org/10.1007/s10994-009-5119-5

Received: 21 August 2008
Revised: 24 March 2009
Accepted: 04 May 2009
Published: 16 June 2009
Issue Date: October 2009
DOI: https://doi.org/10.1007/s10994-009-5119-5
