Measuring classifier performance: a coherent alternative to the area under the ROC curve
 David J. Hand
Abstract
The area under the ROC curve (AUC) is a very widely used measure of performance for classification and diagnostic rules. It has the appealing property of being objective, requiring no subjective input from the user. On the other hand, the AUC has disadvantages, some of which are well known. For example, the AUC can give potentially misleading results if ROC curves cross. However, the AUC also has a much more serious deficiency, and one which appears not to have been previously recognised. This is that it is fundamentally incoherent in terms of misclassification costs: the AUC uses different misclassification cost distributions for different classifiers. This means that using the AUC is equivalent to using different metrics to evaluate different classification rules. It is equivalent to saying that, using one classifier, misclassifying a class 1 point is p times as serious as misclassifying a class 0 point, but, using another classifier, misclassifying a class 1 point is P times as serious, where p≠P. This is nonsensical because the relative severities of different kinds of misclassification of individual points are a property of the problem, not of the classifiers which happen to have been chosen. This property is explored in detail, and a simple valid alternative to the AUC is proposed.
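To make the two kinds of quantity in the abstract concrete, the following minimal sketch computes the empirical AUC (the fraction of class 0 / class 1 score pairs ranked correctly) alongside the minimum cost-weighted loss at one fixed relative cost. It is an illustration only and is not the alternative measure proposed in the paper. The function names, the convention that class 1 points tend to score higher, the meaning of `c` (the relative cost of misclassifying a class 0 point, on a 0 to 1 scale), and the toy scores are all assumptions made for this example.

```python
def auc(scores_class0, scores_class1):
    """Empirical AUC: fraction of (class 0, class 1) pairs in which the
    class 1 point gets the higher score; tied pairs count one half."""
    wins = 0.0
    for s1 in scores_class1:
        for s0 in scores_class0:
            if s1 > s0:
                wins += 1.0
            elif s1 == s0:
                wins += 0.5
    return wins / (len(scores_class0) * len(scores_class1))


def min_weighted_loss(scores_class0, scores_class1, c):
    """Smallest cost-weighted loss over all thresholds for one FIXED c.

    A point is assigned to class 1 when its score exceeds the threshold.
    c weights class 0 points misclassified as class 1, and (1 - c) weights
    class 1 points misclassified as class 0 (an assumed convention).
    """
    n = len(scores_class0) + len(scores_class1)
    # Candidate thresholds: below every score, then at each observed score.
    thresholds = [float("-inf")] + sorted(set(scores_class0) | set(scores_class1))
    best = float("inf")
    for t in thresholds:
        false_pos = sum(s > t for s in scores_class0)
        false_neg = sum(s <= t for s in scores_class1)
        best = min(best, (c * false_pos + (1 - c) * false_neg) / n)
    return best


# Hypothetical scores from one classifier on a toy sample.
class0 = [0.1, 0.2, 0.3, 0.4]
class1 = [0.35, 0.6, 0.7, 0.8]

print(auc(class0, class1))                     # no cost input at all
print(min_weighted_loss(class0, class1, 0.5))  # cost stated explicitly
```

The second function takes the relative cost as an explicit argument, so the same value of `c` can be applied to every classifier being compared; the AUC offers no such argument.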
 Journal

Machine Learning
Volume 77, Issue 1, pp 103–123
 Cover Date
 2009-10-01
 DOI
 10.1007/s10994-009-5119-5
 Print ISSN
 0885-6125
 Online ISSN
 1573-0565
 Publisher
 Springer US
 Keywords

 ROC curves
 Classification
 AUC
 Specificity
 Sensitivity
 Misclassification rate
 Cost
 Loss
 Error rate
 Authors

 David J. Hand (1, 2)
 Author Affiliations

 1. Department of Mathematics, Imperial College London, London, UK
 2. Institute for Mathematical Sciences, Imperial College London, London, UK