Machine Learning, Volume 77, Issue 1, pp 103–123

Measuring classifier performance: a coherent alternative to the area under the ROC curve

Abstract

The area under the ROC curve (AUC) is a very widely used measure of performance for classification and diagnostic rules. It has the appealing property of being objective, requiring no subjective input from the user. On the other hand, the AUC has disadvantages, some of which are well known. For example, the AUC can give potentially misleading results if ROC curves cross. However, the AUC also has a much more serious deficiency, and one which appears not to have been previously recognised. This is that it is fundamentally incoherent in terms of misclassification costs: the AUC uses different misclassification cost distributions for different classifiers. This means that using the AUC is equivalent to using different metrics to evaluate different classification rules. It is equivalent to saying that, using one classifier, misclassifying a class 1 point is p times as serious as misclassifying a class 0 point, but, using another classifier, misclassifying a class 1 point is P times as serious, where P ≠ p. This is nonsensical because the relative severities of different kinds of misclassifications of individual points are a property of the problem, not of the classifiers which happen to have been chosen. This property is explored in detail, and a simple valid alternative to the AUC is proposed.
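
To make the cost-weighting argument concrete, the following is a minimal illustrative sketch in the spirit of the paper's argument; the notation (class priors \pi_0, \pi_1, score distribution functions F_0, F_1, cost ratio c, and cost-weight density w) is introduced here for exposition and is not quoted from the text. If scores above a threshold t are assigned to class 1, misclassifying a class 0 point costs c and misclassifying a class 1 point costs 1 - c, the expected misclassification loss is

\[
  Q(t;c) \;=\; c\,\pi_0\,\{1 - F_0(t)\} \;+\; (1-c)\,\pi_1\,F_1(t), \qquad c \in (0,1).
\]

Averaging the minimum attainable loss over a distribution w(c) of cost ratios gives

\[
  L_w \;=\; \int_0^1 \Bigl( \min_t Q(t;c) \Bigr)\, w(c)\,\mathrm{d}c .
\]

The AUC corresponds (up to a linear transformation) to such an average in which w(c) is induced by the classifier's own score distributions, so two classifiers are effectively assessed under two different cost-ratio distributions. The alternative proposed in the paper instead fixes a single w(c) for every classifier being compared, for example a symmetric Beta(2,2) density, w(c) \propto c(1-c).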

Keywords

ROC curves · Classification · AUC · Specificity · Sensitivity · Misclassification rate · Cost · Loss · Error rate

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

David J. Hand
  1. Department of Mathematics, Imperial College London, London, UK
  2. Institute for Mathematical Sciences, Imperial College London, London, UK