Machine Learning, Volume 65, Issue 1, pp 95–130

Cost curves: An improved method for visualizing classifier performance

Abstract

This paper introduces cost curves, a graphical technique for visualizing the performance (error rate or expected cost) of 2-class classifiers over the full range of possible class distributions and misclassification costs. Cost curves are shown to be superior to ROC curves for visualizing classifier performance for most purposes. This is because they visually support several crucial types of performance assessment that cannot be done easily with ROC curves, such as showing confidence intervals on a classifier's performance, and visualizing the statistical significance of the difference in performance of two classifiers. A software tool supporting all the cost curve analysis described in this paper is available from the authors.

Keywords

Performance evaluation · Classifiers · ROC curves · Machine learning

Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  1. Institute for Information Technology, National Research Council Canada, Ontario, Canada
  2. Department of Computing Science, University of Alberta, Edmonton, Canada
