Abstract
The area under the ROC curve, or AUC, has been widely used to assess the ranking performance of binary scoring classifiers. Given a sample, the metric considers the ordering of positive and negative instances, i.e., the sign of the corresponding score differences. From a model evaluation and selection point of view, it may appear unreasonable to ignore the absolute value of these differences. For this reason, several variants of the AUC metric that take score differences into account have recently been proposed. In this paper, we present a unified framework for these metrics and provide a formal analysis. We conjecture that, despite their intuitive appeal, actually none of the variants is effective, at least with regard to model evaluation and selection. An extensive empirical analysis corroborates this conjecture. Our findings also shed light on recent research dealing with the construction of AUC-optimizing classifiers.
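As a small illustration of the observation that the AUC depends only on the sign of score differences, the following sketch computes the AUC as the fraction of correctly ordered positive-negative pairs (the Wilcoxon-Mann-Whitney form). The function name `pairwise_auc` and the toy scores are assumptions made for this example and do not come from the paper. Rescaling all scores changes the magnitude of every difference but leaves the AUC unchanged.

```python
from itertools import product


def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Only the sign of each score difference is used; ties count as 1/2.
    """
    total = 0.0
    for p, n in product(pos_scores, neg_scores):
        diff = p - n
        if diff > 0:
            total += 1.0
        elif diff == 0:
            total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for three positive and two negative instances.
pos = [0.9, 0.6, 0.4]
neg = [0.5, 0.3]
auc_original = pairwise_auc(pos, neg)

# Dividing every score by 10 shrinks all differences but keeps the ordering.
pos_scaled = [s / 10 for s in pos]
neg_scaled = [s / 10 for s in neg]
auc_scaled = pairwise_auc(pos_scaled, neg_scaled)

print(auc_original, auc_scaled)
```

Both calls give the same value because five of the six pairs are ordered correctly in either case. A variant that takes score differences into account would, by contrast, distinguish the two score sets.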
Additional information
Editors: Walter Daelemans, Bart Goethals, Katharina Morik.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Vanderlooy, S., Hüllermeier, E. A critical analysis of variants of the AUC. Mach Learn 72, 247–262 (2008). https://doi.org/10.1007/s10994-008-5070-x
DOI: https://doi.org/10.1007/s10994-008-5070-x