Rate-Constrained Ranking and the Rate-Weighted AUC
Abstract
Ranking tasks, where instances are ranked by a predicted score, are common in machine learning. Often only a proportion of the instances in the ranking can be processed, and this quantity, the predicted positive rate (PPR), may not be known precisely. In this situation, the evaluation of a model’s performance needs to account for these imprecise constraints on the PPR, but existing metrics such as the area under the ROC curve (AUC) and early retrieval metrics such as normalised discounted cumulative gain (NDCG) cannot do this. In this paper we introduce a novel metric, the rate-weighted AUC (rAUC), to evaluate ranking models when constraints across the PPR exist, and provide an efficient algorithm to estimate the rAUC using an empirical ROC curve. Our experiments show that rAUC, AUC and NDCG often select different models. We demonstrate the usefulness of rAUC on a practical application: ranking articles for rapid reviews in epidemiology.
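The abstract leaves the formal definition of the rAUC to the body of the paper. Purely as an illustrative sketch, and not the authors' algorithm, one might model a rate-weighted score as the true positive rate attained at each predicted positive rate (PPR), averaged under a user-supplied weight distribution expressing the imprecise constraint on the PPR; the function names, grid, and weighting scheme below are all hypothetical:

```python
import numpy as np

def tpr_at_ppr(scores, labels, ppr_grid):
    # Rank instances by descending score; at predicted positive rate r,
    # the top r-fraction of the ranking is treated as positive.
    order = np.argsort(-np.asarray(scores))
    sorted_labels = np.asarray(labels)[order]
    n = len(sorted_labels)
    n_pos = sorted_labels.sum()
    cum_tp = np.cumsum(sorted_labels)              # true positives in top-k
    ks = np.clip((np.asarray(ppr_grid) * n).astype(int), 1, n)
    return cum_tp[ks - 1] / n_pos                  # TPR at each rate

def rate_weighted_score(scores, labels, weights, ppr_grid):
    # Hypothetical rate-weighted metric: average the TPR over the PPR
    # grid, weighted by the analyst's beliefs about the operating rate
    # (weights should sum to 1).
    tprs = tpr_at_ppr(scores, labels, ppr_grid)
    return float(np.dot(weights, tprs))
```

Under this reading, a point mass on a single rate recovers an early-retrieval-style recall-at-k, while a broad weight distribution rewards models that rank well across the whole range of plausible rates.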
Keywords
- Random Forest
- True Positive Rate
- Support Vector Machine Model
- Rapid Review
- True Negative Rate