Rate-Oriented Point-Wise Confidence Bounds for ROC Curves

  • Louise A. C. Millard
  • Meelis Kull
  • Peter A. Flach
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8725)


Common approaches to generating confidence bounds around ROC curves have several shortcomings. We address these weaknesses with a new ‘rate-oriented’ approach. We generate confidence bounds composed of a series of confidence intervals for a consensus curve, each at a particular predicted positive rate (PPR), with the aim that each confidence interval contains new samples of this consensus curve with 95% probability. We propose two approaches, a parametric approach and a bootstrapping approach, which we base on a derivation from first principles. Our method is particularly appropriate for models used in a common type of task that we call rate-constrained, in which the model must classify a specified proportion of examples as positive, so that the operating point is set at a particular PPR value.
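To make the idea of a PPR-indexed interval concrete, here is a minimal sketch assuming a plain percentile bootstrap over resampled test sets. It is not the authors' parametric or bootstrapping method, which is derived from first principles in the paper and is not reproduced here. The function names `tpr_at_ppr` and `bootstrap_ppr_bounds` are hypothetical, and two details are illustrative assumptions: the number of predicted positives is `round(ppr * n)`, and tied scores are broken by input order. At each PPR value the sketch records the true positive rate of the rate-constrained operating point in every resample and reports the 2.5th and 97.5th percentiles.

```python
import numpy as np


def tpr_at_ppr(scores, labels, ppr):
    """TPR when a fraction `ppr` of examples (highest scores first) is
    classified positive, i.e. a rate-constrained operating point.
    Hypothetical helper; assumes at least one positive in `labels`."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n = len(scores)
    k = int(round(ppr * n))  # assumption: number of predicted positives
    # Rank by descending score; the stable sort breaks ties by input order.
    order = np.argsort(-scores, kind="stable")
    predicted_pos = np.zeros(n, dtype=bool)
    predicted_pos[order[:k]] = True
    return (predicted_pos & labels).sum() / labels.sum()


def bootstrap_ppr_bounds(scores, labels, pprs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the TPR at each PPR in `pprs`.
    Returns (lower, upper) arrays aligned with `pprs`. Hypothetical sketch,
    not the paper's bootstrapping approach; needs at least one positive."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    rng = np.random.default_rng(seed)
    n = len(scores)
    samples = np.empty((n_boot, len(pprs)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample the test set with replacement
        while not labels[idx].any():  # TPR is undefined without positives
            idx = rng.integers(0, n, size=n)
        samples[b] = [tpr_at_ppr(scores[idx], labels[idx], r) for r in pprs]
    # Pointwise interval: one pair of percentiles per PPR value.
    lower = np.percentile(samples, 100 * alpha / 2, axis=0)
    upper = np.percentile(samples, 100 * (1 - alpha / 2), axis=0)
    return lower, upper


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    labels = [1, 0, 1, 0, 1, 0]
    lo, hi = bootstrap_ppr_bounds(scores, labels, pprs=[0.25, 0.5, 0.75], n_boot=500)
    print(lo, hi)
```

Indexing the intervals by PPR rather than by threshold or false positive rate matches the rate-constrained setting: if the deployment fixes the proportion of examples classified as positive, the vertical spread at that PPR is the uncertainty that matters.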


Keywords: Confidence bounds · rate-averaging · ROC curves · rate-constrained





Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Louise A. C. Millard (1, 2)
  • Meelis Kull (1)
  • Peter A. Flach (1, 2)
  1. Intelligent Systems Laboratory, University of Bristol, United Kingdom
  2. MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, United Kingdom