Area under the Precision-Recall Curve: Point Estimates and Confidence Intervals

  • Kendrick Boyd
  • Kevin H. Eng
  • C. David Page
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8190)


The area under the precision-recall curve (AUCPR) is a single-number summary of the information in the precision-recall (PR) curve. Like the receiver operating characteristic curve, the PR curve has unique properties that make estimating its enclosed area challenging. Besides a point estimate of the area, an interval estimate is often required to express magnitude and uncertainty. In this paper we perform a computational analysis of common AUCPR estimators and their confidence intervals. We find both satisfactory estimates and invalid procedures, and we recommend two simple intervals that are robust to a variety of assumptions.
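To make the distinction between a point estimate and an interval estimate concrete, here is a minimal Python sketch. It uses average precision as one common AUCPR estimator and two simple normal-approximation intervals, one on the raw scale and one on the logit scale. The abstract does not specify which estimators or intervals the paper studies or recommends, so the function names, the use of the number of positive examples as the effective sample size, and the 1.96 critical value are all assumptions of this illustration.

```python
import math


def average_precision(labels, scores):
    """Average precision: mean of the precision at the rank of each positive."""
    # Rank examples by decreasing score; ties keep input order (an assumption).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


def binomial_interval(estimate, n_pos, z=1.96):
    """Normal-approximation interval, treating n_pos as the sample size (assumed)."""
    half = z * math.sqrt(estimate * (1.0 - estimate) / n_pos)
    return max(0.0, estimate - half), min(1.0, estimate + half)


def logit_interval(estimate, n_pos, z=1.96):
    """Interval built on the logit scale, so it stays inside (0, 1).

    Requires 0 < estimate < 1; n_pos as sample size is an assumption.
    """
    eta = math.log(estimate / (1.0 - estimate))
    tau = 1.0 / math.sqrt(n_pos * estimate * (1.0 - estimate))
    lower, upper = eta - z * tau, eta + z * tau
    return 1.0 / (1.0 + math.exp(-lower)), 1.0 / (1.0 + math.exp(-upper))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    ap = average_precision(labels, scores)
    n_pos = sum(labels)
    print("average precision:", ap)
    print("binomial interval:", binomial_interval(ap, n_pos))
    print("logit interval:", logit_interval(ap, n_pos))
```

With so few positive examples both intervals are very wide, which is the kind of uncertainty a point estimate alone would hide. The logit form never leaves the unit interval, whereas the raw-scale form has to be clipped at 0 and 1.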


Keywords (added by machine, not by the authors): Average Precision; Roswell Park Cancer Institute; Bias Ratio; Markov Logic Network; Receiver Operating Character



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Kendrick Boyd (1)
  • Kevin H. Eng (2)
  • C. David Page (1)

  1. University of Wisconsin-Madison, Madison, USA
  2. Roswell Park Cancer Institute, Buffalo, USA
