
The Net Reclassification Index (NRI): A Misleading Measure of Prediction Improvement Even with Independent Test Data Sets

Statistics in Biosciences


The Net Reclassification Index (NRI) is a very popular measure for evaluating the improvement in prediction performance gained by adding a marker to a set of baseline predictors. However, the statistical properties of this novel measure have not been explored in depth. We demonstrate the alarming result that the NRI statistic calculated on a large test dataset using risk models derived from a training set is likely to be positive even when the new marker has no predictive information. A related theoretical example is provided in which an incorrect risk function that includes an uninformative marker is proven to erroneously yield a positive NRI. Some insight into this phenomenon is provided. Since large values for the NRI statistic may simply be due to use of poorly fitting risk models, we suggest caution in using the NRI as the basis for marker evaluation. Other measures of prediction performance improvement, such as measures derived from the receiver operating characteristic curve, the net benefit function, and the Brier score, cannot be large due to poorly fitting risk functions.
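As a rough illustration of the setting described above, the following sketch fits a baseline logistic model and an expanded model that adds a pure-noise marker on a small training set, then computes the NRI on a large independent test set. Everything specific here is an assumption of the sketch and not taken from the article: the category-free form of the NRI (net proportion of events whose risk moves up plus net proportion of non-events whose risk moves down), the data-generating model, the sample sizes, and the function names `continuous_nri`, `fit_logistic`, and `simulate_null_nri`.

```python
import numpy as np


def continuous_nri(risk_old, risk_new, y):
    """Category-free NRI (assumed form): net upward risk movement among
    events plus net downward risk movement among non-events."""
    risk_old = np.asarray(risk_old, dtype=float)
    risk_new = np.asarray(risk_new, dtype=float)
    y = np.asarray(y).astype(bool)
    up = risk_new > risk_old
    down = risk_new < risk_old
    nri_events = up[y].mean() - down[y].mean()
    nri_nonevents = down[~y].mean() - up[~y].mean()
    return nri_events + nri_nonevents


def fit_logistic(X, y, n_iter=25):
    """Plain Newton-Raphson logistic regression.
    X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def _risk(X, beta):
    """Predicted risk from a fitted logistic model."""
    return 1.0 / (1.0 + np.exp(-(X @ beta)))


def _draw(rng, n):
    """Hypothetical data model: x is predictive, z is pure noise."""
    x = rng.normal(size=n)
    z = rng.normal(size=n)  # uninformative marker, independent of y
    p = 1.0 / (1.0 + np.exp(-(-1.0 + 1.0 * x)))
    y = (rng.random(n) < p).astype(float)
    return x, z, y


def simulate_null_nri(n_train=250, n_test=25000, n_reps=200, seed=0):
    """Return the test-set NRI from each replicate when the added
    marker carries no predictive information."""
    rng = np.random.default_rng(seed)
    results = np.empty(n_reps)
    for r in range(n_reps):
        x_tr, z_tr, y_tr = _draw(rng, n_train)
        x_te, z_te, y_te = _draw(rng, n_test)
        ones_tr, ones_te = np.ones(n_train), np.ones(n_test)
        # Both models are fit on the training set only.
        b_old = fit_logistic(np.column_stack([ones_tr, x_tr]), y_tr)
        b_new = fit_logistic(np.column_stack([ones_tr, x_tr, z_tr]), y_tr)
        # Both models are evaluated on the independent test set.
        risk_old = _risk(np.column_stack([ones_te, x_te]), b_old)
        risk_new = _risk(np.column_stack([ones_te, x_te, z_te]), b_new)
        results[r] = continuous_nri(risk_old, risk_new, y_te)
    return results
```

Calling `simulate_null_nri()` and inspecting, for example, `(results > 0).mean()` gives the fraction of replicates in which the test-set NRI is positive under these assumed settings. No output is quoted here because it depends on the chosen data model and sample sizes.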


This work was supported in part by National Institutes of Health Grants R01 GM054438, U24 CA086368, and R01 CA152089.

Conflict of interest

None declared.

Corresponding author

Correspondence to Margaret S. Pepe.


Cite this article

Pepe, M.S., Fan, J., Feng, Z. et al. The Net Reclassification Index (NRI): A Misleading Measure of Prediction Improvement Even with Independent Test Data Sets. Stat Biosci 7, 282–295 (2015).
