Abstract
Net reclassification improvement (NRI) has received much attention for comparing risk prediction models, and might be preferable to the area under the receiver operating characteristic (ROC) curve (ROC-AUC) for indicating changes in predictive ability. We investigated how the choice of risk cut-offs and the number of risk categories influence the NRI. Using data from the European Prospective Investigation into Cancer and Nutrition-Potsdam study, three diabetes prediction models were compared according to ROC-AUC and NRI, with varying cut-offs for two and three risk categories and with varying numbers of risk categories. Compared with a basic model including age, anthropometry, and hypertension status, a model extended by waist circumference improved discrimination from 0.720 to 0.831 (0.111 [0.097–0.125]), while the increase in ROC-AUC from 0.831 to 0.836 (0.006 [0.002–0.009]) indicated moderate improvement when diet and physical activity were additionally considered. However, the NRI for these two model comparisons varied with the chosen cut-offs, both for two risk categories (range: 5.59–23.20 %; −0.79 to 4.09 %) and for three risk categories (20.37–40.15 %; 1.22–4.34 %). This variation was more pronounced for the model extension showing the larger difference in ROC-AUC. NRI increased with increasing numbers of categories, though not monotonically, from minimum NRIs of 18.41 and 0.46 % to nearly category-free NRIs of 79.61 and 19.22 %. The pattern of this increase was similar in both model comparisons. In conclusion, the choice of risk cut-offs and the number of categories has a substantial impact on NRI. A limited number of categories should only be used if the categories have strong clinical importance.
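To illustrate how the NRI can depend on the chosen cut-offs, the following is a minimal sketch assuming the commonly used definition of the NRI: the net proportion of events moving to a higher risk category plus the net proportion of non-events moving to a lower one. It is not the authors' analysis code. The function names (`risk_category`, `nri`) and the toy risks are hypothetical and chosen only for illustration, and the sketch ignores censoring and confidence intervals.

```python
from bisect import bisect_right


def risk_category(risk, cutoffs):
    """Return the index of the risk category for a predicted risk.

    `cutoffs` must be sorted ascending; a risk equal to a cut-off is
    assigned to the higher category (an assumption of this sketch).
    """
    return bisect_right(cutoffs, risk)


def nri(old_risks, new_risks, events, cutoffs=None):
    """Net reclassification improvement of a new model over an old one.

    With `cutoffs`, movement is counted between risk categories.
    With `cutoffs=None`, any change in predicted risk counts as
    movement (the category-free NRI).
    """
    up = {True: 0, False: 0}
    down = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for old, new, event in zip(old_risks, new_risks, events):
        is_event = bool(event)
        total[is_event] += 1
        if cutoffs is not None:
            old = risk_category(old, cutoffs)
            new = risk_category(new, cutoffs)
        if new > old:
            up[is_event] += 1
        elif new < old:
            down[is_event] += 1
    if total[True] == 0 or total[False] == 0:
        raise ValueError("need at least one event and one non-event")
    # Events should move up; non-events should move down.
    nri_events = (up[True] - down[True]) / total[True]
    nri_nonevents = (down[False] - up[False]) / total[False]
    return nri_events + nri_nonevents


# Hypothetical toy data: predicted risks from two models and outcomes.
old_risks = [0.05, 0.12, 0.30, 0.08, 0.15, 0.25, 0.04, 0.18]
new_risks = [0.11, 0.22, 0.35, 0.06, 0.09, 0.30, 0.03, 0.21]
events = [1, 1, 1, 1, 0, 0, 0, 0]

for cutoffs in ([0.10], [0.20], [0.10, 0.20], None):
    label = "category-free" if cutoffs is None else f"cut-offs {cutoffs}"
    print(f"{label}: NRI = {nri(old_risks, new_risks, events, cutoffs):.2f}")
```

On the same pair of toy models, moving a single cut-off from 0.10 to 0.20 changes the categorical NRI, which is the kind of sensitivity the abstract describes.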
Acknowledgments
This work was supported in part by a grant from the German Federal Ministry of Education and Research (BMBF) to the German Center for Diabetes Research (DZD e.V.). The recruitment phase of the EPIC-Potsdam Study was supported by the Federal Ministry of Science, Germany (01 EA 9401) and the European Union (SOC 95201408 05F02). The follow-up of the EPIC-Potsdam Study was supported by German Cancer Aid (70-2488-Ha I) and the European Community (SOC 98200769 05F02). We thank Ellen Kohlsdorf for data management and Olga Kuxhaus for support in preparing the manuscript.
Electronic supplementary material
10654_2012_9744_MOESM1_ESM.ppt
Calibration plot for the German Diabetes Risk Score using 5 groups of absolute risk. For the Hosmer–Lemeshow test, 5 risk groups were defined according to the current use of the German Diabetes Risk Score, with cut-offs at 0.88, 2.37, 6.30 and 16.21 % absolute risk calculated with Cox regression. Observed probability was calculated as the number of events during 5 years of follow-up divided by the total number of participants in each of the 5 risk groups. Predicted probability was calculated as the mean absolute risk in each of the 5 risk groups, derived from the German Diabetes Risk Score. (PPT 117 kb)
Cite this article
Mühlenbruch, K., Heraclides, A., Steyerberg, E.W. et al. Assessing improvement in disease prediction using net reclassification improvement: impact of risk cut-offs and number of risk categories. Eur J Epidemiol 28, 25–33 (2013). https://doi.org/10.1007/s10654-012-9744-0
DOI: https://doi.org/10.1007/s10654-012-9744-0