A confidence ellipse for the Net Reclassification Improvement

Mühlenbruch, Kristin; Kuxhaus, Olga; Pencina, Michael J.; Boeing, Heiner; Liero, Hannelore; Schulze, Matthias B.

doi:10.1007/s10654-015-0001-1

A confidence ellipse for the Net Reclassification Improvement

METHODS
Open access
Published: 28 February 2015

Volume 30, pages 299–304, (2015)
Cite this article

Download PDF

You have full access to this open access article

European Journal of Epidemiology Aims and scope Submit manuscript

A confidence ellipse for the Net Reclassification Improvement

Download PDF

Kristin Mühlenbruch^1,2,
Olga Kuxhaus^1,2,
Michael J. Pencina³,
Heiner Boeing⁵,
Hannelore Liero⁴ &
…
Matthias B. Schulze^1,2

1950 Accesses
7 Citations
Explore all metrics

Abstract

The Net Reclassification Improvement (NRI) has become a popular metric for evaluating improvement in disease prediction models through the past years. The concept is relatively straightforward but usage and interpretation has been different across studies. While no thresholds exist for evaluating the degree of improvement, many studies have relied solely on the significance of the NRI estimate. However, recent studies recommend that statistical testing with the NRI should be avoided. We propose using confidence ellipses around the estimated values of event and non-event NRIs which might provide the best measure of variability around the point estimates. Our developments are illustrated using practical examples from EPIC-Potsdam study.

Background

Risk prediction models have become a main focus in epidemiological research in the past years. Although a large number of prediction models exists, of which some have already been integrated in treatment strategies or health promotion programs, there is an ongoing effort to improve prediction models by the use of new risk markers. For the evaluation of such model extensions, the Net Reclassification Improvement (NRI) was proposed by Pencina et al. in 2008 as an addition to the evaluation of discrimination, e.g. by comparing receiver operating characteristic curves [1]. The NRI is based on the calculation of the amount of correctly and incorrectly reclassified cases and non-cases comparing classification of individuals into a priori defined risk categories in terms of their predicted risk between two nested models. Since its publication it has been used in a growing number of studies, however, there is a large heterogeneity in its use, presentation, and interpretation [2, 3]. Especially with regard to testing statistical significance of NRI estimates, there remains uncertainty. Pencina [4] discussed that even small NRI values (<0.01) might produce statistically significant p values and Pepe et al. [5] points out that valid methods for inference for the NRI do not exist. In a recent review of NRI measures, Kerr et al. [2] raise concerns about the proposed test statistic and variance formula. This suggests that statistical testing should be avoided for the NRI measure. However, confidence intervals provide precision estimates and are preferable, not only for the overall NRI, but also for its components. The NRI components do not reflect an overall improvement but rather improvement among cases and non-cases separately. Therefore, our aim was to introduce a method to calculate a confidence ellipse around the two components of the NRI which reflects the precision of the estimates and can help interpret the magnitude and variability of the observed effects.

Definition of NRI

Extension of prediction models with additional risk factors usually leads to changes in predicted risk for individual study participants. When predefined risk categories are used, this is reflected by upward and downward movements across these risk categories from the reference to the extended model. This reclassification is used for the calculation of the NRI which considers proportions of upward and downward movements separately for cases and non-cases (1) [1].

$$\begin{aligned} NRI_{cases} & = P\left( {up|case} \right) - P\left( {down|case} \right), \\ NRI_{{non{ - }cases}} & = P\left( {down|non{ - }case} \right) - P\left( {up|non{ - }case} \right), \\ NRI & = NRI_{cases} + NRI_{{non{ - }cases}} \\ & = P\left( {up|case} \right) - P\left( {down|case} \right) + P\left( {down|non{ - }case} \right) - P\left( {up|non{ - }case} \right) \\ & = \left[ {p_{up,cases} - p_{down,cases} } \right] + \left[ {p_{{down,non{ - }cases}} - p_{{up,non{ - }cases}} } \right] \\ \end{aligned}$$

(1)

The corresponding standard error for the NRI and its components was defined by Pencina et al. [4] and depends on the standard error of cases, which often is a much smaller group:

$$SE\left( {\widehat{NRI}} \right) = \sqrt {SE\left( {\widehat{NRI} _{cases} } \right)^{2} + SE\left( {\widehat{NRI} _{non - cases} } \right)^{2} } ,$$

$$SE\left( {\widehat{NRI}_{cases} } \right) = \sqrt {\frac{{\hat{p} _{up, cases} + \hat{p} _{down, cases} - \left( {\hat{p} _{up, cases} - \hat{p} _{down, cases} } \right)^{2} }}{{N_{cases} }} } ,$$

$$SE\left( {\widehat{NRI} _{non - cases} } \right) = \sqrt {\frac{{\hat{p} _{up, non - cases} + \hat{p} _{down, non - cases} - \left( {\hat{p} _{{up, non{ - }cases}} - \hat{p} _{down, non - cases} } \right)^{2} }}{{N_{non - cases} }}} .$$

As such, the NRI is the sum of the single components (NRI _cases, NRI _non-cases) reflecting improvement among cases or improvement among non-cases or both. Thereby, the overall measure does not include evaluation of improvement among cases or non-cases separately. Absolute risks are derived from regression models; either logistic regression or Cox-regression with the disease as the outcome variable.

Confidence ellipse for two components of NRI

Pencina already suggested to report CIs for the NRI and used the bootstrap method for their construction [4]. Calculation of CIs would be informative not only for the overall NRI but also for the single components. Besides the bootstrapping method, CIs can be calculated with a formula related to the construction of CIs for independent proportions according to Agresti [6]; this approach will be applied further on. The standard errors for the overall NRI and its single components were defined before, so that the CIs can be defined as follows:

$$\left[ {\widehat{NRI} - z_{1 - \frac{\alpha}{ 2}} SE\left( {\widehat{NRI}} \right),\widehat{NRI} + z_{1 - \frac{\alpha}{ 2}} SE(\widehat{NRI})} \right]$$

with $z_{1 - \frac{\alpha}{ 2}}$ as the $\left( {1 - \frac{\alpha}{ 2}} \right)$-quantile of the standard normal distribution. The CIs for NRI _cases and NRI _non-cases can be calculated with the same method. While CIs of the two NRI components, NRI _cases and NRI _non-cases, can be interpreted individually, this again would not allow an easy interpretation in terms of the overall improvement. To overcome this problem, we propose to use a confidence ellipse which allows evaluating the single components NRI _cases and NRI _non-cases in combination.

We introduce the following notation: Let $\theta = (\theta_{1, } \theta_{2} )$ be the parameter consisting of the NRI components, i.e.

$$\theta_{1} = NRI_{cases} = P\left( {up |case} \right) - P\left( {down |case} \right)\;{\text{and}}$$

$$\theta_{2} = NRI_{{non{ - }cases}} = P\left( {down |non{ - }case} \right) - P\left( {up |non{ - }case} \right)$$

We define the following probabilities

$$p_{1} = {\text{P}}\left( {up \cap case} \right),\, p_{2} = {\text{P}}\left( {down \cap case} \right),\, p_{3} = {\text{P}}\left( {up \cap non{ - }case} \right),\, p_{4} = {\text{P}}\left( {down \cap non{ - }case} \right)\;{\text{and}}\; p_{5} = {\text{P}}\left( {case} \right)$$

and can write $\theta$ as a function of these probabilities:

$$\theta = \left( {\theta_{1} ,\theta_{2} } \right) = g\left( \varvec{p} \right) = \left( {g_{1} \left( \varvec{p} \right),g_{2} \left( \varvec{p} \right)} \right) = \left( {\frac{{p_{1} - p_{2} }}{{p_{5} }}, \frac{{p_{4} - p_{3} }}{{1 - p_{5} }}} \right).$$

Consequently, the maximum likelihood estimates of $\theta_{1}$ and $\theta_{2}$ are given by the relative frequencies $\hat{p}_{j} = v_{j} /N$ (with $N = N_{cases} + N_{{non{ - }cases}}$) as follows:

$$\hat{\theta }_{1} = \frac{{\hat{p}_{1} - \hat{p}_{2} }}{{\hat{p}_{5} }} \;{\text{and}}\; \hat{\theta }_{2} = \frac{{\hat{p}_{4} - \hat{p}_{3} }}{{1 - \hat{p}_{5} }}$$

with

	Up	Down	Total
Case	$ v_{1} $	$ v_{2} $	$ v_{5} $
Non-case	$ v_{3} $	$ v_{4} $	$ N - v_{5} $

Applying the multivariate central limit theorem to the vector of relative frequencies $\hat{\varvec{p}} = (\hat{p}_{1} ,\hat{p}_{2} , \hat{p}_{3} , \hat{p}_{4} , \hat{p}_{5} )^{T}$ we get, that for a large sample size $N$ the distribution of $\sqrt N (\hat{\varvec{p}} - \varvec{p})$ can be approximated by a five-dimensional normal distribution, i.e.,

$$\sqrt N \left( {\hat{\varvec{p}} - \varvec{p}} \right){ \mathop \to \limits^{ D }} {\text{N}}_{5} \left( {0,A(\varvec{p})} \right).$$

(2)

Here $A(\varvec{p})$ is the covariance matrix of the limit distribution. It depends on the underlying probabilities $p_{j}$ and can be computed as:

$$A\left( \varvec{p} \right) = \left( {\begin{array}{*{20}l} {p_{1} (1 - p_{1} )} \hfill & { - p_{1} p_{2} } \hfill & { - p_{1} p_{3} } \hfill & { - p_{1} p_{4} } \hfill & {p_{1} (1 - p_{5} )} \hfill \\ { - p_{1} p_{2} } \hfill & {p_{2} (1 - p_{2} )} \hfill & { - p_{2} p_{3} } \hfill & { - p_{2} p_{4} } \hfill & {p_{2} (1 - p_{5} )} \hfill \\ { - p_{1} p_{3} } \hfill & { - p_{2} p_{3} } \hfill & {p_{3} (1 - p_{3} )} \hfill & { - p_{3} p_{4} } \hfill & { - p_{3} p_{5} } \hfill \\ { - p_{1} p_{4} } \hfill & { - p_{2} p_{4} } \hfill & { - p_{3} p_{4} } \hfill & {p_{4} (1 - p_{4} )} \hfill & { - p_{4} p_{5} } \hfill \\ {p_{1} (1 - p_{5} )} \hfill & {p_{2} (1 - p_{5} )} \hfill & { - p_{3} p_{5} } \hfill & { - p_{4} p_{5} } \hfill & {p_{5} (1 - p_{5} )} \hfill \\ \end{array} } \right).$$

With the help of the so-called delta method we can derive from (2) the asymptotic variance of $\hat{\theta } = (\hat{\theta }_{1} ,\hat{\theta }_{2} )$. Here we use, that $\hat{\theta } = g(\hat{\varvec{p}})$. To derive the asymptotic variance of $\hat{\theta }$ one has to multiply the matrix of partial derivatives of $g$ with $A(\varvec{p})$. This leads to

$$\sqrt N \left[ {\left( {\begin{array}{*{20}c} {\hat{\theta }_{1} } \\ {\hat{\theta }_{2} } \\ \end{array} } \right) - \left( {\begin{array}{*{20}c} {\theta_{1} } \\ {\theta_{2} } \\ \end{array} } \right)} \right] {\mathop \to \limits^{ D }} {\text{N}}_{2} \left( {0,W(\varvec{p})} \right)$$

with $W(\varvec{p}) = \left( {\begin{array}{*{20}c} {w_{1} } & 0 \\ 0 & {w_{2} } \\ \end{array} } \right)$ and

$$w_{1} = \frac{{p_{1} + p_{2} }}{{p_{5}^{2} }} - \frac{{\left( {p_{1} - p_{2} } \right)^{2} }}{{p_{5}^{3} }}, w_{2} = \frac{{p_{3} + p_{4} }}{{(1 - p_{5} )^{2} }} - \frac{{\left( {p_{3} - p_{4} } \right)^{2} }}{{(1 - p_{5} )^{3} }}.$$

The asymptotic normality of $\hat{\theta }$ implies that

$$N\left( {\hat{\theta } - \theta } \right)^{T} W^{ - 1} \left( {\hat{\varvec{p}}} \right)\left( {\hat{\theta } - \theta } \right){\mathop \to \limits^{ D }} \chi_{2}^{2}$$

(3)

with $\chi_{2}^{2}$, the Chi squared distribution with two degrees of freedom and $W^{ - 1} \left( {\hat{\varvec{p}}} \right)$ is the inverse of the matrix $W(\varvec{p})$.

Because of the diagonal structure of $W(\varvec{p})$ and with asymptotic result from (3) we can define a $\left( {1 - \alpha } \right)$ confidence ellipse for $\theta$ as

$$\left\{ {\theta \in [ - 1,1]^{2} \mid \frac{{\left( {\hat{\theta }_{1} - \theta_{1} } \right)^{2} }}{{\hat{w}_{1} }} + \frac{{\left( {\hat{\theta }_{2} - \theta_{2} } \right)^{2} }}{{\hat{w}_{2} }} \le \frac{{\chi_{2;1 - \alpha }^{2} }}{N}} \right\}.$$

The determination of the confidence ellipse allows to determine the simultaneous precision of the NRI estimates for cases and non-cases.

Using previous notation and the following relationships $\hat{p}_{up, cases} = \hat{p}_{1} /\hat{p}_{5}$, $\hat{p}_{down, cases} = \hat{p}_{2} /\hat{p}_{5}$, $\hat{p}_{{up, non{ - }cases}} = \hat{p}_{3} /(1 - \hat{p}_{5})$ and $\hat{p}_{up, cases} = \hat{p}_{4} /(1 - \hat{p}_{5})$, the confidence ellipse can also be defined with the following equation.

$$\left\{ {\theta \in [ - 1,1]^{2} \mid \left( {\frac{{\widehat{NRI}_{cases} - \theta_{1} }}{{SE\left( {\widehat{NRI} _{cases} } \right)}}} \right)^{2} + \left( {\frac{{\widehat{NRI}_{{non{ - }cases}} - \theta_{2} }}{{SE\left( {\widehat{NRI} _{{non{ - }cases}} } \right)}}} \right)^{2} \le \chi_{2;1 - \alpha }^{2} } \right\}.$$

Empirical data

Study population

The European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam study is a prospective cohort study initially including 27,548 participants aged 35–65 years. Details of recruitment and follow-up procedures were described previously [7, 8]. Briefly, within a median follow-up time of 7 years, 849 participants out of 25,167 participants free of diabetes at baseline developed incident diabetes. On this basis, the German diabetes risk score (GDRS) was developed using Cox-regression [9]. With the GDRS the 5-year risk for developing future type 2 diabetes can be calculated using information on lifestyle and anthropometric factors, diet and physical activity. It serves as the reference model in this underlying model comparison. We used data from 21,846 participants (727 cases) who had also information on family history of diabetes available. The extended model additionally included family history; this model was compared with the reference model. Table 1 shows the reclassification of cases and non-cases due to model extension based on the use of 5 predefined risk categories.

Table 1 Reclassification table by cases and non-cases resulting from adding family history of diabetes to the German DRS (GDRS), EPIC-Potsdam cohort (N = 21,846)

Full size table

Calculation of Confidence Intervals and Confidence ellipses

Based on the asymptotic method we determined 95 % CIs for NRI _cases and NRI _non-cases (Fig. 1). Taking into account the large number of non-cases it is obvious that estimation of NRI _non-cases was much more precise than of NRI _cases. The calculation of CIs for single components does not allow evaluating both components in combination.

Therefore, we computed a confidence ellipse for NRI _cases and NRI _non-cases to reflect precision of their estimates in combination and which also allows to evaluate the area of acceptable values. Figure 1 shows CIs for the single components (vertical and horizontal lines) as well as the confidence ellipse, both approaches were based on the five risk categories described before. When constructing CIs for the components separately, NRI _cases (0.0619) has a CI of 0.0219–0.1019. Therefore, the value 0.02 lies outside of this interval while the NRI _non-cases (0.0379) had a CI ranging from 0.0318 to 0.0440 thus including a value of 0.035. Using both CIs separately would therefore lead to the conclusion that NRI _cases is significantly higher than 0.02 while NRI _non-cases is not significantly higher than 0.035. However, examining the vector (0.02, 0.035) within the confidence ellipse we can see that it is located inside the area of the ellipse. Thus, the confidence ellipse indicates that—when evaluated together—neither is the NRI _cases different from 0.02 nor is the NRI _non-cases different from 0.035. This example clearly indicates that evaluating single NRI components separately might result in different decisions than evaluating the single components in combination by the use of confidence ellipses.

These results were based on the asymptotic method for both the calculation of CIs and of the confidence ellipse.

Discussion

The use of the NRI is informative for the evaluation of improvements of prediction models when taking into account the obvious limitations associated with the use of categories and cut-offs. Given that no established cut-offs for the NRI exist which allow interpreting its value as being meaningful from a clinical or public health point of view, reliance solely on significance testing has been frequently adopted in reclassification analyses.

As recommended in a recent review of the NRI methods [3], it is preferable to investigate model improvement separately for cases or non-cases. A general framework for testing the two components of the overall NRI, NRI _cases and NRI _non-cases, has previously been laid out by Pencina et al. [1]. However, a major drawback of examining single components in isolation is that the results cannot be interpreted in terms of the overall model improvement. We note that recent recommendations suggest not applying statistical testing at all [2, 3]. Likewise, our developments facilitate the use of confidence intervals. A particularly appealing approach is based on using the confidence ellipse which reflects the 2-dimensional nature of the situation. Our empirical example indicates that confidence ellipses can be useful in reflecting both, the precision of the NRI estimation as well as putting the results in the context of overall improvement.

Our proposed method of confidence ellipses is also flexible here as it can be applied to evaluating extensions of prediction models using equal or different weights as well as thresholds of acceptable model improvement for cases and non-cases as already discussed by Greenland [10].

In conclusion, confidence ellipses might be particularly useful in the context of evaluating overall or case- versus non-case-specific model improvement as they allow evaluating varying acceptable values of the NRI components in combination and also reflect the precision of their estimates.

References

Pencina MJ, D’Agostino RB Sr, D’Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27(2):157–72.
Article PubMed Google Scholar
Kerr KF, Wang Z, Janes H, McClelland RL, Psaty BM, Pepe MS. Net reclassification indices for evaluating risk prediction instruments: a critical review. Epidemiology. 2014;25(1):114–21.
Article PubMed Central PubMed Google Scholar
Leening MJG, Vedder MM, Witteman JCM, Pencina MJ, Steyerberg EW. Net Reclassification Improvement: computation, interpretation, and controversies a literature review and clinician’s guide. Ann Intern Med. 2014;160(2):122–31.
Article PubMed Google Scholar
Pencina MJ, D’Agostino RB Sr, Steyerberg EW. Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat Med. 2011;30(1):11–21.
Article PubMed Central PubMed Google Scholar
Pepe MS, Kerr KF, Longton G, Wang Z. Testing for improvement in prediction model performance. Stat Med. 2013;32(9):1467–82.
Article PubMed Central PubMed Google Scholar
Agresti A. Categorical Data Analysis. Balding DJ, Bloomfield P, Cressie NAC, editors. Hoboken: Wiley; 2002.
Bergmann MM, Bussas U, Boeing H. Follow-up procedures in EPIC-Germany—data quality aspects. European prospective investigation into cancer and nutrition. Ann Nutr Metab. 1999;43(4):225–34.
Article CAS PubMed Google Scholar
Boeing H, Korfmann A, Bergmann MM. Recruitment procedures of EPIC-Germany. European Investigation into Cancer and Nutrition. Ann Nutr Metab. 1999;43(4):205–15.
Article CAS PubMed Google Scholar
Schulze MB, Hoffmann K, Boeing H, Linseisen J, Rohrmann S, Mohlig M, et al. An accurate risk score based on anthropometric, dietary, and lifestyle factors to predict the development of type 2 diabetes. Diabetes Care. 2007;30(3):510–5.
Article PubMed Google Scholar
Greenland S. The need for reorientation toward cost-effective prediction: comments on ‘Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond’ by M. J. Pencina et al., Statistics in Medicine (DOI: 10.1002/sim.2929). Stat Med. 2008;27(2):199–206.
Article PubMed Google Scholar

Download references

Acknowledgments

This work was supported in part by a grant from the German Federal Ministry of Education and Research (BMBF) to the German Center for Diabetes Research (DZD e.V.). The recruitment phase of the EPIC-Potsdam Study was supported by the Federal Ministry of Science, Germany (01 EA 9401) and the European Union (SOC 95201408 05F02). The follow-up of the EPIC-Potsdam Study was supported by German Cancer Aid (70-2488-Ha I) and the European Community (SOC 98200769 05F02). We thank Dr. Manuela Bergmann who was responsible for the methodological and organisational work of data collections of exposures and outcomes and Wolfgang Fleischhauer for his medical expertise that was employed in case ascertainment and contacts with the physicians and Ellen Kohlsdorf for data management.

Author information

Authors and Affiliations

Department of Molecular Epidemiology, German Institute of Human Nutrition Potsdam-Rehbruecke, Arthur-Scheunert-Allee 114-116, 14558, Nuthetal, Germany
Kristin Mühlenbruch, Olga Kuxhaus & Matthias B. Schulze
German Center for Diabetes Research (DZD), Nuthetal, Germany
Kristin Mühlenbruch, Olga Kuxhaus & Matthias B. Schulze
Department of Biostatistics and Bioinformatics, Duke Clinical Research Institute, Duke University, Durham, NC, USA
Michael J. Pencina
Institute of Mathematics, University of Potsdam, Potsdam, Germany
Hannelore Liero
Department of Epidemiology, German Institute of Human Nutrition Potsdam-Rehbruecke, Nuthetal, Germany
Heiner Boeing

Authors

Kristin Mühlenbruch
View author publications
You can also search for this author in PubMed Google Scholar
Olga Kuxhaus
View author publications
You can also search for this author in PubMed Google Scholar
Michael J. Pencina
View author publications
You can also search for this author in PubMed Google Scholar
Heiner Boeing
View author publications
You can also search for this author in PubMed Google Scholar
Hannelore Liero
View author publications
You can also search for this author in PubMed Google Scholar
Matthias B. Schulze
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kristin Mühlenbruch.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Reprints and permissions

About this article

Cite this article

Mühlenbruch, K., Kuxhaus, O., Pencina, M.J. et al. A confidence ellipse for the Net Reclassification Improvement. Eur J Epidemiol 30, 299–304 (2015). https://doi.org/10.1007/s10654-015-0001-1

Download citation

Received: 03 November 2014
Accepted: 12 February 2015
Published: 28 February 2015
Issue Date: April 2015
DOI: https://doi.org/10.1007/s10654-015-0001-1

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

	Up	Down	Total
Case	\( v_{1} \)	\( v_{2} \)	\( v_{5} \)
Non-case	\( v_{3} \)	\( v_{4} \)	\( N - v_{5} \)

A confidence ellipse for the Net Reclassification Improvement

Abstract

Background

Definition of NRI

Confidence ellipse for two components of NRI

Empirical data

Study population

Calculation of Confidence Intervals and Confidence ellipses

Discussion

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation