Background

Prediction models in autosomal dominant polycystic kidney disease (ADPKD) are used in clinical settings for several purposes. They can inform patients about their prognosis. They can identify patients at greatest risk of rapid disease progression who might benefit most from new therapies. They can also identify patients with slower disease progression who might benefit from a care strategy that delays treatment until a later stage [1, 2]. Finally, prediction models are useful for identifying patients with a particular disease risk profile who would be suitable for clinical trials [2, 3]. Relevant outcomes for ADPKD prediction models include total kidney volume (TKV) and estimated glomerular filtration rate (eGFR) [4], the primary clinical indicators of disease progression. Established predictors of these outcomes include age, sex, earlier measures of TKV and eGFR and Polycystic Kidney Disease genotype [5, 6].

The vasopressin V2 receptor antagonist, tolvaptan, has been recently approved for the treatment of ADPKD but, due to notable side effects and expense, represents a treatment where good risk prediction is important for targeting use. Tolvaptan is the first approved drug shown to directly affect disease progression [7]; all other therapies target co-morbidities that may contribute to progression but do not affect the underlying disease [8]. The indication for tolvaptan is currently limited to patients with evidence of rapid progression in Switzerland and European Union according to the European Medicines Agency [9] where the expected benefit outweighs the risk of side effects and associated high treatment costs [10]. The challenge for clinicians is to identify patients at highest risk of rapid progression without extensive diagnostic screening across the full patient population. Currently, TKV and the rate of TKV change are considered the most accurate predictors of progression [11]. However, for routine clinical and research purposes, direct measurement of kidney volume is less feasible due to time and technical demands as well as expense.

Recently, Mayo Clinic investigators developed a risk classification system for ADPKD patients using a single TKV value and an accompanying prediction model [12, 13]. In 2016, the European Renal Association – European Dialysis and Transplant Association Working Group published a recommendation that the Mayo Clinic prediction model be used to discriminate patients at high risk for rapid disease progression [14]. However, the prognostic performance of the prediction model has yet to be evaluated in an external sample outside the US, which is critical for establishing accuracy and generalizability of risk discrimination across different patient populations [12].

The aim of our study was to externally validate the Mayo Clinic Model using data from the prospective longitudinal Swiss ADPKD study, with a patient population both geographically and temporally removed from the original patient population in which the model was developed. We also sought to evaluate whether improved prediction performance could be achieved by including additional measurements of the most relevant predictor: height adjusted total kidney volume (HtTKV).

Methods

Swiss ADPKD validation data

Participants were eligible for the Swiss ADPKD study if they had an ADPKD diagnosis, were over 18 years of age and had an eGFR over 30 ml per min per 1.73m2 at enrolment [15]. For the present analysis, participants from the Swiss ADPKD study were included if they were under active follow-up between 2006 to 2016, had at least one follow-up visit and had not been treated with tolvaptan. Approximately 3% (N = 6) of patients had Atypical ADPKD (Class 2) and were excluded from the present analysis. Visits were done at the university hospital in Zurich and at the Hirslanden hospital Zurich. At every scheduled clinical visit, data were collected on medical history, kidney imaging metrics and laboratory values from blood and urine samples. Clinical measurements and assays were done according to a protocol with standardized operating procedures [16, 17]. Following an initial visit, a second visit occurred within 6–12 months and then visits were scheduled annually; when a study participant missed a scheduled visit, a study visit occurred at the next available opportunity to collect Magnetic Resonance Imaging (MRI) and other study data. The local ethics committee in Zurich approved the study (EK-number 1178) and all patients provided written informed consent.

Mayo Clinic risk classes and eGFR prediction model

The Mayo Clinic prediction model has been described [12]. Briefly, five risk subclasses with theoretical yearly percentage increases in kidney volume of <1.5% (Class 1A), 1.5–3% (Class 1B), 3–4.5% (Class 1C), 4.5–6% (Class 1D) and >6% (Class1E) were defined based on age and imaging data (Fig. 1) [12]. Then a linear mixed-effect model was used to predict eGFR after t years of follow-up using baseline (t = 0) predictors: log2HtTKV or risk subclass group (1A-1E) [12], sex, age, eGFR from the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation [18]. Years of follow-up was included as a linear term with a subject specific random effect. Interaction terms of years of follow-up with all predictors were also included in the model.

Fig. 1
figure 1

Subclassification of ADPKD patients based on HtTKV limits on their age at baseline. Limits are defined from the Mayo Clinic based on estimated kidney growth rates of <1.5% (dark green), 1.5–3.0% (mint), 3–4.5% (yellow), 4.5–6% (orange) and >6% (red)

Outcome

Estimated glomerular filtration rate (eGFR)

In accordance with the Mayo Clinic model, our primary outcome was eGFR at t years follow-up. Serum creatinine was measured at each visit by the central laboratory institute of clinical chemistry of the university hospital and the central laboratory in Zurich using the modified Jaffé method traceable to an isotope-dilution mass spectroscopy reference [17]. eGFR at baseline (t = 0) was estimated using the CKD-EPI equation [18].

Predictors

Total kidney volume (TKV)

At every study visit, a measurement of kidney volume was taken by using a standardized procedure protocol for MRI’s [16]. MRI acquisitions contain a breath-hold T1-weighted fast spoiled gradient echo sequence without fat suppression sequence (4 mm slice thicknesses) and trans-axial T2 weighted fast spin echo sequences. TKV was estimated by hand contouring [16]. Height adjusted total kidney volume (HtTKV) was obtained by dividing TKV by patient height (ml/m).

Statistical methods

Baseline characteristics are given as proportions and medians (interquartile range). Patients were stratified into the five subclasses (1A-1E) based on the Mayo Clinic estimated kidney growth rates limits of 1.5%, 3.0%, 4.5% and 6% (Fig. 1).

We applied the Mayo Clinic model to all participants of the Swiss ADPKD study to predict eGFR at t years follow-up, using log2HtTKV as a continuous predictor and keeping regression coefficients fixed at the values determined from the Mayo Clinic development sample. We also applied a second Mayo Clinic model, that replaced the baseline (t = 0) log2HtTKV with baseline risk subclass (1A-1E).

To try to improve upon the Mayo equation predictive performance, we tested two modifications to the original Mayo model. First, we included in the model a second HtTKV follow-up measurement (mostly within 6–12 month of the baseline measurement) to provide additional information on individual change in TKV (Model 1). The regression coefficient for the HtTKV term was refit to the Swiss ADPKD study sample, but all other regression coefficients were kept fixed at their original value, including the intercept. Second, we included information on all subsequent and available HtTKV measurements, again refitting the regression coefficient for the HtTKV term while keeping all other regression coefficients fixed at their original Mayo Clinic values (Model 2). Updated models with and without interaction terms of HtTKV*years were evaluated.

Evaluation of model performance

The model fit to the validation data set was assessed using R-squared statistics and Akaike’s information criterion (AIC). Discrimination was visually assessed using scatter plots comparing observed and predicted eGFR values on the natural scale with an estimated regression line, line of equality and confidence interval. The agreement was assessed using the Bland-Altman analysis [19]. The bias, defined as average (observed eGFR – predicted eGFR) was estimated along with the 95% limits of agreement, defined as the bias ±1.96 standard deviation of the difference between observed and predicted eGFR. Lastly, the % of predicted eGFR within 30% of the observed eGFR was calculated. We followed Steyerberg’s approach to validate and update clinical prediction models [10].

To compare the performance of the prediction models we estimated the continuous ranked probability score (CRPS) of the 3 competing models: original model, updated model 1 and updated model 2. The CRPS is a proper scoring rule to assess univariate predictive distributions with smaller values indicating better predictive performance [20]. The metric takes into account the entire predictive distribution of the outcome [21] and assesses both calibration and precision of predictive distribution. For evaluation of models with added TKV information, five-fold cross-validation was used given that no external validation was available [10].

The predictor HtTKV was missing in 3% of the participant-visits. We used multiple imputation (MI) to impute the missing values; specifically, a Markov Chain Monte Carlo method was implemented and multivariate normality was assumed [22]. We generated 30 imputed data sets for each model with HtTKV [23].

Stata 13.1 was used for all data analysis and graphics.

Results

Characteristics of the Swiss ADPKD study sample

Between April 2006 and March 2016, 214 patients with an ADPKD diagnosis were enrolled in the Swiss ADPKD study, contributing a total of 1985 person-visits. At baseline, the median age was 34 years (interquartile range [IQR]: 27–40), the median eGFR was 82 ml/min per 1.73 m2 (IQR: 70–95) and the median HtTKV was 497 ml/m (IQR: 317–762). Swiss ADPKD study follow-up times ranged from a minimum of 0.42 years for new enrolees to a maximum time of 10.28 years. We assessed change from one class to another in 206 patients from the 214 swiss ADPKD class1 patients. In total, there were 52 patients (25%) from the 206 Swiss ADPKD Study participants who progressed to a more severe risk class over the median 5 year follow-up and 7 (3%) who changed to a milder disease risk class. More than the half of the patients in class 1A (56%) remained in their class: 40% (10 patients) progressed from 1A to 1B (1 patient from 1A to 1C), 18% (9 patients) progressed from 1B to 1C (1 patients from 1B to 1D), 28% (16 patients) progressed from 1C to 1D (4 patients from 1C to 1B) and 28% (15 patients) progressed from 1D to 1E (3 patients from 1D to 1C).

Comparison of the Swiss ADPKD study sample to the Mayo development sample

Compared to the Mayo clinic development sample of 376 patients, the average eGFR was higher by 11 ml/min per 1.73 m2, median age was 10 years younger, and median HtTKV was 155 ml/m lower in the Swiss ADPKD study patients. Swiss ADPKD patients had a median follow-up time of 5 years (IQR: 2 to 9 years) compared to 6 years (IQR 4–10) in the Mayo Clinic patients. Comparing progression rates, more patients progressed in the Swiss ADPKD Study at 25% across all initial risk classes compared to 11% to 16% in the Mayo clinic development sample, though the median follow-up was 5 years compared to 4 years in the Mayo Clinic.

External validation of the Mayo Clinic model

In the Swiss ADPKD patient group, the Mayo Clinic model with the predictor log2HtTKV performed well with explained variance (R2) of 78% (Table 2), compared to the R2 of 69% in the development data set. Replacing baseline TKV with risk subclasses in the model resulted in a poorer model fit with a R2 of 70%, which is slightly lower to the R2 of 72% noted in the development set.

The scatter plot of observed eGFR versus predicted eGFR indicated good discrimination with 91.5% of the predicted within 30% of the observed when log2HtTKV was included as a continuous predictor (Fig. 2a, Table 1) and 88.7% of the predicted within 30% of the observed when risk subclasses were included (Fig. 2b, Table 2). The Bland-Altman analysis shown in Fig. 3 indicated a lower bias for the log2HtTKV model and little distortion of the variability of the distribution, as seen from the approximate zero slope of the regression line.

Fig. 2
figure 2

a Scatterplot of the observed eGFR versus the predicted eGFR derived from the model obtained from the development set with TKV as predictor with regression line and the line of equality. b Scatterplot observed eGFR vs. predicted eGFR derived from the model obtained from the development set with the five subclasses as predictor. c Scatterplot of the observed eGFR versus the predicted eGFR derived from the updated model 1 with two TKV measurements and d updated model 2 with time-varying TKV

Table 1 Baseline characteristics of the development set from the Mayo Clinic and the validation set from the Swiss ADPKD study
Table 2 Predictive performance of the validation, the updated models and the sensitivity analysis (in bold)
Fig. 3
figure 3

a Bland-Altman analysis of the observed eGFR versus the predicted eGFR derived from the model obtained from the development set with TKV as predictor. b Bland-Altman analysis observed eGFR vs. predicted eGFR derived from the model obtained from the development set with the five subclasses as predictor. c Bland-Altman analysis of the observed eGFR versus the predicted eGFR derived from the updated model 1 with two TKV measurements and d updated model 2 with time-varying TKV

Improving the Mayo prediction model

To evaluate whether the Mayo prediction model performance could be improved if additional information on TKV was available, we modified the formula to include subsequently TKV measurements. Updated model 1 (number of observation = 1867), which included a follow-up TKV measurement, showed good overall performance with a R2 of 77%, an AIC of 13,557.91 and CRPS of 58.16 (Table 2). In updated model 2 (number of observation = 1344), which included all available follow-up TKV measurements, resulted in a slightly better CRPS of 57.24 and substantially improved AIC of 9706.15 compared to Model 1. The R2 was reasonably similar between the updated models and similar to the original Mayo Clinic model. Good agreement between observed and predicted was maintained as shown by the high correlation (Fig. 2c, d). Both updating models reduced the bias and provided a good fit to the data (Fig. 3c, d). An interaction term of TKV*years in the updated models did change performance (Table 2).

Discussion

Accurate risk prediction is important for guiding clinical care, particularly when there are substantial costs to treatment. The goal of the Mayo Clinic model was to provide risk prediction for the ADPKD patient population; however prognostic performance has never been established in a broader patient sample and external validity of a prediction model is critical to assure accurate prediction across patient populations and therefore establish the model’s utility as a clinical tool.

Our results indicated that the Mayo Clinic model performs well in our Swiss ADPKD patient sample. Both models showed adequate discrimination and good calibration. The overall prediction performance in our sample as assessed with R2 was higher when the continuous predictor HtTKV was the used than when risk subclasses were used. These results suggest the models are generalizable and would perform well in routine clinical settings. Given the higher eGFRs in the Swiss ADPKD Study, these results were particularly notable, as poorer performance might be expected with upward shifts in the distribution of eGFR compared to the development set. However it should be noted that in the original Mayo Clinic prediction model and in our validation, an estimated eGFR from the CKD-Epi formula was used for the baseline assessment of kidney function. This estimation may itself introduce bias in the prediction of later kidney function, relative to the true GFR. To the extent that the CKD-EPI formula may perform differently in the two cohorts, our results could have impacted. We also did not distinguish between polycystic kidney disease genotypes 1 and 2, and prediction performance could vary between these groups. Further the R2 is known to be sensitive to the range and variability of the data; thus apparent improvement in prediction performance based on a higher R2 in our validation cohort compared to the original development cohort should be interpreted with caution.

It should be noted that the Mayo Clinic prediction model development set used TKV assessed via the ellipsoid equation [12], while the present study used the gold standard TKV assessment by boundary method [24], which could introduce additional variability in prediction performance. However, a recent study assessed patient reclassified by the Mayo risk classification system resulting from these different TKV assessment method. The investigators found only a limited impact with a few patients reclassified mostly to lower risk categories [24].

A second aim of our study was to evaluate whether additional information regarding TKV change could improve the model prediction performance. Based on the results of the validation study and relatively large size of the development sample, we followed Steyerberg’s approach [25] and fixed all regression coefficients at their original values under the premise that re-estimation runs the risk of replacing reliable but modestly biased estimators with unbiased but unreliable ones [10]. Allowing only the coefficient for TKV to vary, we found that the R2 remained relatively unchanged when baseline TKV was replaced with measurements from the first two assessments. Further including all available TKV measurements, including a current TKV assessment, did not provide substantial improvement in the prediction performance that would justify the additional cost, time and effort of TKV measurement.

Strengths of our study include a patient population that was entirely independent of the Mayo Clinic data set, varying geographically, culturally and temporally from the original development cohort. In addition the Swiss ADPKD study has comprehensive follow-up with repeated measurements of kidney volume over time in a well-described cohort of untreated ADPKD patients at an early disease stage. The inclusion of recently enrolled patients as well as those with nearly 10 years of follow-up establishes generalizability across the patient population. Prediction models need to perform well in general ADPKD patient populations, as they are used for clinical decision-making.

Conclusions

In conclusion, we found that the Mayo Clinic prediction model is an accurate tool to identify those at highest risk for rapid disease progression as defined by declining kidney function. The performance of the model was not substantially improved with by including additional TKV assessments, suggesting that follow-up TKV measurements may not be worth the cost and burden for the purposes of predicting progression. The Mayo prediction model may be a valuable tool for identifying patients for whom new treatments such as tolvaptan will provide benefits that outweigh the burden of side effects.