Background

The high worldwide prevalence of chronic kidney disease (CKD) [1, 2] and its burden on health care costs urge clinicians to accurately identify patients at high risk of poor prognosis. Prognostic models predicting renal failure in CKD patients have recently been developed [3,4,5,6] with the aim of facilitating effective clinical management of CKD patients, for instance through the timely planning of dialysis, and of achieving a more efficient cost allocation based on patients’ differential risk of renal failure and death.

In 2013 our group developed a classification tree model (hereafter named CT-PIRP) to stratify patients according to their annual estimated glomerular filtration rate (eGFR) decline. This model identified seven subgroups characterized by specific combinations of six variables (gender, age, proteinuria, baseline eGFR, phosphate levels, diabetes) which were associated with different levels of eGFR decline [7].

Because eGFR decline is correlated with kidney failure and death [8,9,10,11], we expected the subgroups identified by the CT-PIRP model to carry different risks of end-stage renal disease and of death. In community-based clinical settings in which general practitioners (GPs) are involved and advised to refer CKD patients to specialists at an early stage of the disease, eGFR decline is the main driver of adverse renal outcomes [12], because it reflects the underlying nephropathy and patients’ adherence and response to specific therapies. However, the ability of the CT-PIRP model to predict renal replacement therapy (RRT) initiation and mortality has yet to be determined. The aim of this paper is therefore to investigate the ability of the CT-PIRP model to predict RRT initiation and mortality, and to temporally validate the model on a cohort of CKD patients drawn from the PIRP project in a subsequent time interval. A validated CT-PIRP model would help nephrologists and GPs stratify patients into clinical phenotypes at differential risk of three outcomes (eGFR decline, RRT inception and death), thereby assisting them in planning targeted follow-up strategies and treatments.

Methods

Data source

The study population consists of patients participating in the PIRP project [13], a collaborative network of nephrologists and general practitioners operating in Emilia-Romagna, a region of North-Eastern Italy with 4,351,393 inhabitants (2011 census data, National Institute of Statistics). The study was exempt from approval by the Ethics Committee of Emilia-Romagna. It was conducted in conformity with the regulations for data management of the Regional Health Authority of Emilia-Romagna and with the Italian Code of conduct and professional practice applying to the processing of personal data for statistical and scientific purposes (art. 20–21, legislative decree 196/2003; https://www.garanteprivacy.it/documents/10160/0/Codice+in+materia+di+protezione+dei+dati+personali+%28Testo+coordinato%29), published in the Official Journal no. 190 of August 14, 2004, which explicitly exempts studies using anonymous data from the need for Ethics Committee approval (Preamble number 8). In Italy, anonymous administrative data-gathering is subject to the law on the Protection of individuals and other subjects with regard to the processing of personal data, Act no. 675 of 31.12.1996 (amended by Legislative Decree no. 123 of 09.05.1997, no. 255 of 28.07.1997, no. 135 of 08.05.1998, no. 171 of 13.05.1998, no. 389 of 6.11.1998, no. 51 of 26.02.1999, no. 135 of 11.05.1999, no. 281 of 30.07.1999, no. 282 of 30.07.1999 and no. 467 of 28.12.2001) (https://www.garanteprivacy.it/web/guest/home/docweb/-/docweb-display/docweb/28335). Patients enrolled in the PIRP project provide written consent to the use of their anonymized data.

The PIRP project, funded by the Emilia-Romagna Region, started in 2004 and is still ongoing; its aims are to delineate intervention strategies for delaying disease progression, to increase awareness of CKD complications, and to optimize CKD patient care. Patients enrolled in the project receive specialized care, tailored to their disease severity and comorbidities, in 13 regional nephrology units to which they were referred by primary care physicians. The PIRP database collects demographic, clinical and laboratory characteristics of patients as well as their prescribed pharmacological treatments, and as of October 2018 it included more than 27,000 patients.

The derivation cohort and the CT-PIRP model

The derivation cohort consists of 2265 CKD patients enrolled in the PIRP project from April 1, 2004 to June 30, 2010 who were followed up for at least 6 months and had at least four serum creatinine measurements between their first visit and June 30, 2011. This cohort was used to develop the CT-PIRP model, which is described in detail in a previous paper [7]. In summary, a classification tree analysis (CTA) was used to identify homogeneous subgroups of annual eGFR decline based on specific combinations of demographic and clinical characteristics. Seven subgroups of patients (also referred to as nodes) with similar annual eGFR decline were identified by CTA as the result of the interactions of six variables (Fig. 1): proteinuria (coded as present if dipstick protein was > 20 mg/dL, urine total protein was > 0.3 g/24 h or microalbuminuria was > 20 mg/L), eGFR, serum phosphorus, age, diabetes and gender. The other variables submitted to the CTA as potential predictors of annual eGFR decline were: hypertension, cardiac disease, smoking habits, diagnosis of nephropathy, and the baseline values of BMI, serum creatinine, serum uric acid, parathyroid hormone, glycaemia, triglycerides, LDL cholesterol and haemoglobin. Subgroup membership is a qualitative variable embedding different patient characteristics; as such, it cannot be used to create a risk score, and the magnitude of the estimated eGFR decline alone does not fully describe the risk associated with each node.

Fig. 1

Representation of the CT-PIRP model. Rectangles indicate subgroups of patients; in each rectangle (corresponding to a node) the mean annual eGFR change is reported. The absolute and percentage frequencies of each node are indicated over the arrows leading to it. Reworked figure from Rucci et al. [7]

The validation cohort

Temporal validation assesses the performance of a prognostic model in a subsequent cohort of patients recruited from the same data source. It is the simplest form of external validation, is stronger than internal validation [14], and is widely used to evaluate the performance of prognostic models [15,16,17]. Thus, using the same inclusion criteria defined for the CT-PIRP model, we obtained a validation cohort from patients who entered the PIRP project between July 1, 2010 and December 31, 2016. Patients with complete data on the variables used in the CT-PIRP algorithm reported in Fig. 1 were assigned to the subgroup matching their characteristics. To enhance the comparability of the cohorts, we conducted a 1:1 matching of the two cohorts based on node membership and on the time between the first and the last visit, rounded to months.
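As an illustration, the subgroup assignment can be written as nested conditional statements mirroring the splits of Fig. 1. The following minimal Stata sketch covers only the proteinuric branch; the variable names (prot, egfr, phosph) are hypothetical placeholders, and the only cutoffs taken from the text are the eGFR (33.652 mL/min) and serum phosphate (4.3 mg/dl) splits reported for nodes 2 and 3.

    * Sketch of node assignment for the proteinuric branch of the tree.
    * Variable names are hypothetical; the eGFR and phosphate cutoffs are
    * those reported for nodes 2 and 3 in the text.
    gen node = .
    replace node = 1 if prot == 1 & egfr > 33.652 & egfr < .
    replace node = 2 if prot == 1 & egfr <= 33.652 & phosph <= 4.3
    replace node = 3 if prot == 1 & egfr <= 33.652 & phosph > 4.3 & phosph < .
    * analogous statements involving age, diabetes and gender would assign
    * non-proteinuric patients to nodes 4-7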

Outcomes

The outcomes of interest were inception of RRT (dialysis or transplantation, with censoring of deaths) and all-cause mortality observed until December 31, 2016. Hospital admissions subsequent to patient enrolment in the PIRP project up to April 30, 2017 were also analyzed. Information on these outcomes was obtained through linkage of the PIRP database with the hospital discharge record databases and the mortality registry of Emilia-Romagna Region.
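In Stata, this deterministic linkage could look like the sketch below; the dataset and key names (pirp_cohort, discharges, mortality, patient_id) are hypothetical placeholders, not the actual names of the regional databases.

    * Sketch of record linkage via a shared anonymized patient key.
    * File and variable names are hypothetical.
    use pirp_cohort, clear
    merge 1:m patient_id using discharges, keep(master match) nogenerate
    merge m:1 patient_id using mortality, keep(master match) nogenerate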

Statistical analysis

Characteristics of patients in the derivation and validation cohorts were compared using the χ2 test for categorical variables or the Mann-Whitney non-parametric test for continuous variables, to take into account the non-normality of their distributions. Incidence rate ratios (IRRs) for RRT and mortality were used to compare the incidence of outcomes between the two cohorts.
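In Stata, these comparisons could take the following form (a sketch assuming a patient-level dataset with hypothetical variable names cohort, diabetes, age, rrt and persontime):

    * chi-square test for a categorical characteristic across cohorts
    tabulate diabetes cohort, chi2
    * Mann-Whitney (Wilcoxon rank-sum) test for a non-normal continuous variable
    ranksum age, by(cohort)
    * incidence rate ratio for RRT between cohorts (ir syntax: case exposure time)
    ir rrt cohort persontime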

The ability of the CT-PIRP model to predict mortality and RRT initiation was investigated in the derivation cohort using survival analysis at 6 years of follow-up. Subjects were censored on December 31, 2016 or at the date when a competing event (RRT/death, loss to follow-up) occurred. Time to death or RRT inception was calculated for each subgroup using Kaplan-Meier (KM) estimates, starting at 6 months after enrolment (the minimum required follow-up time). To further evaluate the severity of illness in the subgroups, the mean number of prescribed drugs (all ATC codes) and the annual number of hospital admissions after entering the PIRP project were compared across subgroups using ANOVA and Kruskal-Wallis tests, followed by post-hoc comparisons. We assigned each node a qualitative ranking based on the comparison of RRT and death risks estimated by Cox regression analyses: very low risk when HR < 0.5, low risk when 0.5 ≤ HR < 0.8, intermediate risk when 0.8 ≤ HR < 1.5, high risk when 1.5 ≤ HR < 2, and very high risk when HR ≥ 2.
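In Stata, this set-up reduces to a few commands. The sketch below uses hypothetical variable names (time_years for follow-up time, event coded 1 for the outcome, node for subgroup membership) and takes node 2 as the reference category, as in the Results:

    * declare survival-time data: follow-up in years, outcome indicator
    stset time_years, failure(event == 1)
    * Kaplan-Meier failure curves by node
    sts graph, by(node) failure
    * Cox regression with node indicators, node 2 as the base level
    stcox ib2.node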

The CT-PIRP model was validated in terms of discrimination and calibration. Discrimination refers to the model’s ability to identify substantially different risk profiles, while calibration indicates the predictive accuracy of the risk estimates obtained from the model [14]. As the CT-PIRP model does not provide a risk score, we applied validation criteria specific to risk groups. To evaluate discrimination, we estimated RRT and mortality Kaplan-Meier survival curves of the CT-PIRP subgroups and verified whether these curves were well separated, which indicates good discrimination [18]. The two outcomes were treated as competing, censoring each when the other occurred. To evaluate calibration, we graphically compared the observed and expected Kaplan-Meier survival curves of the CT-PIRP subgroups, which should overlap if the model is well calibrated. The expected curves were estimated under the assumption that the baseline survival functions of the derivation and validation cohorts are similar: we first estimated the baseline survival function in the derivation cohort using a Cox model with subgroup indicators as predictors, and then derived the population-average prediction in the validation cohort by assigning to each node the corresponding baseline survival function estimated in the derivation cohort [19].

In addition, we fitted cause-specific Cox proportional hazards models for RRT and mortality in which subgroup membership, the cohort indicator and their interaction were included as predictors [20]. We expected to find significant main effects of nodes (identifying high- or low-risk subgroups) and possibly a significant main effect of cohort (highlighting heterogeneity in the baseline risk), but no significant interaction terms, which would indicate that subgroups were well discriminated regardless of the cohort of origin. The node with the largest number of outcome events was used as the reference group. Robust standard errors of hazard ratios were obtained using the sandwich estimator to account for the clustering of patients within nephrology units. To balance the length of follow-up between the two cohorts and to reduce the possible influence of long-term survivors [21], both cohorts were censored at 4 years of follow-up. The goodness of fit of these models was compared with that of univariate Cox regression models using the baseline CKD-EPI stage or the category of annual eGFR progression rate as predictor. Lastly, we estimated the competing risks of death and RRT by fitting Fine and Gray models [22] for the sub-hazards of RRT, mortality and loss to follow-up, and by comparing the corresponding cumulative incidence functions (CIFs) for each node of both cohorts using stacked cumulative incidence plots. The CIF represents the absolute risk of the event of interest in the presence of competing risks, and it is considered the appropriate way to account for competing risks in prognostic models [23].
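The cause-specific and competing-risks models described above could be specified in Stata roughly as follows; variable names (time_years, status, node, cohort, unit) are hypothetical, with status coded 1 = RRT, 2 = death and 3 = loss to follow-up:

    * cause-specific Cox model for RRT with node x cohort interaction,
    * clustering patients within nephrology units (sandwich estimator)
    stset time_years, failure(status == 1)
    stcox ib2.node##i.cohort, vce(cluster unit)
    * goodness-of-fit comparison via information criteria (AIC/BIC)
    quietly stcox ib2.node
    estat ic
    * Fine and Gray sub-hazard model for RRT, treating death and
    * loss to follow-up as competing events
    stcrreg ib2.node, compete(status == 2 3)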

The validation process was reported according to the TRIPOD statement checklist [14]. Stata v.15.1 was used for all analyses; specifically, the user-written procedure stcoxgrp [19] was used to calculate Kaplan-Meier survival estimates.

Results

Predictive ability of the CT-PIRP model in the derivation cohort

The overall mean annual eGFR decline was − 1.33 ± 5.16 mL/min (Table 1); it was faster in nodes 1, 5 and 3 (− 3.66, − 2.97 and − 2.83 mL/min, respectively) and slower in nodes 6 and 7 (0.06 and − 0.84 mL/min). The Kaplan-Meier failure curves (Fig. 2a) show that node 3 had the highest risk of RRT at 6 years (71.9%), while nodes 1, 6 and 7 had similarly low risks (around 19%) and the risks of nodes 2, 4 and 5 ranged from 32.2 to 39.0%. Cox regression hazard ratios (HRs) of 2.93 (p < 0.001), 0.43 (p < 0.001), 0.43 (p < 0.001) and 0.45 (p = 0.005) were found for nodes 3, 1, 6 and 7, respectively, compared with node 2 (proteinuric patients with eGFR ≤ 33.652 mL/min and serum phosphate ≤ 4.3 mg/dl). Mortality risk ranged between 41.1 and 49.1% for nodes 3, 6 and 7, was 35.7% for node 2 and 30.0% for node 5 (Fig. 2c), and was lower for nodes 4 and 1 (9.1 and 18.0%, respectively). The latter four nodes showed a significantly lower mortality risk than node 7 (non-proteinuric, older, male patients) in the Cox regression. Event-free (death or RRT) median survival time varied widely, from the shortest (node 3: 2.05 years) to the longest (nodes 1 and 4: 6.00 years). Patients of node 1 showed low mortality and RRT risks despite having the fastest eGFR decline; the higher baseline eGFR (46.7 mL/min) and younger age (63.8 years) of this group might account for these results. Moreover, this group was characterized by a higher proportion of patients with diabetic nephropathies (20.9%) and glomerulonephritis (24.4%).

Table 1 Characteristics of the derivation cohort

Matching and comparison of cohorts

The validation cohort comprised 3837 eligible patients, of whom 2051 were matched with the derivation cohort. Matching was successful for each node in the two cohorts (Table 2), but some significant differences remained. Patients of the validation cohort had a 2.5 mL/min higher median baseline eGFR and a higher percentage of diabetes (38.1% vs 32.6%). eGFR change showed a significant but modest difference between the two cohorts only for node 5 (− 1.11 vs − 1.79 mL/min). The validation cohort showed a significantly lower incidence of RRT (IRR = 0.655, 95% CI: 0.553–0.773), driven by the lower IRRs in nodes 4, 5, 6 and 7. Mortality was similar between the two cohorts except for node 7, which showed a significantly lower IRR in the validation cohort (IRR = 0.876, 95% CI: 0.767–0.999).

Table 2 Comparison of the matched derivation and validation cohorts

Temporal validation for RRT

The risk of RRT initiation at 4 years estimated in the validation cohort using KM curves (Fig. 2b) proved to be similar to that of the derivation cohort: it was highest for node 3 (proteinuric patients with low eGFR and high serum phosphate; 57.8%) and low for nodes 1 (6.7%), 6 (7.0%) and 7 (5.8%). In contrast to the derivation cohort, node 2 (proteinuric patients with low eGFR and low serum phosphate) appeared as a relatively high-risk group (33.7%), while nodes 4 and 5 had lower risks (12.3 and 9.2%). These findings were consistent with those obtained using Cox regression (Table 3), in which node 3 was at higher risk (HR = 3.848, p < 0.001), nodes 1, 6 and 7 had significantly lower hazard ratios ranging from 0.308 to 0.442, and nodes 4 and 5 had survival similar to that of node 2, used as the reference. Significant cohort × node interactions were found for nodes 4, 5, 6 and 7, indicating that in those subgroups the estimated risk was lower in the validation cohort. Calibration was not completely satisfactory: nodes 1, 2 and 6 showed similar observed and expected survival estimates (Fig. 3), while in the remaining nodes (3, 4, 5 and 7) observed and expected estimates diverged after 2 years of follow-up.

Fig. 2

Kaplan-Meier curves of the 4-year risk of RRT initiation and mortality for the nodes of the CT-PIRP model. Panel a: RRT in the derivation cohort. Panel b: RRT in the validation cohort. Panel c: mortality in the derivation cohort. Panel d: mortality in the validation cohort. The nodes are identified by the numbers placed on the curves

Table 3 Results of the Cox proportional hazards regression on time to death and time to RRT inception
Fig. 3

Calibration plots for mortality and RRT initiation. For each node, lines indicate the predicted survival obtained from the Cox proportional hazards model with nodes as predictors, and markers with confidence intervals indicate the observed Kaplan-Meier survival in the validation cohort

Temporal validation for mortality

The KM curves estimated in the validation cohort for mortality (Fig. 2d) had the same ranking as those in the derivation cohort: node 4 had the lowest risk (4.2% mortality at 4 years), followed by nodes 5 (12.3%) and 1 (14.0%); nodes 2, 6 and 7 showed risks between 24.0 and 28.8%, while node 3 had the highest risk (49.5%). Cox regression performed using node 7 as the reference (Table 3) yielded significantly lower risks for node 4 (HR = 0.122, p < 0.001) and node 1 (HR = 0.298, p < 0.001). No significant interactions were found between nodes and cohorts, indicating that HR estimates for nodes were consistent across cohorts. Calibration was very good, as observed and expected survival overlapped almost perfectly (Fig. 3).

Competing risk analysis showed that the cumulative risks of adverse outcomes (CIFs) were very similar between the derivation and validation cohorts for all nodes except nodes 4 and 5, in which the estimated risk of RRT inception was lower in the validation cohort (Table 4 and Fig. 4).

Table 4 Results of the Fine and Gray competing risk survival analysis on time to death and time to RRT inception
Fig. 4

Cumulative incidence functions of RRT, mortality and loss to follow-up for each node in the matched derivation and validation cohorts

The comparison of the goodness-of-fit indices of univariate Cox regression models using CT-PIRP nodes, baseline CKD-EPI stage and categories of eGFR progression rate is shown in Table 5. The CT-PIRP model fitted better than the CKD-EPI model for RRT and better than the eGFR progression rate model for death.

Table 5 Goodness of fit comparison of univariate Cox regression models on time to death and time to RRT inception

Discussion

This study provides evidence of the validity of the CT-PIRP model in identifying subgroups of CKD patients with different risks of RRT inception and death. In particular, patients with proteinuria, low baseline eGFR and high serum phosphate (node 3) had the highest risk of both RRT initiation and death. In contrast, older patients without proteinuria (nodes 6 and 7) had a relatively high risk of death and a low risk of initiating RRT. The lowest mortality risk was found in non-proteinuric, younger, non-diabetic patients (node 4).

The model is extremely well calibrated for the mortality outcome, while calibration for RRT inception is poorer. In fact, the prediction of RRT for nodes 4 and 5 is not very accurate, because of the lower number of dialysis events observed in the validation cohort. Patients belonging to nodes 4 and 5 had a shorter follow-up time and a different case-mix, with a higher baseline eGFR. It is likely that with a longer follow-up the prediction accuracy for the risk of RRT initiation would improve.

Two of the six variables included in the model, eGFR and the presence of proteinuria, are widely recognized as key risk modifiers of adverse renal outcomes [8, 10, 24,25,26,27,28]. The use of eGFR change as a much better predictor of adverse renal outcomes than the absolute GFR value has been advocated by several authors [26, 27, 29, 30], on the assumption that it incorporates the effects of pharmaceutical-dietary treatment [31,32,33,34] and of physiological factors such as the reduced muscle mass associated with chronic illness [25, 27]. In the CT-PIRP model, mean eGFR change is not explicitly specified as a model parameter; however, it should be seen as embedded in the definition of the subgroups.

The original feature of the model is that patients are stratified through empirically-based classification criteria rather than by a priori grouping, which is common practice in CKD prognostic models [10, 26, 27, 29, 30]. The CT-PIRP model does not assign individual patients a numerical risk score, but rather identifies clinical phenotypes characterized by specific interactions of six baseline variables, which may guide nephrologists towards an accurate and focused examination of patients.

The CT-PIRP model is a practical tool for nephrologists, because it allows them to identify patient subgroups at greater risk of experiencing renal failure and death at 4 years from their first evaluation (nodes 2 and 3). In these patients, treatment compliance, diet adherence and interventions on modifiable risk factors need to be enhanced, and RRT can be planned in a timely manner.

Conversely, most patients at low risk of renal failure but high risk of death (nodes 6 and 7) will require greater attention to the treatment of mortality risk factors, in particular modifiable cardiovascular risk factors. Introducing the CT-PIRP prediction tool into clinical practice can facilitate a more personalized therapeutic approach [35].

A recent systematic review [36] pointed out that prediction models are often impractical, because they require predictors rarely used in clinical practice or lack the information necessary for external validation. The CT-PIRP model does not suffer from these limitations, because the required information is routinely collected in clinical practice and patients are assigned to subgroups on the basis of their characteristics.

The development of different tools to identify the subgroups of patients at highest risk of adverse renal outcomes, who need targeted assessment and interventions, has been encouraged [3, 25]. The CT-PIRP model fills the gap left by the lack of predictive models for adverse renal outcomes developed in Mediterranean countries, where the healthcare system is mainly public and integrated care pathways are implemented.

Our findings should be interpreted in light of some important limitations. Only patients with at least four visits and 6 months of follow-up were included in the model’s development, precluding the assessment of its prognostic accuracy in patients who rapidly reached an endpoint. The follow-up time in the validation cohort was relatively short for accurately detecting the outcomes of interest in slowly progressing patients. CT methodology suffers from a limitation related to the instability of the classifier: small changes in the data may modify a tree because, if a split changes, the branches deriving from the affected node change as well. Moreover, CT is a non-parametric method that is not grounded on specific statistical assumptions, and as such its decision-making procedure is algorithmic rather than statistical [37]. As a consequence, in contrast to traditional statistical modelling methods, CTs do not provide scores and confidence intervals [38].

It follows that comparing the predictive ability of the CT-PIRP model with that of other traditional prognostic models based on risk scores is not straightforward [39]. The comparison of the CT-PIRP model with univariate models based on stratification variables, such as baseline CKD-EPI stage and classes of eGFR decline, showed that the CT-PIRP nodes predict RRT better than the CKD-EPI stages and predict mortality better than the eGFR progression rate.

Conclusions

CT-PIRP is a promising, simple prognostic model that provides an effective clinical stratification of CKD patients into subgroups at different risk of mortality and RRT, using only six variables that are easily available in current clinical practice. The CT-PIRP model is thus applicable to most patients commonly seen in nephrology clinics, and it may inform policy makers on the allocation of resources and support clinicians in identifying patients who require monitoring targeted to their risk level.

Future perspectives might include an external validation to confirm the predictive performance of the model in independent datasets.