Introduction

Chronic kidney disease (CKD), a common kidney disease with a progressive decline in renal function, is increasingly recognized as a global public health problem [1]. It causes more than half a million patients to develop end-stage renal disease (ESRD) every year, and over 700,000 deaths [2]. CKD is multifactorial and is defined as glomerular filtration rate (GFR) < 60 mL/min per 1.73 m2 or abnormalities in kidney structure or function present for more than 3 months [3, 4]. Diabetic nephropathy is the leading cause of CKD, accounting for approximately 40% of patients with non-dialysis-dependent CKD and ESRD [5]. Other pathological processes for CKD include chronic glomerulonephritis, ureteral obstruction, and renal fibrosis [6,7,8]. As effective therapeutic strategies for ESRD are currently limited, it is important to identify treatments to delay the progression of CKD to ESRD. Rapid CKD progression leads to irreversible pathological changes and may be associated with unfavorable outcomes. Once a patient has developed ESRD, renal replacement therapy is needed to maintain their daily activities. Therefore, there is an urgent need for reliable and accurate progression prediction models for CKD.

Many definitions of CKD progression have been used over the years, such as doubling of serum creatinine level, decrease in estimated GFR (eGFR) to < 15 mL/min per 1.73 m2, and development of ESRD [9, 10]. Currently, there are no clinically robust biomarkers to predict progressive CKD. Rather, clinicians rely on multiple longitudinal kidney measurements, such as the eGFR, proteinuria, and urinary protein/creatinine ratio (UPCR) to identify progression [11]. The shortcomings of these traditional biomarkers are well recognized, and a single index has limited predictive capacity for progressive CKD [12]. However, the use of complex and potentially expensive detection strategies may prevent at risk patients from benefiting from preventative interventions, especially in settings where renal replacement therapy is not readily available. The use of risk models is an attractive and likely cost-effective method for large-scale CKD risk stratification and would allow the identification of populations that would benefit the most from CKD detection. There have been several attempts to create a risk model for predicting the progression of CKD. However, the prediction accuracy of these models has not been tested through widespread application in clinical practice [13,14,15,16].

In the present study, we aimed to establish a model using Cox regression analysis based on commonly used and readily available clinical characteristics to predict disease progression in CKD patients. We performed univariate and multivariate analyses to screen for independent risk factors. The visualization model was constructed by nomogram and web-based calculator, and prediction performance was measured by discrimination, calibration, and clinical utility. This novel simple-to-use model might predict the prognosis of patients with CKD with high accuracy.

Methods

Ethics statement

The study was conducted in accordance with the ethical standards and the Declaration of Helsinki and according to national and international guidelines. It was approved by the authors' institutional review board (No. 883).

Patients

This study used data from 1138 patients with CKD obtained via the Dryad Digital Repository (http://www.datadryad.org/), shared by Limori et al. [17]. According to Dryad's terms of service, researchers can use these data for secondary analysis without infringing on the author's rights. All eligible individuals who were not undergoing dialysis were diagnosed with stage G2–G5 CKD based on the Kidney Disease Improving Global Outcomes classification [18]. All participants were at least 20 years of age and visited nephrology centers for the first time between October 2010 and December 2011. Patients with malignancy that was diagnosed or treated within the previous 2 years, transplant recipients, and those with active gastrointestinal bleeding at enrollment were excluded. All eligible patients were randomly stratified into two groups in a 2:1 ratio (training set and validation set, respectively).

Data collection

We performed a secondary analysis based on data from the above database. Fifteen probable prediction variables were selected, including gender, age, etiology (diabetes, nephrosclerosis, glomerulonephritis, and others), hemoglobin level, serum albumin level, creatinine level, eGFR, proteinuria, urinary occult blood, UPCR, hypertension, history of cardiovascular disease, diabetes, use of RAAS inhibitor, use of calcium channel blocker, and use of diuretics. Moreover, the vital status and follow-up time of each CKD case were extracted.

Predictor selection and development of the prediction model

Depending on the training set, Cox proportion hazard regression models were used to screen potential prognostic factors and estimate their weights [19, 20]. Univariable Cox regression analysis was performed to explore the potential predictors [21]. The selected prognostic factors (p value below 0.05 in univariate analysis) were then included in a multivariate Cox regression analysis to obtain an integrated nomogram by a stepwise feature selection algorithm based on the AIC [22]. Moreover, to facilitate clinical application, we established a visualization tool by a web-based calculator.

Validation of the prediction model

The performance of our model to predict survival was quantified using AUC values from the ROC analysis and the C-index. The performance of the novel model was also evaluated by examining calibration in training and validation sets. In addition, DCA was carried out to assess the clinical utility of the model. These tests were all performed in both the training and validation sets.

Statistical analysis

Continuous variables following a normal distribution are presented as mean ± standard deviation and categorical variables are presented as percentages. Differences between the training and validation sets were analyzed using chi-square tests for the categorical variables and t-tests for the continuous variables. A p value < 0.05 was used as a cutoff for statistical significance. Statistical analysis was performed using SPSS software (version 24.0) and R software (version 3.6.2).

Results

Baseline characteristics

Figure 1 shows a flow diagram of the selection process. After excluding 93 patients with missing data, a total of 1,045 patients was included in the analysis. Patients were randomly divided at a ratio of 6:3 into training (N = 696) and validation sets (N = 349). The demographics and clinical characteristics of the whole, training, and validation sets are presented in Table 1. In the whole cohort, 69.9% of the participants were male, and the mean age was 67.31 ± 13.6 years. Most patients had positive proteinuria and history of hypertension. Across the entire study population, 260 patients had disease progression (CKD progression defined as > 50% eGFR loss or initiation of dialysis).

Fig. 1
figure 1

Flow diagram that shows the development and validation of the prediction model

Table 1 Baseline demographics and clinical characteristics of patients in training cohort and validation cohort

Prognostic factors of CKD

Univariate Cox regression analysis showed that age, etiology, hemoglobin level, serum albumin level, creatinine level, eGFR, proteinuria, urinary occult blood, UPCR, hypertension, diabetes, use of renin–angiotensin–aldosterone-system (RAAS) inhibitor, use of calcium channel blocker, and use of diuretics were correlated with CKD progression. Multivariate Cox regression analysis identified etiology, hemoglobin level, creatinine level, proteinuria, and UPCR as independent prognostic factors of CKD patients (Table 2).

Table 2 Univariate and multivariable Cox hazards analysis of the training cohort

Development of an individualized prediction model

Based on Akaike information criterion (AIC) results, five factors (etiology, hemoglobin, creatinine, proteinuria, and UPCR) were selected to establish the predictive nomogram, which is an intuitive visualization of the model (Fig. 2A). According to the constructed model, the risk score of each sample was calculated according to the model coefficients combined with the corresponding value of the five chosen factors. CKD patients were divided into high- (N = 348) and low-risk (N = 348) groups based on their median risk score. Risk score distribution is shown in Fig. 2B. The Kaplan–Meier survival curve of low- and high-risk groups in the training set is shown in Fig. 2C (p < 0.001). The CKD progression status and follow-up time of each individual is shown in Fig. 2D.

Fig. 2
figure 2

The model to predict the probability of progression in chronic kidney disease (CKD) patients from the training cohort. A The nomogram based on the five variables identified by the Cox hazards analysis. B Distribution of the risk scores calculated by the nomogram scoring system. C Progression-free survival curves stratified by the low- and high-score groups. D Patient distribution in the low- and high-score groups based on progression status

Establishment of a web-based calculator

For convenient clinical use and visualization of the prognostic model, we developed an easy-to-operate web-based model (https://ncutool.shinyapps.io/CKDprogression/) to predict the progression of CKD based on the established nomogram (Fig. 3). Estimated disease progression probabilities can be obtained by drawing a perpendicular line from the total point axis to the outcome axis.

Fig. 3
figure 3

Establishing an easy-to-operate web-based calculator for predicting the progression of chronic kidney disease (https://ncutool.shinyapps.io/CKDprogression/). A Web progression-free survival rate calculator. B 95% confidence interval of the web progression-free survival rate

Model performance in the training set

In the training set, the model was evaluated using time-dependent receiver operating characteristic (ROC) curve analysis over 1-, 2-, and 3-years, along with the concordance index (C-index). The area under the ROC curve (AUC) values for the 1-, 2-, and 3-year survival probabilities were 0.947, 0.931, and 0.939, respectively (Fig. 4A). The C-index for the prediction of progression-free survival was 0.912. The calibration curves of the model showed good probability consistencies between the predicted and observed values (Fig. 4B). These results might confirm that our model was reliable in predicting the prognosis of CKD. Furthermore, a decision curve analysis (DCA) confirmed our expectations, as the analysis revealed that the net benefit in 1-, 2- and 3-year predictions was the highest in the combined nomogram model compared to the single variable (Fig. 4C). Hence, we chose the combined model for clinical use.

Fig. 4
figure 4

Model discrimination and performance in the training set. A Receiver operating characteristic curves for model-based progression-free survival prediction. B Calibration plot examining estimation accuracy. C Decision curve analysis assessing clinical utility

Model performance in the validation set

In the validation set, CKD patients were divided into high- (N = 174) and low-risk (N = 175) cohorts based on their median risk score. The risk score distribution is shown in Fig. 5A. The CKD progression status and follow-up time of all individuals are shown in Fig. 5B. The Kaplan–Meier survival curve of the low- and high-risk groups is shown in Fig. 5C (p < 0.001). The time-dependent ROC curve analysis validated prediction accuracy of this model over other features (Fig. 5C).

Fig. 5
figure 5

Validation of the nomogram in the validation set. A Distribution of the risk scores calculated by the nomogram scoring system. B Patient distribution in the low- and high-score groups based on progression status. C Progression-free survival curves stratified by the low- and high-score groups. D Time-dependent ROC curves for nomogram vs. other single parameters included in the model

In addition, we performed calibration plot analysis in the validation set. The calibration curves of the model showed good probability consistencies between the predicted and observed values (Fig. 6A). DCA analysis revealed that the net benefit in the 1-, 2- and 3-year predictions was the highest in the combined nomogram model compared to the single variable (Fig. 6B).

Fig. 6
figure 6

Assessment of the model in the validation set. A Calibration plot examining estimation accuracy. B Decision curve analysis assessing clinical utility

Discussion

Predicting the CKD outcome in individual patients is beneficial for identifying those who need an aggressive therapeutic regimen. This study showed that etiology, together with proteinuria, serum creatinine, and UPCR, was a better predictor of the risk of progression in patients with CKD when compared to a single indicator. Furthermore, a novel nomogram and corresponding web-based calculator were built as a reference for clinicians to help with clinical decision-making. The risk score identified the highest risk patients accurately, and therefore can identify patients who may benefit most from management by nephrologists without referring the entire population with CKD to them.

The etiology of CKD is multifactorial and diverse. The main causes included diabetes, nephrosclerosis, and glomerulonephritis. In the present study, we found that the highest risk of progression is diabetic kidney disease (DKD). Type 2 diabetes is the most common cause of severe kidney disease worldwide, and DKD is associated with premature death [23]. Although, the fundamental mechanism responsible for the development of DKD to ESRD is poorly understood [24], it is now believed that vessel disease and inflammation are the main pathological mechanisms of CKD [25]. Approximately 40% of diabetic patients develop DKD, and the resultant kidney damage often leads to kidney failure, ultimately requiring dialysis or kidney transplant [26]. Our results suggest that measures should be taken to delay the progression of CKD, especially in cases of DKD.

Proteinuria generally precedes the loss of renal function in kidney disease [27]. For instance, a population-based cohort study in China found that elevated albuminuria was a key predictor of progression to CKD or ESRD and indicated a higher risk of cardiovascular disease and mortality [28]. However, an increasing number of studies have cast doubt on this classic paradigm. In several recent studies, eGFR reduced to 20–39% resulted in normal albuminuria levels [29,30,31]. In some clinical trials, improvement in proteinuria did not translate to an increased GFR or a reduction in end points such as the need for dialysis or death [32, 33]. The critical role of proteinuria as a single predictor of CKD prognosis requires further study. In the current study, incorporation of several factors including proteinuria might increase model accuracy.

Anemia is a common feature at any stage of CKD, especially in patients with advanced stages of CKD. Anemia in CKD is mainly attributable to the relative decrease in erythropoietin production by the kidneys, absolute or functional iron deficiency, and shortened red blood cell survival. The severity of anemia increases with CKD progression and affects nearly all patients with ESRD [34, 35]. The development of erythropoietic stimulatory agents, such as recombinant human erythropoietin and darbepoetin alpha, has resulted in substantial health benefits for patients with end-stage renal failure, including improved quality of life, reduced blood transfusion requirements, decreased left ventricular mass, diminished sleep disturbance, and enhanced exercise capacity [36, 37]. It is generally believed that low levels of hemoglobin are associated with worse outcomes in patients with CKD [38]. These results are in agreement with the findings of our model.

Previous studies have tried to establish models for progression of CKD to kidney failure [13, 14, 39, 40]. Although the estimations produced by previous models were moderately accurate, the results were somewhat complex because many predictors were involved, each with precise classification levels. We identified five easily accessible and simple demographic and clinical characteristics to include in our novel model, which demonstrated that these traditional factors are important in patients with CKD. Our model showed good calibration and discrimination, and the AUC values generated to predict 1-, 2-, and 3-year progression-free survival in the training set were 0.947, 0.931, and 0.939, respectively. In the validation set, the model revealed excellent calibration and discrimination, and the AUC values generated to predict 1-, 2-, and 3-year progression-free survival were 0.948, 0.933, and 0.915, respectively. These results showed that our model can perfectly predict patient survival in CKD. Moreover, we developed an easy-to-operate calculator that allows the public to freely predict local cases and diagnose the adaptability of the model.

Admittedly, there are some shortcomings in our research. First, the model was developed based on the five variables. However, these factors were unstable throughout the follow-up period, which might have partly influenced the precision of the model. Second, although the performance of the model was good in both the training and validation sets, multicenter clinical application is needed to evaluate the external utility of this model. Third, as the main outcome measure was the progression status of CKD, other outcomes such as survival time should be evaluated in future studies.

In conclusion, we constructed and validated a model incorporating five clinical characteristics (etiology, proteinuria, hemoglobin, creatinine, and UPCR) to predict disease progression in CKD patients. This model could serve as a reliable tool for determining CKD treatment strategies and potential outcomes.