External validation of prognostic models for chronic kidney disease among type 2 diabetes

Background: Various prognostic models have been derived to predict chronic kidney disease (CKD) development in type 2 diabetes (T2D). However, their generalisability and predictive performance in different populations remain largely unvalidated. This study aimed to externally validate several prognostic models of CKD in a Thai T2D cohort.
Methods: A nationwide survey was linked with hospital databases to create a prospective cohort of patients with diabetes (n = 3416). We undertook a systematic review to identify prognostic models and used traditional metrics (i.e., discrimination and calibration) to compare their performance for CKD prediction. We updated prognostic models by including additional clinical parameters to optimise model performance in the Thai setting.
Results: Six relevant previously published models were identified. At baseline, C-statistics ranged from 0.585 (0.565–0.605) to 0.786 (0.765–0.806) for CKD and 0.657 (0.610–0.703) to 0.760 (0.705–0.816) for end-stage renal disease (ESRD). All original CKD models showed fair calibration with Observed/Expected (O/E) ratios ranging from 0.999 (0.975–1.024) to 1.009 (0.929–1.090). Hosmer–Lemeshow tests indicated a good fit for all models. The addition of routine clinical factors (i.e., glucose level and oral diabetes medications) enhanced model prediction, improving the C-statistic by 0.114 for Low’s CKD model and by 0.025 for Elley’s ESRD model.
Conclusions: All models showed moderate discrimination and fair calibration. Updating models to include routine clinical factors substantially enhanced their accuracy. Low’s model (developed in Singapore) and Elley’s model (developed in New Zealand) outperformed the other models evaluated. These models can assist clinicians in improving the risk stratification of diabetic patients for CKD and/or ESRD in regions whose settings are similar to Thailand.
Supplementary Information: The online version contains supplementary material available at 10.1007/s40620-021-01220-w.


Introduction
Chronic kidney disease (CKD) is a major worldwide health burden and the most common microvascular complication of type 2 diabetes (T2D) [1,2]. In 2017, more than 840 million individuals developed CKD [3], increasing health care demand, particularly in low- to middle-income countries (LMICs) [1]. In the UK and the United States, the prevalence of CKD in T2D was reported to range between 25 and 36%, of which 19% was estimated to be advanced (stages 3-5) [4]. The age-standardised global mortality of CKD due to diabetes has been estimated at 7.6 per 100,000 [5].
Early detection and treatment are beneficial in the prevention or delay of CKD progression. Despite improved screening, many CKD patients are not diagnosed until an advanced stage because of a lack of overt symptoms. Prognostic models for complications associated with T2D progression that are incorporated into clinical information systems would facilitate improved treatment allocation and healthcare management, and improve understanding of clinical research strategies [6,7].
Despite many potential advantages, prognostic models have several frequently reported shortcomings [20]. Multiple models have been developed in different ethnicities [8][9][10][11][12][13][14][15][21][22][23][24][25][26][27][28][29], but no single model has consistently outperformed all others in Asian populations. For instance, a study based in China performed only a limited temporal (internal) validation on the same data [10]. Most importantly, adaptation of a suitable prognostic model by ethnicity is particularly important in an Asian context, given that half of the ten countries most affected by diabetes worldwide are Asian [4]. Furthermore, recent recommendations have proposed re-evaluating the inclusion of race/ethnicity in CKD prediction models [30].
Therefore, this study externally validated and updated previously published prognostic models of CKD and end-stage renal disease (ESRD) in Thai T2D patients.

Methods
We adhered to the TRIPOD guidelines for the development and validation of a clinical prediction score [31,32]. We focused on external validation of existing models of CKD-ESRD risk predictions in T2D, supplemented with the addition of routine clinical factors to potentially increase the discriminatory power in our local population [18].

Study design and data sources
Data from the Thailand National Health Examination Survey (Thai-NHES) and the standard health databases (version 2.4, 2019 edition; http://spd.moph.go.th/healthdata/) were used for model validation. The NHES IV and V were population-based cross-sectional surveys conducted in 2009 and 2014, respectively. These surveys captured health interviews, physical examination, nutrition assessment, and health-related behaviours [33]. Briefly, multi-stage sampling of adult subjects from the regions, provinces, and districts across the country was used [34,35].
The standard health databases included medical service records from hospitals, mostly under the direction of the Ministry of Public Health. They comprised a set of tables of all transactions from outpatient and inpatient services for each individual; of 43 files available, only the six that were related to outpatient services (i.e., Person, Diagnosis, Chronic, Drug, Laboratory, and Death) were used for this study.

Settings and participants
A total of 19,671 and 18,564 participants were de-identified from NHES IV-V, respectively; removal of duplicates and missing or invalid citizen identification (CID) resulted in 29,089 participants remaining, see Fig. 1. These were linked with the standard hospital health databases (1999-2019) using an encrypted CID to construct the initial sampling frame, leaving a total of 26,170 participants.
We confirmed T2D status based on self-report, medication use, and/or pathology tests (Fasting Plasma Glucose (FPG) ≥ 126 mg/dL or HbA1c ≥ 6.5%). We excluded participants with type 1 diabetes (T1D), identified by age at onset of less than 30 years together with intensive insulin treatment. There were 3416 participants with identified T2D, of whom 270 (7.9%) were excluded on the basis that CKD was diagnosed prior to T2D, leaving a total of 3146 participants. Of these, 3014 (10.4%) participated in both NHES IV-V, with 402 newly diagnosed participants identified after the survey, see Fig. 1. These participants were followed up from 1999 to October 31st, 2019.

Outcomes
The primary study outcomes included diabetic nephropathy (CKD stage 3-5) based on the International Classification of Diseases, Tenth Revision (ICD-X), which was confirmed by an estimated glomerular filtration rate (eGFR) < 60 mL/min/1.73 m² measured within 3 months before and after diagnosis, see Table S1. ESRD (CKD stage 5) was defined as eGFR < 15 mL/min/1.73 m², or dialysis identified by ICD-X code diagnosis. eGFR was based on the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) formula [36].
Fig. 1 Flowchart for participant inclusion
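The eGFR-based outcome definitions can be illustrated with a short sketch. It assumes that the CKD-EPI formula cited as [36] is the 2009 creatinine equation, with serum creatinine in mg/dL; this section does not state the version. The function names and the optional `black` argument (the race coefficient of the 2009 equation, left off by default) are illustrative choices, not part of the study's code.

```python
def ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool,
                 black: bool = False) -> float:
    """eGFR (mL/min/1.73 m^2) from the 2009 CKD-EPI creatinine equation.

    Assumption: the study's reference [36] corresponds to this version.
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine scale
    alpha = -0.329 if female else -0.411    # exponent below the scale point
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def renal_outcome_flags(egfr: float) -> dict:
    """Apply the eGFR thresholds quoted in the Outcomes section."""
    return {"ckd_stage_3_5": egfr < 60.0, "esrd": egfr < 15.0}
```

In the study, these thresholds were combined with ICD-X diagnosis codes (and dialysis codes for ESRD); the sketch covers the eGFR component only.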

Established prognostic factors
We focused on prognostic factors identified through our systematic review, including demographics (age, sex, education, income, and area of residence), biomarkers, comorbidities, medication usage, and clinical features; the latter included diabetes duration, body mass index (BMI; kg/m²), waist and hip circumference (cm), systolic/diastolic blood pressure (SBP/DBP; mmHg), pulse (beats/min), smoking, alcohol consumption, dietary control measures, physical activity, dyslipidaemia, hypertension, and family history of diabetes (FHD, presence of T2D in first-degree relatives). Biomarkers included lipid profile (i.e., high-density lipoprotein (HDL), low-density lipoprotein (LDL), triglycerides (TG), and total cholesterol (TC), in mg/dL), FPG (mg/dL), haemoglobin (g/dL), and dipstick proteinuria. Comorbidities included a history of cardiovascular disease (CVD) and stroke. CVD was defined by self-report, clinical diagnosis or receipt of treatment for coronary heart disease. Medications recorded included oral diabetic, blood pressure-lowering and cholesterol-lowering drugs.
We included clinical data associated with diabetic complications (i.e., retinopathy, stroke, and composite CVD) based on ICD-X diagnostic codes (Table S1), laboratory follow-up, medication treatment (Table S2), or death certification (based on ICD-X).
All factors were included according to their definitions in the original studies (Tables S3-S4).

Statistical analysis
Descriptive statistics for predictor variables were summarised as mean (± standard deviation) or median (interquartile range) for continuous variables and as frequency (percentage) for categorical variables. Participant characteristics were compared between groups using the Chi-square or Fisher's exact test, where appropriate, for categorical variables, and one-way ANOVA or the Kruskal-Wallis test for continuous variables. The proportion of missing values per predictor ranged from only 0.1% (n = 3) to 5.8% (n = 199). Therefore, a complete case analysis was applied for all analyses.
We evaluated prognostic models originally derived by logistic [8,10] or Cox regression models [9,11-13] that were identified from our systematic review (PROSPERO: CRD42018105287). Prognostic scores were calculated according to the published regression formulae using the coefficients and the intercept or baseline hazard, see Table S4.
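As a minimal sketch of how a published regression formula is turned into a predicted risk, the functions below apply the standard logistic and Cox forms. All names and any numbers passed in are hypothetical placeholders; the actual coefficients, intercepts and baseline survival values are those listed in Table S4, which is not reproduced here.

```python
import math


def linear_predictor(coefs: dict, values: dict, intercept: float = 0.0) -> float:
    """Intercept plus the sum of coefficient * predictor value."""
    return intercept + sum(coefs[name] * values[name] for name in coefs)


def logistic_risk(coefs: dict, values: dict, intercept: float) -> float:
    """Predicted probability from a logistic model: 1 / (1 + exp(-LP))."""
    lp = linear_predictor(coefs, values, intercept)
    return 1.0 / (1.0 + math.exp(-lp))


def cox_risk(coefs: dict, values: dict, baseline_survival: float,
             means: dict = None) -> float:
    """Predicted risk at a fixed horizon from a Cox model: 1 - S0(t)^exp(LP).

    `means` optionally centres predictors, as many published Cox
    equations report S0(t) at the mean covariate values.
    """
    means = means or {}
    centred = {k: values[k] - means.get(k, 0.0) for k in coefs}
    lp = linear_predictor(coefs, centred)
    return 1.0 - baseline_survival ** math.exp(lp)
```

Whether a given Cox equation expects centred predictors depends on how its baseline survival was reported, so that detail has to be taken from each original publication.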
External validation was undertaken in accordance with guidelines for the validation and interpretation of risk prediction models [18,19]. In brief, we evaluated model performance through comparisons between the original published equation and models that included additional adjustment (e.g., intercept, regression coefficients) for other potential predictors, see Appendix [18,38-40].
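One common form of the intercept adjustment mentioned here is recalibration-in-the-large: the original linear predictor is kept as a fixed offset and only the intercept is re-estimated in the validation data. The sketch below illustrates that idea for a logistic model using Newton-Raphson; it is an assumption that this corresponds to one of the study's updating steps, whose exact definitions are given in the Appendix and Table S7. The function names are illustrative.

```python
import math


def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def update_intercept(lp: list, y: list, max_iter: int = 50,
                     tol: float = 1e-10) -> float:
    """Maximum-likelihood intercept shift with the original LP as offset.

    At convergence the mean predicted risk equals the observed event rate.
    """
    delta = 0.0
    for _ in range(max_iter):
        p = [expit(v + delta) for v in lp]
        score = sum(y) - sum(p)                     # first derivative
        info = sum(pi * (1.0 - pi) for pi in p)     # minus second derivative
        step = score / info
        delta += step
        if abs(step) < tol:
            break
    return delta
```

Re-estimating the calibration slope or adding new predictors extends the same idea to a multi-parameter logistic fit, for which a statistical package would normally be used.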
Briefly, model performance was evaluated as follows [7]. Discrimination was assessed using the concordance (C) statistic and the area under the receiver operating characteristic curve (AUROC) [41], with 95% confidence intervals (CIs). Calibration, i.e., the closeness between the observed and predicted values, was assessed using the Hosmer-Lemeshow goodness-of-fit test, the observed to expected (O/E) ratios with 95% CI, and calibration plots. We also used global heuristic shrinkage factors and penalised regression to address issues of overoptimism in updated prognostic models [39,42,43].
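The two headline metrics can be sketched for a binary outcome as follows. This is an illustrative implementation, not the study's Stata code: the C-statistic is computed as the proportion of event/non-event pairs in which the event has the higher predicted risk (ties counted as one half), which equals the AUROC, and the O/E ratio as the number of observed events divided by the sum of predicted risks. For the Cox-based models a censoring-aware concordance (e.g., Harrell's C) would be needed instead.

```python
def c_statistic(y: list, p: list) -> float:
    """Concordance for a binary outcome (equivalent to the AUROC)."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5   # ties count as half-concordant
    return concordant / (len(events) * len(non_events))


def oe_ratio(y: list, p: list) -> float:
    """Observed events divided by expected events (sum of predicted risks)."""
    return sum(y) / sum(p)
```

An O/E ratio above 1 indicates that the model under-predicts risk on average, and a ratio below 1 that it over-predicts.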
All statistical analyses were conducted using Stata version 16.0. A two-sided p-value less than 0.05 was considered statistically significant.
T2D was characterised on the basis of an FPG ≥ 7.0 mmol/L in four studies [9][10][11][12], or medical record review in the remaining two studies [8,13]. Identification of CKD was mainly based on eGFR and ICD-X codes. The number of prognostic factors included in each model varied between 4 and 11, with age, sex, SBP, creatinine, and diabetes duration as common predictor variables.

These models had fair to good calibration, and discrimination C-statistics ranged between 0.713 [10] and 0.920 [12].

Participant characteristics comparisons
Participants in our study were slightly younger, with fewer males (39.8% vs 43.7%-56.2%), compared to the other six CKD-ESRD studies (Table S5). Mean diabetes duration, BMI, serum creatinine, eGFR and SBP-DBP for our cohort fell within the range reported across the various models, but the prevalence of dyslipidaemia and hypertension was much higher among our participants. Our cohort had lower FPG and HDL-C, but higher lipid levels (i.e., LDL-C, TG and TC). Moreover, the proportions of participants using anti-hypertensive, anti-hyperlipidaemic and oral diabetic medications were lower than for other reported models.
CKD incidence in our study was similar to that reported by Low and colleagues [8] (i.e., 43.9 vs 42.9%), but much higher than that reported in the remaining studies [9,10], which ranged from 0.7 to 12.3%. The incidence of ESRD in the study by Lin et al. [12] was comparable to our study (5.04% vs 5.90%), but much higher than the other two studies that reported it [11,13] (0.4% and 2.5%), see Table 1.
The coefficients for the associations between prognostic factors and CKD/ESRD in our cohort were estimated and compared to those in the original models, see Table S6. Our coefficients were mostly similar to the model proposed by Low and colleagues [8], but several predictors (i.e., sex, BMI, location, HDL-C, presence of hypertension, and/or dyslipidaemia) were not significant in our data, in contrast to the models proposed by Miao et al. [9]. Most predictors in Wu's model were also significant in our data; however, the effect sizes were lower for SBP and diabetes duration, and the direction of effect was reversed for BMI. Ranking the predictors within each CKD model by effect size identified creatinine (β = 4.653) and retinopathy (β = 1.045) as having the strongest effects for females in Miao's model, whereas SBP (β = 0.902) and diabetes duration (β = 0.891) were most strongly associated with CKD in Wu's model (Table S6). For modelling ESRD, only three of the 10 predictors were significant in Elley's [11] equations, including creatinine, diabetes duration and microalbuminuria, whereas in Wan's [13] models for female participants, insulin use, oral diabetic drug use, and SBP were significantly associated with ESRD in our multivariate analyses (Table S6).

External validation
External validations were performed for models M1 to M6 where applicable (Table S7). Results of CKD-ESRD models are summarised in Table 3. At baseline (M0), all prognostic models showed fair calibration, but discrimination varied from poor to moderate, i.e., 0.585 to 0.707 and 0.671 to 0.760 for CKD and ESRD, respectively (Fig. 2). Sex-specific CKD and ESRD models performed better for females. For CKD, Miao's model for females generated a C-statistic of 0.786 (0.765-0.806) compared to 0.720 (0.691-0.749) for males, see Table 3.
All CKD-ESRD models provided improved C-statistics following additional adjustment of the regression coefficients (M3) and in the updated models (M4-M6), see Figure S2. We updated CKD models by adding biomarkers (i.e., FPG groups < 126 vs ≥ 126 mg/dL) and/or interaction effects with oral diabetic drug use; the greatest improvement was observed in the model by Wu and colleagues, with a C-statistic of 0.790 (0.774-0.806), see Table 3.
Four ESRD risk scores showed moderate to good calibration for baseline validation, recalibration, and updated models, see Figure S3 and Table 3. Fitting the equations using our validation set of ESRD equations (M5) showed worsening shrinkage, with a penalty of 12.31% and 15.55% for Lin's and Wan's male models, respectively.
The Brier score is another measure of prediction accuracy, ranging between 0 and 1, where lower scores indicate better accuracy. The Brier scores for the baseline and updated models are presented in Table 3. Among the updated CKD models, the Brier score was lowest for Miao's model for females (0.162), followed by Low's model (0.168), Miao's model for males (0.178), and Wu's model (0.185). Of the four ESRD models, the Brier score for the updated models (M4) was superior and ranged from 0.043 to 0.061. Table S8 provides a summary of the model improvements implemented following baseline validation. New additional predictor variables (i.e., glucose level and/or interaction with oral diabetic medication) significantly improved the discrimination for the CKD models. The highest improvement was observed in Wu's models, with a ∆C-statistic of 0.214 (0.193-0.234). Most ESRD models showed minor but significant discrimination improvements in the updated models.
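For reference, the Brier score for a binary outcome is the mean squared difference between the predicted risk and the observed outcome (0 or 1). The short sketch below is illustrative only and uses a hypothetical function name.

```python
def brier_score(y: list, p: list) -> float:
    """Mean squared error between predicted risks and binary outcomes."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)
```

A model that predicts a risk of 0.5 for every participant scores 0.25 regardless of the outcomes, which is why 0.25 is often used as a non-informative benchmark. Because the score depends on outcome prevalence, values for the rarer ESRD outcome are not directly comparable with those for CKD.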

Discussion
We externally evaluated, validated, compared and updated six previously published models for predicting CKD/ESRD in a nationwide cohort of Thai participants with T2D, in line with recent framework guidelines [18,19,31,38]. At baseline, most models provided only modest discrimination of T2D patients who developed CKD/ESRD. Two models [10,12] demonstrated similar performance to their parent models. All models showed good calibration and, upon modification, the agreement between observed and expected risk was fair, with only a few models showing slight overestimation.
In this study, the associations observed between prognostic factors and CKD/ESRD risk in Thai participants with T2D differed from previous studies. For instance, either hypertension or dyslipidaemia, LDL-C, and BMI were negatively associated with CKD risk in some models [8][9][10], with only a few predictors (i.e., diabetes duration, creatinine, and oral diabetic medications) significantly associated with ESRD risk. We suspect that the lack of association, or the variation in the direction of effect relative to previously reported results, may have resulted from heterogeneity in the predictors and outcomes between our data and the data used previously for the development sets. In addition, we were unable to include two important biomarker predictor variables (i.e., UACR and HbA1c) for four models [8,11-13], as they were unavailable in our data.
We postulate that the magnitude of the C-statistics and miscalibration observed may be explained by case-mix effects represented by the number of events, predictor effects, and heterogeneity in the population characteristics [19,44,45]. Variation of the included predictor variables, and sample size characteristics between derivation and validation settings, are likely responsible for the modest model performance in our population [19,46].
In general, discrimination and calibration improved in our updated models. Although most models demonstrated lower discrimination in our data compared to their original settings, our updated models showed consistent improvement for all evaluation metrics (i.e., Brier score, shrinkage factor, penalised regression, and C-statistics). Most CKD-ESRD models also showed better reclassification (i.e., ∆C-statistic) for the enhanced models. Despite a lack of existing standards, Pencina et al. proposed that a ∆C-statistic greater than 0.01 represents a relevant improvement in model prediction [47,48]. For our data, all models showed significant improvement on modification, with ∆C-statistics ranging between 0.041 and 0.214 for CKD and 0.025 to 0.089 for ESRD equations.
The Brier score has been proposed as a measure of discrimination and calibration for model validation [49]. In this study, ESRD models performed better than those for CKD as determined by Brier scores. Almost every validated and updated model showed improved predictions (as judged by a benchmark value of less than 0.25) [40].
In our analyses, four models proved more effective for the prediction of either CKD [8,9] or ESRD [11,13] in our population, without the need for recalibration or updated equations. These models consistently exceeded all others in terms of calibration and discrimination, and were more comparable to the originally derived models. Only Elley's model [11] provided a web calculator (http://www.nzssd.org.nz/cvd_renal/) to facilitate use in routine clinical practice.
The strengths of our study include the long-term follow-up of diabetic progression in 26,170 individuals over 20 years, the definition of CKD from multiple data sources, and the evaluation of previously published prognostic models identified from a current systematic review and meta-analysis. This study was based on real-world data from a clinical setting that used a broad range of routinely captured potential predictor variables evaluated for prognostic performance of renal outcomes in those with incident diabetes. To our knowledge, this is the first independent validation of CKD-ESRD prognostic models in an Asian population using real-world data, beyond the populations from which the models originated. Therefore, our findings should be useful in predicting CKD-ESRD occurrence in other Asian regions whose settings are similar to Thailand.
Our study highlighted that eGFR assessment using creatinine was beneficial to kidney disease surveillance in a Thai population. By avoiding specific race/ethnicity coefficients, our updated models still offered accurate prognostic estimates which could be enhanced further through improved clinical and laboratory standards [30,50].
Our study has several limitations. Markers of kidney damage, such as albuminuria and cystatin-C, were not available in our data, and missing data for some predictor variables precluded prognostic risk estimates for some models.

Conclusions
In conclusion, we have provided an independent external validation of prognostic models for the prediction of incident CKD/ESRD in participants with T2D from Thailand. All evaluated prognostic models showed only moderate discriminative performance, but fair calibration at baseline validation. Updated prognostic scores improved predictive performance in most of the evaluation metrics (i.e., discrimination, calibration, and Brier score). An updated prognostic model for clinical use in Asian populations is provided.
Although no model was excellent, prognostic equations not delimited by sex (i.e., Low's [8] and Elley's [11]) performed better in our data and may offer clinical utility as a CKD screening tool in primary care for patients with diabetes.