Derivation and validation of a machine learning risk score using biomarker and electronic patient data to predict rapid progression of diabetic kidney disease

Diabetic kidney disease (DKD) is the leading cause of kidney failure in the United States, and predicting progression is necessary for improving outcomes. The objective was to develop and validate a machine-learned, prognostic risk score (KidneyIntelX™) combining data from electronic health records (EHR) and circulating biomarkers to predict DKD progression. This was an observational cohort study in two EHR-linked biobanks: the Mount Sinai BioMe Biobank and the Penn Medicine Biobank. Participants were patients with prevalent DKD (G3a-G3b with all grades of albuminuria (A1-A3) and G1 and G2 with A2-A3 level albuminuria) and banked plasma. Plasma biomarkers soluble tumor necrosis factor receptor 1/2 (sTNFR1, sTNFR2) and kidney injury molecule-1 (KIM-1) were measured at baseline. Patients were divided into derivation (60%) and validation (40%) sets. The composite primary end point, progressive decline in kidney function, included rapid kidney function decline (RKFD) (estimated glomerular filtration rate (eGFR) decline of ≥5 ml/min/1.73 m2/year), ≥40% sustained decline, or kidney failure within 5 years. A machine learning model (random forest) was trained and performance assessed using standard metrics.
In 1146 patients with DKD, the median age was 63, 51% were female, median baseline eGFR was 54 ml/min/1.73 m2, median urine albumin to creatinine ratio (uACR) was 61 mg/g, and median follow-up was 4.3 years. 241 patients (21%) experienced progressive decline in kidney function. On 10-fold cross validation in the derivation set (n=686), the risk model had an area under the curve (AUC) of 0.77 (95% CI 0.74-0.79). In validation (n=460), the AUC was 0.77 (95% CI 0.76-0.79). By comparison, the AUC for an optimized clinical model was 0.62 (95% CI 0.61-0.63) in derivation and 0.61 (95% CI 0.60-0.63) in validation. Using cutoffs from derivation, KidneyIntelX stratified 46%, 37% and 16.5% of the validation cohort into low-, intermediate- and high-risk groups, with a positive predictive value (PPV) of 62% (vs. PPV of 37% for the clinical model and 40% for KDIGO; p < 0.001) in the high-risk group and a negative predictive value (NPV) of 91% in the low-risk group. The net reclassification index for events into the high-risk group was 41% (p<0.05).
A machine-learned model combining plasma biomarkers and EHR data improved prediction of progressive decline in kidney function within 5 years over KDIGO and standard clinical models in patients with early DKD.


INTRODUCTION
Approximately 1 out of 4 adults with type 2 diabetes mellitus (T2D) has kidney disease (i.e. Diabetic Kidney Disease or DKD). Each year, 50,000 individuals with DKD progress to kidney failure in the United States. 1 Estimated glomerular filtration rate (eGFR) and urinary albumin creatinine ratio (uACR), existing diagnostic measurements incorporated into the Kidney Disease: Improving Global Outcomes (KDIGO) guidelines for risk stratification, 2 lack precision in identifying patients who will experience rapid kidney function decline (RKFD), especially in earlier stages of DKD (G1-G3). 3 As a result, primary care physicians (PCP) and diabetologists are often not able to appropriately risk stratify and counsel patients on the progressive nature of their disease.
Easily interpretable and accurate prognostic tools that integrate into clinical workflow are lacking, resulting in suboptimal treatment and delayed referral to a nephrology specialist. This has led, in part, to the unacceptable level of RKFD (and kidney failure) in this population 4-8 with a high proportion of patients starting unplanned dialysis. 1,9,10 Several blood-based biomarkers have shown associations with DKD progression, most notably soluble tumor necrosis factor receptors 1/2 (TNFR1/2) and plasma kidney injury molecule-1 (KIM-1). 11-13 However, implementation of accurate prognostic models combining clinical data from patients' electronic health record (EHR) with blood-based biomarkers is lacking. Although EHR data are widely available, their volume and complexity limit integration with biomarker values using traditional methodologies. Recently, machine learning approaches have been developed that can combine biomarkers and EHR data to produce prognostic risk scores. A simple risk score that improves the ability to identify patients with DKD at low, intermediate, and high risk of RKFD has the potential to improve outcomes through more effective use of medications and efficient resource allocation at the primary care physician level.
In the current study, we developed and validated the performance of a biomarker-enriched, machine-learned risk score (i.e., the KidneyIntelX™ test) to predict RKFD in patients with early-stage DKD and compared its performance to standard clinical models. We determined risk-based thresholds that can easily be integrated into standard clinical workflows and enhance existing clinical practice guidelines.
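This preprint (which was not certified by peer review) was posted on medRxiv on June 2, 2020 (https://doi.org/10.1101/2020.06.01.20119552). The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.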

Study Sample
The cohort is derived from the BioMe Biobank at the Icahn School of Medicine at Mount Sinai (ISMMS) and Penn Medicine Biobank (PMBB). The BioMe Biobank is a plasma and DNA biorepository with recruitment from 2007 which includes informed-consent access to the patients' EHR from a diverse local community in New York City. 14,15 PMBB is a research cohort enrolled from the University of Pennsylvania Health System with recruitment from 2008. 14 Participants actively consented to allow the linkage of biospecimens with their longitudinal EHR (eFigure 1). Both BioMe and PMBB are institutional biobanks that attempt to be representative of the patient populations of the institutions they serve. Patients are recruited from outpatient general medicine clinics and certain subspecialty clinics with limited pre-selection criteria. 16,17 The study protocol was approved by institutional review boards at both ISMMS and University of Pennsylvania; all participants had provided broad written informed consent for research and were not specifically compensated for participation in the current study. Blood was collected on the day of enrollment into BioMe or PMBB with plasma isolation as per standard procedures and continuously stored at -80°C until shipping to the RenalytixAI laboratory, New York, NY where biomarker measurements were performed.

Inclusion Criteria
We selected patients from BioMe and PMBB who were 21-81 years of age at the time of biobank enrollment ("baseline"), with T2D and either an eGFR between 30 and 59.9 ml/min/1.73 m2 or an eGFR ≥60 ml/min/1.73 m2 with uACR ≥30 mg/g. The KDIGO risk model categorizes patients based on eGFR and albuminuria and has 3 colors that correspond to the prognosis of prevalent CKD (we did not include patients at "low risk" or green because they do not have CKD). 2 Patients were included if by the KDIGO eGFR and uACR criteria they were stage G3a-G3b with all grades of albuminuria (A1-A3) or G1-G2 with moderate to high albuminuria (uACR ≥30 mg/g; A2-A3). 2 The proportion of each DKD stage was evaluated against national estimates derived from the National Health and Nutrition Examination Survey (NHANES) years 2018-2019. 18 For eGFR, we defined the baseline period as 1 year before or up to 3 months after the biobank enrollment date. Baseline uACR values were derived from the closest values within ±1 year of enrollment to maximize sample size, as these are measured less frequently; subjects without baseline values of eGFR and uACR meeting these criteria were excluded. Only patients with a stored plasma specimen, a minimum follow-up time from enrollment of at least 21 months, and ≥3 eGFR values after baseline (eFigure 1) were included. Patients with kidney transplants or on chronic maintenance dialysis before baseline were excluded from the study.
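As an illustration only, the staging part of these criteria can be sketched in Python. The function names are hypothetical and the thresholds are the standard KDIGO G and A category boundaries; the sketch covers only age, eGFR and uACR, not T2D status, plasma availability or follow-up requirements.

```python
def g_stage(egfr: float) -> str:
    """KDIGO GFR category from eGFR in ml/min/1.73 m2."""
    if egfr >= 90:
        return "G1"
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


def a_stage(uacr: float) -> str:
    """KDIGO albuminuria category from uACR in mg/g."""
    if uacr < 30:
        return "A1"
    if uacr <= 300:
        return "A2"
    return "A3"


def meets_stage_criteria(age: float, egfr: float, uacr: float) -> bool:
    """Hypothetical helper: age 21-81 and G3a-G3b (any A) or G1-G2 with A2-A3."""
    if not 21 <= age <= 81:
        return False
    g, a = g_stage(egfr), a_stage(uacr)
    if g in ("G3a", "G3b"):
        return True
    return g in ("G1", "G2") and a in ("A2", "A3")
```

For example, a 63-year-old with eGFR 54 and uACR 10 mg/g qualifies (G3a, A1), whereas the same patient with eGFR 75 would qualify only with uACR ≥30 mg/g.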

Ascertainment of clinical variables
Sex and race were obtained from biobank questionnaires or EHR data. Clinical data were extracted for all EHR variables with concordant time stamps. Hypertension and T2D status at baseline were determined using the eMERGE Network phenotyping algorithms. 16 Cardiovascular disease and heart failure were determined by International Classification of Diseases (ICD)-9/10 clinical modification codes.

Biomarker Assays
The three plasma biomarkers were measured in a proprietary, analytically validated multiplex format using the Mesoscale platform (MesoScale Diagnostics, Gaithersburg, Maryland, USA), which employs electrochemiluminescence detection methods combined with patterned arrays to allow for multiplexing of assays. Each sample was run in duplicate, along with quality control samples with known low, moderate, and high concentrations of each biomarker on each plate. Assay precision was assessed using a panel of 7 reference samples that spanned the measurement range. The mean intra-assay coefficients of variation (CV) for KIM-1, TNFR1, and TNFR2 were 3.9%, 5.4%, and 3.7%, respectively. The mean inter-assay CVs for the reference samples for KIM-1, TNFR1, and TNFR2 were 9.9%, 10.1%, and 7.8%, respectively. Assays satisfied dilution linearity and were run at 1:4 dilution. Levey-Jennings plots were employed and the Westgard rules were followed for re-run of samples. The laboratory personnel performing the biomarker assays were blinded to all clinical information.
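For readers unfamiliar with the precision metric, a minimal sketch of a mean intra-assay CV calculation follows. It assumes the CV is the sample standard deviation of the replicate wells divided by their mean; the exact computation used in the laboratory is not stated in the text, and the function names are hypothetical.

```python
from statistics import mean, stdev


def percent_cv(replicates):
    """Coefficient of variation (%) of replicate measurements of one sample."""
    return 100.0 * stdev(replicates) / mean(replicates)


def mean_intra_assay_cv(replicates_by_sample):
    """Average the per-sample CVs across a panel of reference samples."""
    return mean([percent_cv(reps) for reps in replicates_by_sample])
```

Duplicate readings of 90 and 110 (arbitrary units) give a CV of about 14.1%, while identical duplicates give 0%.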

Data Harmonization
We harmonized data from BioMe and PMBB. Race/ethnicity was collapsed into 4 major, non-overlapping categories (White, non-Hispanic Black, Hispanic, and Other). ICD and Current Procedural Terminology (CPT) codes were included as yes/no variables with timestamps. Medications were mapped to RxNorm codes 19 and laboratory values to Logical Observation Identifiers Names and Codes (LOINC) codes. 20 Only variables represented in >70% of subjects throughout the combined dataset (except uACR and blood pressure due to their established clinical importance) were included and used for training of the KidneyIntelX algorithm.
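The coverage rule for candidate variables can be illustrated with a short sketch. It assumes each patient is a dictionary of harmonized variables with `None` for missing values; the variable names (including `crp`) and the function name are hypothetical.

```python
def select_features(records, min_coverage=0.70,
                    always_keep=("uacr", "systolic_bp", "diastolic_bp")):
    """Keep variables present in more than min_coverage of patients.

    Variables in always_keep (uACR and blood pressure) are retained
    regardless of coverage, mirroring the exception described in the text.
    """
    n = len(records)
    names = sorted({name for rec in records for name in rec})
    kept = []
    for name in names:
        present = sum(1 for rec in records if rec.get(name) is not None)
        if name in always_keep or present / n > min_coverage:
            kept.append(name)
    return kept


# Toy data: four patients with partially missing values.
records = [
    {"egfr": 54, "hba1c": 7.1, "uacr": 61, "crp": 2.0},
    {"egfr": 48, "hba1c": 8.0, "uacr": None, "crp": None},
    {"egfr": 72, "hba1c": 6.5, "crp": 1.1},
    {"egfr": 35, "hba1c": None},
]
selected = select_features(records)
```

In this toy dataset `crp` is dropped (50% coverage), while `uacr` is retained despite 25% coverage.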

Ascertainment and definition of the kidney endpoint
We determined eGFR using the CKD-EPI creatinine equation. 21 We employed linear mixed models with an unstructured variance-covariance matrix and random intercept/slope for each individual to estimate eGFR slope. 22 The primary composite outcome included RKFD, defined as an eGFR slope decline of ≥5 ml/min/1.73 m2/year; 2 a sustained (confirmed at least 3 months later) decline in eGFR of ≥40% 23 from baseline; or "kidney failure", defined by sustained eGFR <15 ml/min/1.73 m2 confirmed at least 30 days later, or receipt of long-term maintenance dialysis or receipt of a kidney transplant. 2 Additionally, two nephrologists (SC/GNN) independently adjudicated all outcomes, examining each individual patient over their longitudinal course and accounting for eGFR changes (ensuring annualized decline of ≥5 ml/min or ≥40% sustained decrease), corresponding ICD/CPT codes and medications to ensure that outcomes represented true decline rather than a context-dependent temporary change (e.g., due to medications/hospitalizations). Follow-up time was censored after loss to follow-up, after the date that the non-slope components of the composite kidney endpoint were met, or 5 years after baseline.
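The endpoint logic can be sketched as follows. This is a simplified illustration, not the study's method: the slope here is a per-patient ordinary least squares fit rather than the linear mixed model described above, "sustained" is operationalized as any later value at least 3 months (0.25 years) afterwards that is still at or below 60% of baseline, and all function names are hypothetical.

```python
def ols_slope(times_years, egfrs):
    """Least-squares eGFR slope in ml/min/1.73 m2 per year (simplification)."""
    n = len(times_years)
    t_bar = sum(times_years) / n
    y_bar = sum(egfrs) / n
    sxx = sum((t - t_bar) ** 2 for t in times_years)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times_years, egfrs))
    return sxy / sxx


def sustained_40_decline(times_years, egfrs, baseline, confirm_years=0.25):
    """True if a >=40% drop from baseline is confirmed >=3 months later."""
    threshold = 0.6 * baseline
    for i, (t, y) in enumerate(zip(times_years, egfrs)):
        if y <= threshold:
            for t2, y2 in zip(times_years[i + 1:], egfrs[i + 1:]):
                if t2 - t >= confirm_years and y2 <= threshold:
                    return True
    return False


def meets_composite(times_years, egfrs, baseline, kidney_failure=False):
    """Composite: slope decline >=5/year, sustained 40% decline, or kidney failure."""
    rapid = ols_slope(times_years, egfrs) <= -5.0
    return rapid or sustained_40_decline(times_years, egfrs, baseline) or kidney_failure
```

A transient dip that recovers at the next measurement does not count as sustained under this rule, which is the kind of context-dependent change the adjudication step was intended to exclude.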

Statistical Analysis
The dataset was randomized into a derivation set (60%) and a validation set (40%). The validation dataset was completely blinded and sequestered from the derivation dataset. Using only the derivation set, we evaluated supervised random forest algorithms on the combined biomarker and all structured EHR features without a priori feature selection and identified a candidate feature set (eTable 1). The derivation set was then randomly split into secondary training and test sets for model optimization with a 70%-30% split and 10-fold cross-validation for AUC. We considered both raw values and ratios of the biomarkers. Missing uACR values were imputed to 10 mg/g, 24 missing blood pressure (BP) values were imputed using multiple predictors (age, sex, race and antihypertensive medications), 25 and the median value was used for other features where missingness was <30% (eTable 2). We conducted further iterations of the model by tuning the individual hyperparameters. A hyperparameter is a parameter that controls the learning process (e.g., the number of random forest trees), as opposed to parameters whose values are learned during training (e.g., the weight of a variable).
Tuning hyperparameters refers to iterating over model configurations, refitting the model's parameters at each setting, to achieve the best performance. For a random forest, the hyperparameters could include the maximum depth of each decision tree, the number of trees in the forest, and majority voting rules. 26 The final model was selected based on AUC performance.
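The evaluation scaffolding described here (random 60/40 split, 10-fold partition, AUC) can be sketched in plain Python. This is a generic illustration and does not reproduce the random forest itself; the AUC is computed as the probability that a randomly chosen event has a higher score than a randomly chosen non-event, and the function names and seed are hypothetical.

```python
import random


def split_derivation_validation(ids, derivation_fraction=0.60, seed=0):
    """Shuffle patient IDs and split them into derivation and validation sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(derivation_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def k_folds(ids, k=10):
    """Partition IDs into k disjoint folds for cross-validation."""
    return [ids[i::k] for i in range(k)]


def auc(scores, labels):
    """Rank-based AUC: share of event/non-event pairs ordered correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


derivation, validation = split_derivation_validation(range(100))
folds = k_folds(derivation, k=10)
```

In cross-validation, each fold would in turn be held out while a model is fit on the other nine, and `auc` would be evaluated on the held-out predictions.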
We generated risk probabilities for the composite kidney endpoint using the final model in the derivation set, scaled them to a score ranging from 5 to 100 in increments of 5, and applied this score to the validation set.
Risk cut-offs were identified in the derivation set to encompass the top 15% as the high-risk group (scores 90-100), the bottom 45% as the low-risk group (scores 5-45), and the intervening 40% as the intermediate-risk group (scores 50-85). Primary performance criteria were AUC, positive predictive value (PPV) for the high-risk group, and negative predictive value (NPV) for the low-risk group at the pre-determined cut-offs. The selected model and associated cut-offs were then validated by an independent biostatistician (MK) in the sequestered validation cohort.
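A minimal sketch of the stratification and of the PPV/NPV calculation is given below. It uses the probability cut-offs reported in the Figure 3 legend (0.061129 and 0.30209); how boundary values are assigned and the function names are assumptions, and the example data are invented.

```python
LOW_CUTOFF = 0.061129   # upper bound of the low-risk stratum (Figure 3 legend)
HIGH_CUTOFF = 0.30209   # lower bound of the high-risk stratum (Figure 3 legend)


def risk_stratum(probability):
    """Assign a predicted probability to low, intermediate or high risk."""
    if probability < LOW_CUTOFF:
        return "low"
    if probability < HIGH_CUTOFF:
        return "intermediate"
    return "high"


def ppv_high_npv_low(probabilities, events):
    """PPV in the high-risk stratum and NPV in the low-risk stratum."""
    strata = [risk_stratum(p) for p in probabilities]
    high = [e for s, e in zip(strata, events) if s == "high"]
    low = [e for s, e in zip(strata, events) if s == "low"]
    ppv = sum(high) / len(high)               # events among high-risk patients
    npv = sum(1 - e for e in low) / len(low)  # non-events among low-risk patients
    return ppv, npv


# Invented example: eight patients, 1 = composite kidney endpoint reached.
probs = [0.01, 0.02, 0.05, 0.10, 0.20, 0.35, 0.50, 0.90]
events = [0, 0, 1, 0, 1, 1, 0, 1]
ppv, npv = ppv_high_npv_low(probs, events)
```

Because the cut-offs are fixed in derivation, the proportion of patients falling into each stratum can differ slightly in validation.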
In addition to these traditional test statistics, we assessed calibration by examining the slope of plots of observed vs. expected outcomes for the KidneyIntelX score. We also constructed Kaplan-Meier curves for the time-dependent outcomes of 40% decline and kidney failure, with hazard ratios estimated using the Cox proportional hazards method.
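For orientation, the Kaplan-Meier (product-limit) estimator underlying such curves can be written in a few lines. This sketch covers only the survival estimate, not the Cox model used for hazard ratios; the function name and example data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times: follow-up time per patient; events: 1 if the endpoint occurred
    at that time, 0 if the patient was censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        censored = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 0)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve


curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 0, 1])
```

Censored patients leave the risk set without lowering the curve, which is how follow-up censored at 5 years or at loss to follow-up is accommodated.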
The discrimination of the KidneyIntelX model was compared to a recently validated comprehensive clinical model which included age, sex, race, eGFR, cardiovascular disease, smoking, hypertension, BMI, uACR, insulin, diabetes medications, and HbA1c and was developed to predict 40% decline in eGFR in T2D. 24 Finally, we calculated the net reclassification index (NRI) for events and non-events. 27,28 The a priori level of significance was <0.05. All hypothesis tests were two-sided. All analyses were performed with R software (www.r-project.org).
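The categorical NRI can be illustrated with a generic sketch. It uses the standard definitions (for events, the proportion moved to a higher category minus the proportion moved lower; for non-events, the reverse) and may differ in detail from the exact computation used in the study; the function name and example data are hypothetical.

```python
ORDER = {"low": 0, "intermediate": 1, "high": 2}


def category_nri(old_strata, new_strata, events):
    """Return (NRI for events, NRI for non-events) for two risk classifications."""
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, event in zip(old_strata, new_strata, events):
        move = ORDER[new] - ORDER[old]
        if event:
            n_e += 1
            up_e += 1 if move > 0 else 0
            down_e += 1 if move < 0 else 0
        else:
            n_n += 1
            up_n += 1 if move > 0 else 0
            down_n += 1 if move < 0 else 0
    nri_event = (up_e - down_e) / n_e        # upward moves are correct for events
    nri_nonevent = (down_n - up_n) / n_n     # downward moves are correct for non-events
    return nri_event, nri_nonevent


# Invented example: reference classification vs. new classification.
old = ["low", "intermediate", "high", "low", "intermediate", "high"]
new = ["high", "high", "intermediate", "low", "low", "high"]
events = [1, 1, 1, 0, 0, 0]
nri_event, nri_nonevent = category_nri(old, new, events)
```

A positive event NRI with a slightly negative non-event NRI, as reported in the Results, indicates that the new score moves many more events upward at the cost of moving a few non-events upward as well.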

Baseline Characteristics of Cohorts
Baseline characteristics of the total study cohort incorporating derivation and validation (n=1146) were as follows: median age was 63 years, 581 (51%) were female, median eGFR was 54 ml/min/1.73 m2, and median uACR was 61 mg/g. uACR was available in 62% of the cohort and imputed to 10 mg/g in 38%. The most common comorbidities were hypertension (91%), coronary heart disease (35%), and heart failure (33%). The majority (81%) were on ACE inhibitors or angiotensin receptor blockers (ARBs). Baseline characteristics, including event rates, were balanced between the derivation and validation sets. The median number of serum creatinine/eGFR values per patient during the follow-up period was 16 (Table 1). The distribution of DKD stages in the study cohort was similar to national estimates (eTable 3).

KidneyIntelX Clinical Utility Cut-points
The distribution of patients into KDIGO risk categories was established using the 296 subjects (64%) with uACR available in the validation cohort, with the population stratified into "moderately increased risk" (53%), "high risk" [...] (compared to a PPV of 41% for KDIGO "very high risk", p value < 0.001; Table 2). The NPV in the low-risk group was 91% for KidneyIntelX compared to 85% for the KDIGO "moderately increased risk" group (p = 0.33). In the subgroup with non-imputed uACR (n=296), the PPV for KidneyIntelX in the high-risk stratum further improved to 69% and the NPV improved to 93%. Confusion matrices are available in eTable 5. Additional risk cutoffs by potentially relevant proportions of the population are shown in Table 2 and those of the KDIGO model are in eTable 6.
KidneyIntelX correctly classified more cases into the appropriate risk strata (NRIevent = 55% in the derivation set and 41% in the validation set, p value < 0.05; eTable 7) compared to KDIGO. NRInon-event was -8.2% in the derivation set and -7.9% in the validation set (p value NS).

Time to Event Analyses for 40% Sustained Decline or Kidney Failure
Patients with high-risk KidneyIntelX scores (top 15% in the derivation set and top 16.5% in the validation set) had a greater risk of progression to the time-to-event outcome of 40% sustained decline or kidney failure than patients in the low- and intermediate-risk strata combined (hazard ratio (HR) 9.2, 95% CI 6.2-13.6 in the derivation set and 9.1, 95% CI 5.8-14.4 in the validation set; Figure 3A&B). Kaplan-Meier curves by KDIGO risk categories in the training and validation sets are shown in eFigure 3.

Subgroup analysis
KidneyIntelX performed similarly in patients with a baseline eGFR greater than or less than 60 (AUC 0.78 and 0.76, respectively). Additionally, when only data from the year prior to enrollment were included, the AUC was identical (0.77), as were the PPV for the top 16% (62%) and the NPV for the bottom 45% (91%). Kaplan-Meier plots did not change when limited to patients with ≥5 years of data, to ensure they were alive for at least 5 years (eFigure 4).

Discrimination for "Kidney Failure" Endpoint
Using the same KidneyIntelX risk score specifically trained for the composite kidney endpoint, the AUC of KidneyIntelX for the "kidney failure" endpoint alone was 0.87 (95% CI 0.84-0.89) in the derivation cohort and 0.89 (95% CI 0.87-0.91) in the validation cohort.

DISCUSSION
Using data from patients with T2D in two biobanks with plasma samples and linked EHR data, we developed and validated a risk score combining clinical data and three plasma biomarkers via a random forest algorithm to predict a composite kidney outcome consisting of RKFD, sustained 40% decline in eGFR, and kidney failure over 5 years. We demonstrated that KidneyIntelX outperformed models using only standard clinical variables, including KDIGO risk categories. 3,20 There were marked improvements in discrimination over clinical models, as measured by AUC and NRI, and improvements in PPV compared to KDIGO risk categories.
Furthermore, we showed that KidneyIntelX accurately identified over 40% more patients experiencing events than the KDIGO strata. Finally, KidneyIntelX provided good risk stratification for the accepted FDA endpoint of sustained 40% decline in eGFR or kidney failure, with a 9-fold difference in risk between the high-risk and the low- and intermediate-risk strata for this clinical and objective endpoint.
DKD is an increasingly complex and common problem challenging modern healthcare systems. In real world practice, the prediction of RKFD in patients with T2D is challenging, particularly in early disease with preserved kidney function and therefore, implementation of improved prognostic tests is paramount. Our integrated risk score has near-term clinical implications, especially when linked to clinical decision support (CDS) and embedded care pathways. The current standard for clinical risk stratification (KDIGO risk strata) 2 has three risk strata that overlap with the population of DKD patients that we included in our study. We also created a risk score with three risk strata (low, intermediate, and high) incorporating KDIGO classification components (eGFR and uACR), as well as the addition of other clinical variables, and three blood-based biomarkers. In this way, we were able to augment the ability to accurately risk-stratify DKD patients, thereby enabling improved patient management.
Low- and intermediate-risk patients with DKD can continue care with their existing PCPs or diabetologists and require less intensive treatment, unless repeat testing, changes in clinical status or local arrangements regarding referral to specialist care indicate otherwise. For those with high-risk scores, oversight may include more referrals to nephrology, 29,30 increased monitoring, improved awareness of kidney health, referral to dieticians, reinforcement of the use of antagonists of the renin-angiotensin-aldosterone system, and increased motivation to start recently approved medications, including SGLT2 inhibitors and GLP-1 receptor agonists, to slow progression. 31-34 Adoption of these new therapies is lagging, especially in patients considered to be 'low- [...] reduce the chances for type 2 error, or minimize the sample size needed to detect a statistically significant difference with treatment vs. control. Interventions that prevent or slow CKD progression and foster patient-centered kidney replacement modalities support the goals of the US Department of Health and Human Services' Advancing American Kidney Health initiative. 35
KidneyIntelX included inputs from biomarkers examined in several settings, including patients with DKD.
Soluble TNFR1 and 2 and plasma KIM-1 have demonstrated reliable independent prognostic signals for kidney function decline and ESRD. 11,12,15,36-41 By incorporating biomarker levels and the EHR data into our machine learning algorithm, we were able to provide a multidimensional representation of the patient and allow the model to generate improved prognostic estimates. 42,43
Our study should be interpreted in light of the following limitations. First, uACR was missing in 38% of the cohort, but this is representative of the current state of care. For example, uACR was missing in over 50% of the diabetes population in two large nationally representative datasets. 1,44 Moreover, our goal was to develop a risk score using real-world data from the EHR for prediction where uACR is missing in a significant number of patients. More widespread availability of uACR values would enhance the performance of KidneyIntelX, as it was a contributing feature in our model. However, even with this limitation, the performance of KidneyIntelX was more robust than KDIGO strata in those with uACR measured. Second, there was a lack of protocolized follow-up, resulting in missing data, and a lack of kidney biopsies, as real-world data from the EHR were used. Missing data can lead to biased machine learning models and the data are prone to ascertainment bias. 45 However, the median number of eGFR values per patient was 16, and the median time of follow-up was 4.3 years, thereby providing the opportunity to determine whether the kidney outcome was met.
Third, although the primary biobanked cohorts used in the study are broadly representative of the parent hospital populations in terms of age, race/ethnicity and gender distribution, and our study is representative of the US DKD population, we cannot rule out an inherent bias, since recruitment was opt-in and patients who chose to participate in the cohorts from which the study population was selected may differ from those who did not. Finally, both cohorts are from the Northeast of the USA and an independent validation cohort is needed to ensure generalizability. However, only one-third of the participants were white; thus there was adequate representation of racial groups that experience disparities in kidney disease.
In conclusion, a machine-learned model combining plasma biomarkers and EHR data significantly improved prediction of adverse kidney outcomes over standard clinical models in patients with T2D and DKD from two large academic medical centers.

Definitions:
a Sustained 40% decline in eGFR: a decline in eGFR of ≥40% from baseline, confirmed at least 3 months later.
b "Kidney failure": sustained eGFR <15 ml/min/1.73 m2 confirmed at least 30 days later, or receipt of long-term maintenance dialysis or receipt of a kidney transplant.
c Composite: eGFR slope decline of ≥5 ml/min/1.73 m2/year, sustained 40% decline in eGFR, or kidney failure.
Abbreviations: TNFR1, tumor necrosis factor receptor 1; TNFR2, tumor necrosis factor receptor 2; KIM-1, kidney injury molecule-1.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 2, 2020. .

Prediction distributions of patients with DKD according to the risk of RKFD in the derivation and validation sets. Events are denoted with orange dots and represent the composite kidney endpoint within 5 years. Non-events are denoted with blue dots and represent the absence of the composite kidney event in the follow-up period.

Figure 3. Kaplan-Meier Curves by KidneyIntelX Risk Strata for the Endpoint of Sustained 40% Decline in eGFR or Kidney Failure in Derivation (Panel A) and Validation (Panel B)
The risk cutoffs derived from the derivation set and applied to the validation set were: low risk, 0-0.061129; intermediate risk, 0.061129-0.30209; high risk, 0.30209-1. In the derivation set, 45% were low risk, 40% were intermediate risk, and 15% were high risk. In the validation set, 47% were low risk, 37% were intermediate risk, and 16.5% were high risk. The hazard ratio for high vs. low risk was 18.3 (95% CI 10.1-33.1) in derivation and 14.7 (95% CI 7.8-27.6) in validation. The hazard ratio for high vs. intermediate risk was 5.7 (95% CI 3.7-8.7) in derivation and 6.0 (95% CI 3.5-10.0) in validation. The hazard ratio for high vs. low and intermediate risk combined was 9.2 (95% CI 6.2-13.6) in derivation and 9.1 (95% CI 5.8-14.4) in validation.