Key Summary Points

Type 2 diabetes mellitus and related complications are prevalent and result in heavy economic and disease burdens both within the US healthcare system and globally.

This study developed predictive risk models for coronary heart disease, heart failure and stroke tailored to an integrated delivery health system patient population with type 2 diabetes and compared the performance of the locally fitted model to the QRISK3, RECODe and AS-CVD risk equations.

The locally fitted model performed significantly better than the other three models for predicting incident cardiovascular disease in the health system population.

Use of population-specific clinical data and application of machine learning methods can transform existing general predictive models to locally fitted models that perform better in local populations.

Digital Features

This article is published with digital features, including a summary slide, to facilitate understanding of the article. To view digital features for this article go to https://doi.org/10.6084/m9.figshare.14717004.

Introduction

Type 2 diabetes mellitus (T2DM) is one of the most prevalent chronic diseases in the world and results in heavy economic and disease burdens both within the US healthcare system and globally [1,2,3]. The prevalence of T2DM in the US has increased continuously, from 4.21% (12.1 million) in 2002 to 9.4% in 2015, according to a recent retrospective study and the 2017 National Diabetes Statistics Report [4, 5]. Recent studies have projected that the overall prevalence of diabetes in the US will reach 21% by 2050. The total estimated direct medical cost of T2DM and its related complications was last reported at $237 billion in 2017 [6, 7]. The majority of the costs associated with diabetes are attributed to micro- and macrovascular complication events [8,9,10].

Longitudinal electronic health records (EHRs), including diagnoses, tests, procedures, treatments, medication administrations, biomarkers and other laboratory data, have been widely implemented in clinical settings and used in health services research in the US [11,12,13,14,15]. EHR data have also been used to develop diabetes risk models. For instance, the QRISK3 prediction algorithms were developed to estimate the 10-year risk of cardiovascular disease in women and men using general practice data in England from the QResearch database [16]. Several diabetes risk models in the US have also been used to describe disease progression and support outcomes-driven, evidence-based diabetes management, including the 10-year risk equations for complications of type 2 diabetes (RECODe) and the American College of Cardiology/American Heart Association atherosclerotic cardiovascular disease (AS-CVD) equations [17, 18].

However, these national models may not be useful at the health system level if the local population significantly differs from the population used to build the model. The increased availability of EHR data, combined with advances in computing and machine learning methods, makes it possible to locally derive risk prediction models. Because they are built off a local population, it is possible these locally fitted models may outperform similar risk prediction models built for other populations.

Outcomes-driven, evidence-based diabetes management would become more widely adopted if a good prediction model were available to provide quick assessment at the point of care among specific health system populations. Thus, this study’s main objectives are to: (1) describe the development of predictive risk models for coronary heart disease, heart failure and stroke tailored to the type 2 diabetes patient population of Ochsner Health (Louisiana’s largest integrated delivery health system) and (2) compare the performance of the Ochsner model to the risk equations for coronary heart disease of RECODe, AS-CVD and QRISK3.

Methods

Population, Setting and Study Design

This study is a secondary data analysis of EHR data acquired from the Louisiana Experiment Assessing Diabetes (LEAD) cohort study. The LEAD cohort includes EHR data obtained from the Research Action for Health Network (REACHnet) for the period between January 1, 2013, and October 31, 2017 [19]. Clinical data from REACHnet conform to the National Patient-Centered Clinical Research Network (PCORnet) common data model, the specification that defines a standard organization and representation of data for the PCORnet distributed research network [20]. We conducted a retrospective observational cohort study within Ochsner Health. The study population was restricted to patients who received care within Ochsner, a sub-population of the LEAD study cohort [21,22,23]. T2DM was defined according to the Surveillance, Prevention, and Management of Diabetes Mellitus (SUPREME-DM) definitions as follows: (1) one or more International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) or Tenth Revision, Clinical Modification (ICD-10-CM) codes for type 2 diabetes mellitus associated with inpatient encounters; (2) two or more ICD codes associated with outpatient encounters on different days within 2 years; (3) a combination of two or more of the following associated with outpatient encounters on different days within 2 years: (a) ICD codes; (b) fasting glucose level ≥ 126 mg/dl; (c) 2-h glucose level ≥ 200 mg/dl; (d) random glucose ≥ 200 mg/dl; (e) hemoglobin A1c (HbA1c) ≥ 6.5%; (f) a prescription for an antidiabetic medication [24]. The study and analysis plan were approved by the Ochsner Health Institutional Review Board, which granted a waiver of consent for this retrospective data-only study.
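The case definition above can be sketched as a small classifier. This is only an illustration under stated assumptions: the record layout (date, setting, kind), the function name `meets_t2dm_definition`, the 730-day reading of "within 2 years" and the reading that meeting any one criterion is sufficient are all assumptions made for the example, not the LEAD or SUPREME-DM implementation.

```python
from datetime import date
from itertools import combinations

# Hypothetical evidence record: (event_date, setting, kind)
#   setting: "inpatient" or "outpatient"
#   kind:    "icd", "fasting_glucose", "2h_glucose", "random_glucose",
#            "hba1c" or "rx"
# Laboratory events are assumed to be pre-filtered to values that meet the
# thresholds in the text (for example HbA1c >= 6.5%).
WINDOW_DAYS = 730  # assumed reading of "within 2 years"


def two_on_different_days(days):
    """True if two distinct dates lie no more than WINDOW_DAYS apart."""
    unique_days = sorted(set(days))
    return any(
        (later - earlier).days <= WINDOW_DAYS
        for earlier, later in combinations(unique_days, 2)
    )


def meets_t2dm_definition(events):
    """Return True if any of the three criteria in the text is met (assumed OR)."""
    # Criterion 1: at least one T2DM ICD code on an inpatient encounter.
    if any(s == "inpatient" and k == "icd" for _, s, k in events):
        return True
    # Criterion 2: two or more outpatient ICD codes on different days.
    icd_days = [d for d, s, k in events if s == "outpatient" and k == "icd"]
    if two_on_different_days(icd_days):
        return True
    # Criterion 3: any two qualifying outpatient events (codes, labs,
    # prescriptions) on different days. Assumption: any pair counts.
    outpatient_days = [d for d, s, _ in events if s == "outpatient"]
    return two_on_different_days(outpatient_days)
```

Under this reading, criterion 3 subsumes criterion 2; both are kept separate so the code mirrors the wording of the definition.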

Data Variables

Patient data extracted from the PCORnet common data model for the present study included demographic characteristics, clinical biomarkers, medical histories and medication utilization. The demographic characteristics included age at diabetes diagnosis, race/ethnicity and sex. Clinical information with encounter dates, dates of diagnoses and laboratory test dates included weight, height, body mass index (BMI), blood pressure, diagnoses of various diseases, total cholesterol, triglycerides, high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, glycosylated hemoglobin (HbA1c) and estimated glomerular filtration rate (eGFR). We also included the healthcare setting for each patient encounter and medication prescription histories for antihypertensive, glucose-lowering and lipid-lowering drugs. Antidiabetic medications included insulin, sulfonylureas, metformin, dipeptidyl peptidase-4 inhibitors, alpha glucosidase inhibitors, amylin analogs, sodium-glucose cotransporter-2 inhibitors, glucagon-like peptide-1 receptor agonists, meglitinides, thiazolidinediones, insulin analogs and medications that increase the secretion of insulin. Antihypertensive medications included beta blockers, calcium channel blockers, ACE inhibitors, diuretics, angiotensin receptor blockers, alpha blockers, sympatholytics and vasodilators. Lipid-lowering medications included statins, niacin, bile sequestrants, PCSK9 inhibitors, fibrates, ezetimibe and fish oil. The eGFR was estimated using the Modification of Diet in Renal Disease (MDRD) equation [25].
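For readers unfamiliar with the MDRD equation, a minimal sketch follows. It assumes the commonly used four-variable form re-expressed for standardized (IDMS-traceable) serum creatinine; the text does not state which MDRD variant was applied, and the function name and arguments are hypothetical.

```python
def egfr_mdrd(serum_creatinine_mg_dl, age_years, female, black):
    """Four-variable MDRD estimate of GFR in mL/min/1.73 m^2.

    Assumes the IDMS-traceable form (constant 175); this is an assumption,
    since the study text does not specify the variant used.
    """
    if serum_creatinine_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    egfr = 175.0 * serum_creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742  # sex adjustment
    if black:
        egfr *= 1.212  # race adjustment in the original equation
    return egfr
```

A call such as `egfr_mdrd(1.1, 61, female=True, black=False)` returns the estimate for a hypothetical 61-year-old woman with a serum creatinine of 1.1 mg/dl.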

Baseline clinical data were limited to the 180 days before the first recorded date of the T2DM diagnosis (the index date) for consistency in the data collection period for the survival analysis described below. Documentation of clinical care after the index date was used as follow-up data rather than baseline data. Patients with any missing value in baseline data were excluded. Of the 93,034 T2DM patients in the original cohort, 86,789 were excluded because of missing HbA1c data at baseline (before the documented T2DM diagnosis), leaving 6245 patients for analysis.

Outcomes

The main outcomes of the present study were recorded diagnoses of coronary heart disease (CHD; ICD-9-CM codes 410-414 and 429.2; ICD-10-CM codes I20-I25), heart failure (HF; ICD-9-CM codes 402.01, 402.11, 402.91 and 428; ICD-10-CM code I50) and stroke (ICD-9-CM codes 430-436; ICD-10-CM codes I60-I66).
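The outcome definitions above amount to a lookup from diagnosis code to outcome. The sketch below is one way to express it, assuming dotted code strings (for example "428.0", "I21.4") and simple prefix matching; the table names and the function `classify_outcome` are hypothetical and are not taken from the study's code.

```python
# Code prefixes taken from the outcome definitions in the text.
ICD9_PREFIXES = {
    "CHD": ("410", "411", "412", "413", "414", "429.2"),
    "HF": ("402.01", "402.11", "402.91", "428"),
    "stroke": ("430", "431", "432", "433", "434", "435", "436"),
}
ICD10_PREFIXES = {
    "CHD": ("I20", "I21", "I22", "I23", "I24", "I25"),
    "HF": ("I50",),
    "stroke": ("I60", "I61", "I62", "I63", "I64", "I65", "I66"),
}


def classify_outcome(code, icd_version):
    """Map a diagnosis code to "CHD", "HF", "stroke" or None.

    Assumes dotted codes and that any subcode under a listed prefix counts.
    """
    if icd_version not in (9, 10):
        raise ValueError("icd_version must be 9 or 10")
    table = ICD9_PREFIXES if icd_version == 9 else ICD10_PREFIXES
    normalized = code.strip().upper()
    for outcome, prefixes in table.items():
        if normalized.startswith(prefixes):
            return outcome
    return None
```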

Development of Health System Tailored T2DM Risk Models

The Ochsner risk models employed Cox proportional hazards models for CHD, HF and stroke, with least absolute shrinkage and selection operator (LASSO) regression used to select predictor variables from demographic characteristics, clinical variables, medications and biomarkers. LASSO regularization is a well-established machine learning method that can help select important variables [18, 26]. This approach fitted the Ochsner risk model via penalized maximum likelihood to minimize the risk of overfitting. In addition, the LASSO method is computationally convenient and performs competitively in real examples, and it can incorporate different penalties for different coefficients. Unimportant variables receive larger penalties than important ones, so that important variables tend to be retained in the selection process, whereas unimportant variables are more likely to be dropped [27].
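For reference, a penalized Cox fit of this kind is commonly written as below. The notation is introduced here for illustration only (\(\ell\) for the log partial likelihood without ties, \(\lambda\) for the tuning parameter, \(w_j\) for coefficient-specific weights, \(\delta_i\) for the event indicator and \(R(t_i)\) for the set of patients still at risk at time \(t_i\)); the text does not report the exact objective or how the tuning parameter was chosen.

\[
\hat{\beta} = \arg\max_{\beta} \left\{ \ell\left( \beta \right) - \lambda \sum_{j = 1}^{p} w_{j} \left| \beta_{j} \right| \right\},
\qquad
\ell\left( \beta \right) = \sum_{i:\,\delta_{i} = 1} \left[ x_{i}^{\top} \beta - \log \sum_{k \in R\left( t_{i} \right)} \exp\left( x_{k}^{\top} \beta \right) \right]
\]

With \(w_j = 1\) for all \(j\) this is the standard LASSO penalty; unequal weights correspond to the different penalties for different coefficients described above. Coefficients shrunk exactly to zero drop out of the model, which is how the penalty performs variable selection.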

Assessment of Model Performance

All diabetes risk models compared in this study were Cox proportional hazards models. RECODe developed multiple risk equations for T2DM complications, including cardiovascular disease, congestive heart failure, stroke and microvascular complication outcomes [18]. However, the AS-CVD and QRISK3 models only predicted the risk of developing CHD as a T2DM complication outcome [16, 17]. Since CHD is the only outcome common to the Ochsner, RECODe, AS-CVD and QRISK3 models, model performance was compared on CHD only.

Since the study objective is to develop a specific risk prediction model for Ochsner Health System, this study used the same cohort to test the performance of the Ochsner model as well as the other risk models. Patients were not excluded if they had missing AS-CVD, QRISK3 or RECODe data. If covariates required by AS-CVD, QRISK3 or RECODe were missing, the analysis substituted the baseline values reported in the respective publications.

Model discrimination was assessed by the C-statistic (area under the receiver-operating characteristic curve) [28]. The baseline survival of CHD was defined as the 5-year survival of CHD in the Ochsner cohort and was calculated using the Kaplan-Meier survival function [29]. This baseline survival was used to compare the performance of the Ochsner model, RECODe and QRISK3. The AS-CVD equations were evaluated with their own baseline survivals, since they were published with gender- and race-specific baseline survivals for the US population. The patient risk of developing cardiovascular outcomes was calculated as \(P\left( {t,x} \right) = 1 - S\left( {t,x} \right)\), the failure (event) probability, that is, the chance of an event occurring in the interval (0, t) for an individual with covariate vector x. Here, \(S\left( {t,x} \right) = S\left( 0 \right)^{\exp \left( {\sum \beta \times x - \sum \beta \times \bar{x}} \right)}\). In this equation, \(S\left( 0 \right)\) is the baseline survival for each of the three cardiovascular outcomes, x is the corresponding value of each variable in each model, \(\bar{x}\) is the corresponding mean of the cohort’s characteristics for each continuous variable in each model, and \(\bar{x}\) is “0” for each categorical variable for the reference group in the model [18]. C-statistics were calculated using \(P\left( {t,x} \right)\) as the predicted probability together with the event status (i.e., whether the patient had a cardiovascular event).
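The steps in this paragraph can be sketched in a few lines. The study analysis was run in R; the Python below is only an illustration, and the function names, argument layout and the choice to count tied risks as half-concordant are assumptions rather than the authors' code.

```python
import math


def kaplan_meier_survival(times, events, horizon):
    """Kaplan-Meier estimate of survival at `horizon`.

    times[i] is follow-up time; events[i] is 1 for an event, 0 for censoring.
    """
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - failures / at_risk
    return survival


def predicted_risk(baseline_survival, coefficients, x, x_bar):
    """P(t, x) = 1 - S(0) ** exp(sum(beta * x) - sum(beta * x_bar))."""
    lp = sum(b * v for b, v in zip(coefficients, x))
    lp_mean = sum(b * m for b, m in zip(coefficients, x_bar))
    return 1.0 - baseline_survival ** math.exp(lp - lp_mean)


def c_statistic(risks, events):
    """Area under the ROC curve via pairwise comparison.

    Share of (event, non-event) pairs in which the patient with the event
    has the higher predicted risk; ties count as 0.5 (an assumption).
    """
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))
```

A patient whose covariates equal the cohort means has a linear predictor difference of zero, so the predicted risk reduces to one minus the baseline survival. A C-statistic of 0.5 corresponds to ranking no better than chance.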

A logistic regression model was used to assess the calibration of the risk models. The outcome probability, \(P\left( x \right)\), is a function of the prognostic index (PI), \(\beta \times x\), and the baseline log odds of an event, \(\beta _{0} = {\text{logit}}\left\{ {P\left( 0 \right)} \right\}\). Assessing model calibration means comparing the observed event probabilities with those predicted by the model. The observed event probability for an individual is taken as 1 if the individual experiences an event (outcome Y = 1) and 0 otherwise (outcome Y = 0). We write the PI as \({\text{PI}} = \beta _{0} + x\beta\). The predicted event probability is \(P\left( x \right) = {\text{logit}}^{{ - 1}} \left( {{\text{PI}}} \right) = \left\{ {1 + e^{{ - {\text{PI}}}} } \right\}^{{ - 1}}\). A logistic regression model \({\text{logit}}\left\{ {{\text{Pr}}\left( {Y\, = \,1} \right)} \right\} = \gamma _{0} + \gamma _{1} \,{\text{PI}}\), which is linear in the PI, was used to check agreement between observed and predicted probabilities [30].

If a model is well calibrated, the estimates of \(\gamma _{0}\) and \(\gamma _{1}\) should be close to 0 and 1, respectively. Model calibration was assessed with three tests at time t: (1) an intercept test, (2) a slope test and (3) a joint test [30].
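The calibration regression can be illustrated with a small, dependency-free fit of the intercept and slope on the prognostic index. This is a sketch under assumptions: the function name, the Newton-Raphson fitting routine and the fixed iteration count are choices made for the example, and the intercept, slope and joint tests themselves are not implemented here.

```python
import math


def fit_calibration_model(prognostic_index, outcomes, iterations=25):
    """Fit logit Pr(Y = 1) = g0 + g1 * PI by Newton-Raphson; return (g0, g1)."""
    g0, g1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (u0, u1) and information matrix [[h00, h01], [h01, h11]].
        u0 = u1 = h00 = h01 = h11 = 0.0
        for pi, y in zip(prognostic_index, outcomes):
            p = 1.0 / (1.0 + math.exp(-(g0 + g1 * pi)))
            w = p * (1.0 - p)
            u0 += y - p
            u1 += (y - p) * pi
            h00 += w
            h01 += w * pi
            h11 += w * pi * pi
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise ValueError("PI has no variation, or outcomes are perfectly separated")
        # Newton step: add the inverse information matrix times the score.
        g0 += (h11 * u0 - h01 * u1) / det
        g1 += (h00 * u1 - h01 * u0) / det
    return g0, g1
```

Estimates near (0, 1) indicate agreement between predicted and observed probabilities. A slope below 1 suggests predictions that are too extreme, and a non-zero intercept suggests systematic over- or underestimation.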

All analyses were conducted using R, version 4.0.3 [31]. All methods were carried out in accordance with relevant guidelines and regulations. This study was funded by the Ochsner Health Clinical Research and Innovation Support Program (CRISP). The data analyzed in this study were not identifiable. This study was deemed exempt by the Institutional Review Boards of Tulane University and Ochsner Health. Permission to access the data was granted because the project was funded and led by the data owner (Ochsner Health). The content of this publication is solely the responsibility of the author(s) and does not necessarily represent the views of the sponsoring health system. The authors do not have any conflicts of interest to disclose.

Results

A total of 6245 patients were included in the present study. Table 1 shows the baseline characteristics of the T2DM cohort in Ochsner Health System. The mean (standard deviation [SD]) age was 61.0 (11.7) years. Slightly more than half of the study population was female (51.5%), and the majority was White (59.2%). The mean (SD) HbA1c was 7.4% (1.7%), and 1284 (20.6%) of the patients had HbA1c > 8%. In addition, the percentages of the population with a history of hypertension, CHD, HF and stroke were 78.4%, 16.6%, 8.7% and 9.6%, respectively. Lastly, the percentages of patients using hypolipidemic, antidiabetic and antihypertensive drugs at baseline were 34.3%, 21.8% and 56.7%, respectively. During the follow-up period, 413 (6.6%) patients had CHD, 295 (4.7%) had HF, and 105 (1.7%) had stroke.

Table 1 Characteristics of the T2DM cohort in Ochsner Health (2013–2017) documented within 180 days prior to first notation of diagnosis in EHR

Table 2 provides the coefficients of the Ochsner risk equations for each of the cardiovascular outcomes. The LASSO regularization method revealed that the variables common to the Ochsner models include age, BMI, systolic blood pressure, HbA1c and eGFR. The other significant predictors were medical histories, such as CHD, HF and hypertension, followed by medication prescription histories and race.

Table 2 Coefficients of the Ochsner models for calculating 5-year risk of CHD, HF and stroke

Among the factors identified as statistically significant in the Ochsner (n = 11), RECODe (n = 14), AS-CVD (n = 15) and QRISK3 (n = 23) models, only age was common to all four risk equations (Supplementary Material Table S1). Three significant predictors of CHD were common to the Ochsner and RECODe models: age, HbA1c (%) and HDL cholesterol (mg/dl). Only two significant predictors of CHD were common to the Ochsner model and the QRISK3 equations: age and BMI (kg/m²). Five significant predictors of CHD were common to the Ochsner and AS-CVD models: age, sex, race, HbA1c and HDL cholesterol.

Table 3 presents the comparison of model discrimination with the alternative risk equations in the Ochsner T2DM cohort. The Ochsner model equations had high internal discrimination, with a C-statistic of 0.85 for CHD. This was better than the discrimination of RECODe (C-statistic 0.46), the AS-CVD equations (C-statistic 0.54) and QRISK3 (C-statistic 0.72) for CHD.

Table 3 C-statistics of risk model performance for each model in the Ochsner T2DM cohort

Table 4 shows the logistic regression results of the prognostic index on having CHD in the Ochsner T2DM cohort. The estimate of the intercept in the Ochsner model suggested that the predicted risk of having CHD at 5 years is about exp(−3.829) ≈ 0.022 higher than under perfect calibration, indicating that the Ochsner model overestimated the 5-year CHD risk. Together with the joint test results, this indicates that the Ochsner model, like all the other models, was miscalibrated (Supplementary Material Table S2). Among the four models, the Ochsner model equations showed relatively better internal calibration.

Table 4 Logistic regression results of PI on having CHD in the Ochsner T2DM cohort

Discussion

In an era of learning health systems, in which health policy changes are driving population health management to improve the quality, cost and experience of healthcare, health systems need reliable, reproducible predictive analytic tools that account for the diverse characteristics of the populations they serve and allow for better patient care. Our results show significantly better performance of the locally fitted Ochsner model than of the other three models for predicting incident cardiovascular disease in the Ochsner population. These findings suggest that locally fitted models may provide more useful predictive analytics than existing broader models. Although we only compared the number of significant predictors of CHD among the four models, this study found that the Ochsner model required fewer predictors, implying future efficiencies in data extraction and mapping. We did not compare the number of significant predictors of HF and stroke since neither QRISK3 nor AS-CVD predicted the incident risk of developing HF or stroke. This study also found that the Ochsner model showed the best discrimination in predicting cardiovascular risk in the Ochsner T2DM cohort. The experience from the learning health system will be disseminated to other health systems in the state and other regions.

While the discrimination of the Ochsner model was significantly better than the other models for CHD, none of the four models performed well on any of the calibration tests. Failure of calibration tests where risk is overestimated on the high end, as seen in the Ochsner model, is associated with overfitting in models with rare events [32]. One of the benefits of the penalized regression methods undertaken in this investigation is the prevention of exactly that outcome, suggesting overfitting may not be the simple root of the observed problem. Regardless, as the calibration error is in the direction of overestimating risk, it might be argued the result would still serve a purpose in successfully identifying individuals for preventive measures.

While the better performance of a locally fitted model may appear logical, the cross-model comparison of model performance is very preliminary and should be interpreted with caution. For example, the RECODe model derived its risk equations from ACCORD (2001–2009) clinical trial data [18]. However, some of the variables required by this model were not available for every patient in our cohort. Although the RECODe model states that its risk equations are tolerant of missing data, its performance may have been significantly hampered in our investigation. On the other hand, if a commonly generalized risk prediction model requires data quality identical to that of a randomized controlled trial to succeed, then it has already introduced significant barriers to local implementation.

A systematic review of prediction models for cardiovascular disease risk in the general population argued that the predictive performance of most models for predicting CVD risk is heterogeneous, and the usefulness of most models remains unclear [33]. This systematic review also concluded that it is impossible to recommend which specific model should be used in which setting or location, which was broadly supported by the poor results of the nationally generalizable models in our local population.

This study has several limitations. First, the Ochsner model assesses 5-year risk, whereas the comparative models predict 10-year risk. It is possible that predictors required by the other models, which failed in our analysis, would become significant in predicting longer-term risk in years 5–10 after diagnosis. In addition, our analysis may underestimate the performance of the AS-CVD equations because it used the gender- and race-specific baseline 10-year survival estimates derived from the AS-CVD study [17]; the baseline 5-year survival estimates from the Ochsner T2DM cohort cannot be applied directly to those equations. A like-for-like comparison will require accumulating more years of data, extending the current cohort’s maximum follow-up of 5 years (EHR records from 2013 to 2017) to 10 years. Additionally, the study used apparent validation, in which the same cohort served as both the training and the test sample. Therefore, the Ochsner model’s predictive performance estimates could be more optimistic than those from other validation methods, such as split-sample validation and bootstrap validation [27]. This optimism is also to be expected because the machine learning method was fitted to the local population. Furthermore, the performance of the Ochsner model may be overestimated when it is applied to patients with missing data, since patients with missing baseline data were excluded from model development. Imputation of missing data was not conducted because of the large proportion of patients with missing information (> 25%). Instead, the analysis addressed missing data by minimizing the number of predictors using the machine learning approach: LASSO regularization retains the relatively important clinical factors available in real-world clinical settings, and the Ochsner model had fewer covariates than the other risk prediction models. Notably, in real-world settings, patients with missing data within an EHR often have missing risk assessments until the required data are captured and model calculations are subsequently updated. Lastly, the incidence of cardiovascular outcomes may be overestimated in the Ochsner T2DM cohort since we only required a 180-day baseline prior to the first recorded diagnosis, an “all-comer” approach suited to the health system’s population health management needs. Thus, the comparison of model performance may also be biased because of potential bias in cardiovascular outcome ascertainment.

Conclusions

Use of population-specific clinical data and application of machine learning methods can transform existing general predictive models to locally fitted models that perform better in local populations. Predictive analytics are increasingly incorporated into population health management strategies for risk profiling patients, evaluating the comparative effectiveness of different therapeutic plans and estimating long-term outcomes for different treatment goals. “Generalized” risk prediction models do not necessarily have to be re-built for the local population; however, researchers and clinicians should be cautious about the results of these models when applying them to local populations as the risks may be over- or underestimated. Locally fitted models may provide better support for achieving population-specific strategies.