Key Summary Points

Why carry out this study?

Type 2 diabetes mellitus (T2DM) imposes a significant economic burden, primarily due to micro/macrovascular complications. Diabetes simulation models can be used to inform diabetes management and improve health outcomes.

In many settings, patient-level data are fully de-identified to protect patient information, which presents challenges for the application of simulation models.

This study aimed to assess the prediction performance of the Building, Relating, Assessing, and Validating Outcomes (BRAVO) diabetes model when applied to fully de-identified patient-level data from the Exenatide Study of Cardiovascular Event Lowering (EXSCEL) trial.

What was learned from the study?

The BRAVO diabetes model demonstrated acceptable discrimination and calibration in predicting non-fatal myocardial infarction, non-fatal stroke, heart failure, revascularization, and all-cause mortality, using only fully de-identified patient-level data.

This study’s findings underscore the robustness and utility of the BRAVO diabetes model when working exclusively with fully de-identified data, which can guide future research and facilitate the continuation of investigations while maintaining data privacy.

Introduction

The high prevalence of type 2 diabetes mellitus (T2DM) has led to rising healthcare costs associated with managing diabetes [1]. A significant portion of these costs is attributed to the micro/macrovascular complication events that can occur in individuals with T2DM [2]. As the burden of diabetes continues to grow, it is essential to develop accurate and reliable predictive models to assess the progression of diabetes and its associated complications.

A new diabetes risk engine, the Building, Relating, Assessing, and Validating Outcomes (BRAVO) diabetes model, has been derived using the Action to Control Cardiovascular Risk in Diabetes (ACCORD) clinical trial dataset, one of the largest US-based diabetes trials [3]. The BRAVO diabetes model is a person-level, discrete-time microsimulation model that predicts the progression of diabetes based on individuals’ sociodemographic and clinical characteristics and treatments. The model predicts risks of macrovascular events (such as myocardial infarction (MI), heart failure (HF), stroke, angina, and revascularization), microvascular events (such as chronic kidney disease, end-stage renal disease, retinopathy, blindness, neuropathy, and amputation), and adverse events (such as hypoglycemia and diabetic ketoacidosis) over a user-specified time horizon. The performance of the BRAVO diabetes model has been validated against 18 clinical trials, including the CANVAS study, EMPA-REG study, and LEADER study [4, 5]. Furthermore, calibration efforts were made using globalization [5] and localization [6] (other US patient cohorts) approaches to enhance the accuracy of its predictions.
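To make the term "person-level, discrete-time microsimulation" concrete, the following is a minimal toy sketch, not the BRAVO implementation. It assumes an annual cycle and a fixed, hypothetical per-person event probability; a real risk engine would recompute each person's risk every cycle from updated risk factors. The function names and parameters are illustrative only.

```python
import random


def simulate_patient(annual_risk, years, rng):
    """Return the 1-based year of the first event, or None if no event occurs."""
    for year in range(1, years + 1):
        # One Bernoulli draw per discrete time step (assumed here to be one year).
        if rng.random() < annual_risk:
            return year
    return None


def simulate_cohort(annual_risks, years, seed=0):
    """Simulate each person independently; return the cumulative incidence."""
    rng = random.Random(seed)
    events = sum(
        simulate_patient(risk, years, rng) is not None for risk in annual_risks
    )
    return events / len(annual_risks)


# Hypothetical annual risks for three simulated people over a 7-year horizon.
incidence = simulate_cohort([0.01, 0.03, 0.05], years=7)
```

Because every person is simulated separately, individual characteristics can drive individual trajectories, which is why patient-level inputs matter for this class of model.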

The BRAVO model can be applied for a diverse array of research purposes, including cost-effectiveness analysis, trial extrapolation, policy evaluation, risk stratification, and therapeutic strategy optimization. Although the model can utilize cohort-level data to populate the simulation, using patient-level data is recommended to optimize the model’s performance and outcomes. However, as a result of concerns surrounding patient information safety, patient-level data are often de-identified. This process may mask critical information required by the BRAVO model to populate the simulation, thereby posing challenges for conducting simulation-based research.

This study aims to validate the model’s performance when populated exclusively with a fully de-identified dataset to ensure its applicability in secure settings. We used data from the Exenatide Study of Cardiovascular Event Lowering (EXSCEL) clinical trial, which is a large-scale, randomized, double-blind, placebo-controlled study that evaluated the cardiovascular safety and potential benefits of exenatide [7].

Methods

Data Sources and Participants

We used de-identified data extracted from the EXSCEL trial, which included over 8000 patients with type 2 diabetes at increased risk of cardiovascular events [7]. In brief, EXSCEL investigated the effects of the once-weekly glucagon-like peptide 1 (GLP-1) receptor agonist, exenatide (2 mg injection), on cardiovascular-related outcomes in T2DM, including non-fatal MI, non-fatal stroke, and cardiovascular death [8]. The EXSCEL study provided participants’ baseline characteristics and trajectories of key biomarkers, including hemoglobin A1c (HbA1c), low-density lipoprotein (LDL), systolic blood pressure (SBP), and body mass index (BMI). To protect patients’ private information, data were provided in ranges instead of specific values when extracted from the EXSCEL study (e.g., BMI > 30). To convert the ranges back into specific values to support the simulation, we used data from the 2018 National Health and Nutrition Examination Survey (NHANES) to impute age and BMI. An imputation technique was applied, and the mean value was used to replace the range [9]. We used the BRAVO diabetes model to predict the study outcomes over a 7-year window using the EXSCEL clinical trial data.
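The idea of replacing a reported range with a single value drawn from a reference sample can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the function name, the half-open interval convention, and the reference values are hypothetical, and the sketch omits any matching on patient characteristics.

```python
from statistics import mean


def impute_from_range(low, high, reference_values):
    """Replace the interval (low, high] with the mean of reference values in it.

    Either bound may be None for an open-ended range such as "BMI > 30".
    """
    inside = [
        value
        for value in reference_values
        if (low is None or value > low) and (high is None or value <= high)
    ]
    if not inside:
        raise ValueError("no reference values fall inside the range")
    return mean(inside)


# Hypothetical reference BMI values standing in for a survey sample.
reference_bmi = [22.0, 27.5, 31.0, 33.0, 38.0]

# A patient reported only as "BMI > 30" receives the mean of 31, 33, and 38.
imputed_bmi = impute_from_range(30, None, reference_bmi)
```

Every patient in the same range receives the same value under this approach, so within-range variability is lost; that loss is one source of the uncertainty that imputation introduces.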

Statistical Analysis

We assessed the model’s discrimination power using the C-statistic and its calibration using the Brier score. The C-statistic measures a model’s ability to distinguish individuals who experience an outcome from those who do not, with values from 0 to 1. Values close to 1 indicate better discrimination. The Brier score evaluates the accuracy of a model’s predicted probabilities, with values from 0 to 1 and lower values indicating better prediction accuracy. Cardiovascular outcomes predicted in this study included non-fatal MI, non-fatal stroke, HF, revascularization, and all-cause mortality. All analyses were performed using SAS 9.4 and Stata 15.1.
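Both metrics can be illustrated with a short sketch for a binary outcome. This assumes each person has one predicted risk and one observed 0/1 outcome and ignores censoring; the study's analyses were run in SAS and Stata, and the exact estimators used there may differ. The function names and data are hypothetical.

```python
def c_statistic(risks, outcomes):
    """Share of (event, non-event) pairs where the event has the higher risk.

    Tied predictions count as one half.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for event_risk in events:
        for non_event_risk in non_events:
            if event_risk > non_event_risk:
                concordant += 1.0
            elif event_risk == non_event_risk:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(risks, outcomes):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((r - y) ** 2 for r, y in zip(risks, outcomes)) / len(risks)


# Hypothetical predicted risks and observed outcomes for four people.
risks = [0.10, 0.40, 0.35, 0.80]
outcomes = [0, 0, 1, 1]

c_stat = c_statistic(risks, outcomes)  # 3 of 4 pairs are concordant
brier = brier_score(risks, outcomes)
```

In this toy example, three of the four event/non-event pairs are ordered correctly, so the C-statistic is 0.75.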

This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors. All statements and recommendations comply with the 1964 Declaration of Helsinki.

Results

After excluding patients with missing values at baseline, the study cohort comprised 3901 patients in the treatment group and 3861 patients in the control (placebo) group (Fig. 1). Both groups had similar distributions for age, gender, race, diabetes duration, smoking status, BMI, SBP, HbA1c, and LDL levels (Table 1). Additionally, the medical history of patients showed comparable percentages of MI, HF, stroke, angina, surgical revascularization, and blindness between the two groups. The similarity in baseline characteristics suggests that the groups were well balanced for comparing the effects of exenatide 2 mg versus placebo.

Fig. 1 EXSCEL cohort selection

Table 1 Baseline characteristics of EXSCEL study groups

The BRAVO diabetes model demonstrated acceptable discrimination and calibration in predicting the risk of non-fatal MI, non-fatal stroke, HF, revascularization, and all-cause mortality. We examined discrimination using the C-statistic for each outcome (Table 2): non-fatal MI (C-statistic 0.620, 95% CI 0.590–0.650), non-fatal stroke (C-statistic 0.696, 95% CI 0.634–0.758), HF (C-statistic 0.700, 95% CI 0.666–0.733), revascularization (C-statistic 0.664, 95% CI 0.590–0.738), and all-cause mortality (C-statistic 0.746, 95% CI 0.711–0.781). These values suggest acceptable discrimination performance for the model in predicting these cardiovascular outcomes.
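The source does not state how the 95% confidence intervals were obtained. One common option is a percentile bootstrap, sketched below as an assumption rather than the authors' method. The function names, the number of resamples, and the data are hypothetical.

```python
import random


def c_statistic(risks, outcomes):
    """Concordance over all (event, non-event) pairs; ties count one half."""
    pairs = [
        (e, n)
        for e, ye in zip(risks, outcomes) if ye == 1
        for n, yn in zip(risks, outcomes) if yn == 0
    ]
    score = sum(1.0 if e > n else 0.5 if e == n else 0.0 for e, n in pairs)
    return score / len(pairs)


def bootstrap_ci(metric, risks, outcomes, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(risks, outcomes)."""
    if len(set(outcomes)) < 2:
        raise ValueError("outcomes must contain both events and non-events")
    rng = random.Random(seed)
    n = len(risks)
    stats = []
    while len(stats) < n_boot:
        # Resample people with replacement, keeping risk and outcome paired.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_outcomes = [outcomes[i] for i in idx]
        # Skip resamples with a single class, where concordance is undefined.
        if len(set(sample_outcomes)) < 2:
            continue
        stats.append(metric([risks[i] for i in idx], sample_outcomes))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical predicted risks and observed outcomes.
risks = [0.05, 0.10, 0.20, 0.30, 0.35, 0.50, 0.60, 0.90]
outcomes = [0, 0, 0, 1, 0, 1, 0, 1]
lower, upper = bootstrap_ci(c_statistic, risks, outcomes, n_boot=200)
```

Wider intervals, such as those reported for non-fatal stroke and revascularization, typically reflect fewer observed events for those outcomes.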

Table 2 Discrimination power of the BRAVO model in predicting cardiovascular outcomes

We examined calibration performance using the Brier score for each outcome (Table 3): non-fatal MI (Brier score 0.080, 95% CI 0.074–0.085), non-fatal stroke (Brier score 0.031, 95% CI 0.028–0.034), HF (Brier score 0.037, 95% CI 0.034–0.041), revascularization (Brier score 0.036, 95% CI 0.033–0.039), and all-cause mortality (Brier score 0.102, 95% CI 0.098–0.106). These Brier scores indicate acceptable calibration performance for the model in predicting these cardiovascular outcomes.
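When reading Brier scores, it can help to recall the definition and one reference point. The notation below is introduced here for illustration and does not appear in the source: $p_i$ is the predicted probability for person $i$, $o_i \in \{0, 1\}$ is the observed outcome, and $\bar{o}$ is the observed event rate.

```latex
\mathrm{BS} = \frac{1}{N} \sum_{i=1}^{N} (p_i - o_i)^2 ,
\qquad
\mathrm{BS}_{\mathrm{ref}} = \bar{o}\,(1 - \bar{o})
\quad \text{when } p_i = \bar{o} \text{ for all } i .
```

The reference value is the score of a model that assigns everyone the observed event rate. It shrinks as events become rarer, so Brier scores are most informative when compared against this reference for the same outcome rather than across outcomes with different event rates. This is a general property of the score, not a result of this study.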

Table 3 Calibration of the BRAVO model in predicting cardiovascular outcomes

Discussion

This study validated the performance of the BRAVO diabetes model using de-identified data from the EXSCEL clinical trial. Our findings indicate that the model robustly predicts T2DM progression and associated cardiovascular outcomes among individuals with T2DM. This validation is essential because it supports the use of the BRAVO diabetes model in clinical practice and policymaking related to diabetes management when only de-identified data are available.

Several risk engines, including the BRAVO diabetes model, have been developed to predict adverse outcomes in individuals with diabetes, leveraging novel technology and a wealth of new information [10,11,12,13,14,15,16]. The BRAVO model, developed using a broad US diabetes cohort, stands out owing to its strong internal and external validity, achieved by utilizing individual patient data rather than aggregate estimates for external validation [17]. In contrast to other studies that relied on aggregate data, the BRAVO diabetes model’s performance was found to be acceptable in terms of both discrimination and calibration in this study using individual-level clinical trial data. These findings are consistent with the previous validations of the BRAVO model using other clinical trial data, such as the CANVAS study, EMPA-REG study, and LEADER study [5]. Our findings not only reinforced the robustness of the BRAVO diabetes model when using de-identified data but also further demonstrated its applicability to a diverse population of individuals with diabetes across various large clinical trials. The ability of the model to predict various cardiovascular outcomes with reasonable accuracy highlights its potential utility in guiding clinical decision-making for individuals with T2DM.

There are some limitations to this study. First, we used de-identified data from the EXSCEL clinical trial, which required the imputation of some variables based on NHANES data. This process may introduce some uncertainty in the results. However, the matching technique used for imputation and the use of mean values to replace the ranges aimed to minimize this limitation. Second, differences in drug compliance and dropout rates between the treatment and control groups may influence the outcomes and their predictions, potentially contributing to variation in the Brier scores across the outcomes we assessed. As a result of the data use agreement, we were unable to perform sensitivity analyses to address these issues. Additionally, the current BRAVO model does not account for social factors and other unobserved conditions, which may also contribute to this variation. Third, the study cohort was derived from a single clinical trial, which may limit the generalizability of the findings. The cohort, predominantly composed of White individuals (70%), does not fully represent the broader US population, particularly given the disproportionate representation of Black adults among US patients with type 2 diabetes [18]. Furthermore, our study was confined to a 7-year time window, and longer periods would be beneficial for further validation. Despite these limitations, our findings remain robust and provide a solid foundation for future studies to refine and validate the model using data from other sources or real-world populations.

Conclusion

Our study showed that the BRAVO diabetes model exhibited acceptable discrimination and calibration when predicting cardiovascular outcomes among individuals with T2DM using only fully de-identified data. These findings indicate that the BRAVO diabetes model could serve as a valuable tool for predicting T2DM progression in secure settings without requiring identifiable information. This further demonstrates the versatility of the model and its potential for use across a wide range of application areas.