Background

Type 2 diabetes is prevalent and costly, affecting nearly 1 in 10 adults in the United States at an estimated health care cost of $327 billion [1] and affecting over 450 million individuals worldwide [2]. Professional society guidelines recognize type 2 diabetes patient heterogeneity through recommendations for individualized treatment of type 2 diabetes patients [3, 4]. Specifically, treatment guidelines suggest consideration of comorbidity burden and age to guide glycemic control intensity and specific medication choice, acknowledging variation in the risks of over- versus under-treatment in different subgroups of type 2 diabetes patients. Real-world data, however, suggest widespread clinical inertia in diabetes care [5,6,7], potentially reflecting the paucity of evidence to guide treatment individualization. Thus, tools to identify individuals for whom intensive glycemic control may be beneficial are needed, especially for reducing risk of cardiovascular disease, which remains the leading cause of mortality in type 2 diabetes patients [8, 9].

Subgroup analyses of randomized trials may provide the best evidence to guide individualized intensification of glycemic control for cardiovascular risk reduction. The Action to Control Cardiovascular Risk in Diabetes (ACCORD) study and the Veterans Affairs Diabetes Trial (VADT) did not find associations of intensive glycemic control with major adverse cardiovascular events (MACE) [10, 11]. However, prior subgroup analyses of both the ACCORD and VADT trials suggest heterogeneous treatment effects (HTE). Individuals without a history of cardiovascular events prior to randomization or whose baseline hemoglobin A1c (HbA1c) was ≤ 8.0% in the ACCORD trial demonstrated a reduction in the primary outcome of MACE in the intensive treatment group [10]. Similarly, intensive glycemic control was associated with reduced cardiovascular events in VADT trial participants with lower coronary artery calcium scores [12].

In contrast to these univariate, hypothesis-driven subgroup analyses, machine learning provides hypothesis-free approaches to evaluating patient subgroups based on combinations of variables [13]. In this study, we aimed to address evidence gaps for type 2 diabetes treatment individualization to mitigate cardiovascular disease risk using a hypothesis-free, data-driven method—causal forests machine learning [14,15,16]—to identify HTE of intensive glycemic control on MACE in the ACCORD and VADT studies.

Materials and methods

Study samples

The Colorado Multiple Institutional Review Board and local VA Research and Development Committee provided human subjects oversight and approval of the study. We included individual-level data from two randomized clinical trials in this study. The ACCORD and VADT studies have been described in detail previously [10, 11]. Both studies included adults with type 2 diabetes and a hemoglobin A1c (HbA1c) ≥ 7.5% at enrollment. The VADT study enrolled participants from December 2000 through May 2003, and follow-up continued through May 2008; median follow-up time in the VADT study was 5.6 years. The ACCORD study enrolled participants at high cardiovascular risk from January 2001 to October 2005; follow-up in the ACCORD study continued until June 2009 with a median on-protocol follow-up time of 3.7 years and a median total follow-up time of 4.9 years. Both studies randomized participants to receive intensive or standard glycemic control. The VADT study aimed to achieve a target of at least 1.5% lower HbA1c in participants randomized to intensive control compared to standard control. In the ACCORD study, intensive glycemic control participants were treated to a target HbA1c < 6% as compared to a target HbA1c of 7–7.9% for the standard glycemic control arm. In this secondary analysis, we included data from all 1791 VADT study participants and 10,251 ACCORD study participants.

Outcome

The primary outcome was major adverse cardiovascular events, defined as fatal or non-fatal myocardial infarction or stroke based on endpoint adjudication in the original trials. All-cause mortality was a secondary outcome.

Predictors

We included baseline variables that were common to the two studies: patient demographics, comorbidities, diabetes medications, cardiovascular disease medications, and laboratory values (Table 1; Additional file 1 Table S1). Estimated glomerular filtration rate (eGFR) was calculated using the Modification of Diet in Renal Disease Study Equation [17]. Hemoglobin glycation index (HGI) was estimated as the residual between measured HbA1c and HbA1c predicted by regressing on fasting glucose in the ACCORD study participants [18].
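The HGI definition above can be illustrated with a minimal sketch: fit a line predicting HbA1c from fasting glucose in a reference sample, then take each participant's residual. The function names and the five data points are hypothetical and for illustration only; the original analysis was conducted in R, and any details of the regression beyond what is stated in the text are assumptions.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def hemoglobin_glycation_index(fasting_glucose, hba1c, intercept, slope):
    """HGI = measured HbA1c minus HbA1c predicted from fasting glucose."""
    predicted = intercept + slope * fasting_glucose
    return hba1c - predicted


# Hypothetical reference sample (glucose in mg/dL, HbA1c in %); not trial data.
glucose = [110.0, 140.0, 170.0, 200.0, 230.0]
hba1c = [7.0, 7.6, 8.1, 8.9, 9.4]

intercept, slope = fit_line(glucose, hba1c)
hgi_values = [
    hemoglobin_glycation_index(g, a, intercept, slope)
    for g, a in zip(glucose, hba1c)
]
```

A positive HGI means measured HbA1c is higher than fasting glucose alone would predict. Because the residuals come from a least squares fit, they average to zero in the reference sample.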

Table 1 Study population characteristics

Statistical approach

We fit causal forests [14,15,16] to identify HTE of intensive glycemic control in the ACCORD and VADT studies. We first examined the ACCORD and VADT studies separately and report the correlation between variable importance statistics from the two causal forests using Kendall’s tau-b. We then fit another causal forest using pooled individual-level data from both studies. Finally, we used variable importance from the causal forest analysis of the pooled study data to construct a representative causal tree. All causal forests contained 5000 trees and used a minimum node size of approximately 5% of the total sample size, with each tree fit using an honest splitting and estimation approach [14,15,16] from random samples representing half of the stratified samples. To avoid overfitting, each tree considered only half of the covariates for splitting, randomly selected from the set of predictors. To compare contributions of variables to HTE, we employed a statistic included in the grf package in R, which generates a weighted average of importance for each variable [15, 16].
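The idea behind honest splitting and estimation can be sketched with a single-split toy example: one random half of the data chooses the split, and the other half estimates the treatment effect in each resulting leaf. This is a simplified illustration in Python, not the grf implementation (which uses a different splitting criterion and averages many trees). The function names, the candidate thresholds, and the synthetic data are all assumptions.

```python
import random


def arm_mean(rows, arm):
    """Mean outcome among rows in one treatment arm (None if the arm is empty)."""
    ys = [y for (_, w, y) in rows if w == arm]
    return sum(ys) / len(ys) if ys else None


def treatment_effect(rows):
    """Difference in mean outcome, intensive (w=1) minus standard (w=0)."""
    treated, control = arm_mean(rows, 1), arm_mean(rows, 0)
    if treated is None or control is None:
        return None
    return treated - control


def honest_stump(rows, thresholds, seed=0):
    """One-split 'tree': choose the split on one half, estimate effects on the other."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    split_half, estimate_half = shuffled[:half], shuffled[half:]

    # Splitting half: pick the threshold that best separates the two leaf effects.
    best_threshold, best_gap = None, -1.0
    for t in thresholds:
        left = treatment_effect([r for r in split_half if r[0] <= t])
        right = treatment_effect([r for r in split_half if r[0] > t])
        if left is None or right is None:
            continue
        gap = (left - right) ** 2
        if gap > best_gap:
            best_threshold, best_gap = t, gap
    if best_threshold is None:
        return None, None, None

    # Estimation half: effects are computed on data not used to choose the split.
    left_effect = treatment_effect([r for r in estimate_half if r[0] <= best_threshold])
    right_effect = treatment_effect([r for r in estimate_half if r[0] > best_threshold])
    return best_threshold, left_effect, right_effect


# Synthetic rows (covariate x, arm w, outcome y): no effect below x = 50, effect of 1 above.
rows = [(x, w, 1.0 if (w == 1 and x >= 50) else 0.0) for x in range(100) for w in (0, 1)]
threshold, left_effect, right_effect = honest_stump(rows, thresholds=[24.5, 49.5, 74.5])
```

Keeping the estimation half separate from the splitting half means the leaf estimates are not biased by the search over candidate splits.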

As there is not a consensus approach for translating causal forests to decision trees, we employed an approach that we have previously used to generate a summary decision tree from causal forests [19]. Specifically, we used the variable importance measure to identify the top variables contributing to HTE based on causal forests, which we then used to build a summary decision tree, with the caveat that this approach will not always identify significant HTE subgroups in the data [15]. We found that a stable summary causal tree with eight subgroups based on five of the top eight variables resulted when we included 8, 9, or 10 of the most important variables based on causal forests analysis of pooled study data. The summary causal tree again required at least 5% of the total sample in every terminal node and used honest cross-validation for tuning the shrinkage parameter [15]. To quantify HTE in the summary causal tree, we calculated the absolute risk difference in MACE (primary outcome) and all-cause mortality (secondary outcome) between the intensive and standard glycemic control arms within each terminal node subgroup of the summary causal tree using pooled data from both trials and in the ACCORD and VADT samples separately. We plotted cumulative incidence of MACE in the intensive and standard glycemic control arms using pooled data from both the ACCORD and VADT studies in each of the subgroups of the summary causal tree and compared the incidence curves using a log-rank test.
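The within-subgroup absolute risk difference can be sketched as follows. The text does not state how the 95% confidence intervals were computed, so the Wald (normal approximation) interval used here is an assumption, and the event counts are hypothetical rather than trial data.

```python
from math import sqrt


def risk_difference(events_intensive, n_intensive, events_standard, n_standard, z=1.96):
    """Absolute risk difference (intensive minus standard) with a Wald interval.

    Returns (risk difference, lower bound, upper bound) as proportions.
    """
    p_intensive = events_intensive / n_intensive
    p_standard = events_standard / n_standard
    rd = p_intensive - p_standard
    se = sqrt(
        p_intensive * (1 - p_intensive) / n_intensive
        + p_standard * (1 - p_standard) / n_standard
    )
    return rd, rd - z * se, rd + z * se


# Hypothetical subgroup: 40 of 600 events (intensive) versus 66 of 600 (standard).
rd, lower, upper = risk_difference(40, 600, 66, 600)
excludes_null = lower > 0 or upper < 0
```

A negative risk difference indicates lower MACE with intensive control. An interval that does not contain zero corresponds to the "95% confidence intervals excluding the null" wording used in the Results.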

All analyses were conducted in R (version 3.5.3, R Foundation for Statistical Computing, Vienna, Austria). Statistical code is available upon request.

Results

In comparison to the VADT study population, the ACCORD study group had a larger proportion of women, but a smaller proportion of participants with Hispanic ethnicity. ACCORD study participants had a lower HbA1c and were less likely to use insulin. The VADT study population included more participants with a history of angina, prior history of MI and congestive heart failure, and prior coronary artery revascularization. Additional similarities and differences in baseline patient characteristics can be found in Table 1.

While several variables were highly ranked both when causal forests were applied to the ACCORD study and to the VADT study separately, variable importance ranks were only moderately correlated (Kendall’s tau-b of 0.632; Additional file 1: Fig. S1). Next, we repeated the causal forests analysis using pooled data from both studies and including an indicator variable for study (ACCORD or VADT). Out of 47 variables evaluated, the ten most highly prioritized variables (HGI, fasting glucose, diabetes duration, total cholesterol, high-density lipoprotein cholesterol, eGFR, BMI, age, low-density lipoprotein cholesterol, and HbA1c) after applying causal forests to the pooled study data are shown in Additional file 1: Table S2. Of these top variables, most were also among the most highly prioritized variables when performing the same analysis on each individual study (Additional file 1: Table S2). Notably, the study indicator variable had an importance score of 0.00.
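For readers unfamiliar with the statistic, Kendall's tau-b compares every pair of variables and asks whether the two forests order them the same way, with a correction for ties. The sketch below is a direct, unoptimized implementation. The importance scores in the example are hypothetical and are not the values from Additional file 1.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    total_pairs = n * (n - 1) // 2
    # Pairs tied within x or within y are removed from the denominator.
    ties_x = sum(t * (t - 1) // 2 for t in Counter(x).values())
    ties_y = sum(t * (t - 1) // 2 for t in Counter(y).values())
    return (concordant - discordant) / sqrt((total_pairs - ties_x) * (total_pairs - ties_y))


# Hypothetical importance scores for the same five variables in two forests.
importance_accord = [0.30, 0.22, 0.15, 0.10, 0.05]
importance_vadt = [0.25, 0.28, 0.12, 0.04, 0.09]
tau = kendall_tau_b(importance_accord, importance_vadt)
```

Values near 1 indicate that the two forests rank the variables in nearly the same order, and values near 0 indicate little agreement.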

We next generated a summary causal tree that defined specific HTE subgroups. The summary causal tree was stable when including 8–10 of the most highly prioritized variables from the causal forest analysis of the pooled ACCORD and VADT study data, utilizing only five variables to divide the pooled sample into eight subgroups (Fig. 1). In subgroups 1–4, comprising 45% of the pooled sample, intensive glycemic control was associated with lower MACE (risk difference of − 4.3% [95% CI: − 7.7, − 1.0], − 5.1% [95% CI: − 8.7, − 1.5], − 4.5% [95% CI: − 8.1, − 1.0], and − 4.2% [95% CI: − 6.9, − 1.4], respectively; Fig. 1), and lower cumulative incidence of MACE over the follow-up time (Fig. 2). Subgroup 4 comprising 10% of the pooled sample also demonstrated consistent direction of effect and 95% confidence intervals excluding the null with intensive glycemic control associated with lower incidence of MACE in both the ACCORD and VADT studies (risk difference − 3.6% [95% CI: − 6.5, − 0.6] in ACCORD and − 7.6% [95% CI: − 14.9, − 0.3] in VADT). In two subgroups (subgroups 6 and 7) intensive glycemic control was associated with higher MACE in the pooled sample (risk difference of 3.1% [95% CI: 0.2, 6.0] and 3.1% [95% CI: 0.3, 5.9], respectively; Fig. 1), and with higher cumulative incidence of MACE over the follow-up time (Fig. 2). Neither subgroup 6 nor 7 exhibited consistent and significant associations of intensive glycemic control with higher MACE in the ACCORD and VADT study samples separately. The direction of effect of intensive glycemic control on MACE in Subgroup 7 was consistent in both study samples but with 95% confidence intervals including the null in VADT (3.2% [95% CI: 0.3, 6.1] in ACCORD, and 2.6% [95% CI: − 6.9, 12.2] in VADT; Fig. 1).

Fig. 1
Summary causal tree of heterogeneous treatment effects of intensive glycemic control on major adverse cardiovascular events (MACE). Splitting variables and cut-points for each split are shown, resulting in eight terminal subgroups (N (%) represent number and proportion of participants in pooled ACCORD + VADT sample in each subgroup). Units for splitting variables are mL/min for eGFR (estimated glomerular filtration rate), mg/dL for serum glucose, kg/m2 for BMI (body-mass index), and years for age. Risk difference of MACE and 95% confidence intervals in each subgroup in the pooled sample and in each study alone are shown below the diagram.

Fig. 2
Cumulative incidence curves for major adverse cardiovascular events within each subgroup identified by causal forests applied to pooled data from both the ACCORD and VADT studies. Cumulative incidence is represented by the lines, with 95% confidence intervals indicated by the shaded regions; the standard glycemic control arm is shown in pink and the intensive glycemic control arm in green for each subgroup.

As HTE can be a function of absolute event rates [20], we examined whether subgroup-specific effects of intensive glycemic control on MACE correlated with subgroup-specific MACE rates. Subgroup 4, in which intensive glycemic control was associated with lower MACE in pooled analysis of the ACCORD and VADT studies and when each study was examined separately, had the lowest MACE rate of the eight subgroups identified by causal forests (Fig. 3). We did not observe a discernible pattern in HTE in relation to increasing MACE rates across subgroups. In fact, intensive glycemic control was associated with lower MACE in both the subgroup with the lowest event rate (Subgroup 4) and the subgroup with the highest event rate (Subgroup 1) (Fig. 3; Additional file 1: Fig. S2).

Fig. 3
Comparison of subgroup effects based on subgroup-specific rates of major adverse cardiovascular events (MACE). Subgroups were ordered from left to right by event rates in pooled data including both ACCORD and VADT studies. A Event rates in each subgroup across both treatment arms (“All”, purple), among those randomized to standard glycemic control (“Standard”, blue), and among those randomized to intensive glycemic control (“Intensive”, green). Dotted lines show the event rates of MACE in the full sample (purple), in those randomized to standard glycemic control (blue), and in those randomized to intensive glycemic control (green). B Risk differences of MACE associated with intensive versus standard glycemic control, stratified by study and subgroup, with subgroups ordered from left to right by increasing MACE event rates in the full sample including pooled data from both studies. Positive risk differences reflect higher MACE and negative risk differences reflect lower MACE in intensive glycemic control compared to standard glycemic control. Blue and green dotted lines represent the average treatment effect of intensive versus standard glycemic control on MACE in the ACCORD and VADT studies, respectively.

To determine if any beneficial effects of intensive glycemic control on MACE were balanced by detrimental effects on mortality, we examined all-cause mortality associated with intensive glycemic control in the 8 subgroups identified in the summary causal tree for HTE on MACE. In subgroup 4, in which intensive glycemic control was associated with lower MACE in pooled data analysis and in each trial separately, intensive glycemic control was not associated with all-cause mortality (risk difference of − 0.8% [95% CI: − 2.8, 1.2] in pooled sample, − 1.0% [95% CI: − 3.2, 1.2] in ACCORD, and 0.5% [95% CI: − 4.7, 5.7] in VADT; Additional file 1: Table S3). Intensive glycemic control, however, was associated with higher all-cause mortality in subgroup 8 in analysis of pooled data from both trials and in ACCORD study data alone (Additional file 1: Table S3), confirming the identification of HGI in prior work as a determinant of HTE of intensive glycemic control on all-cause mortality [13]. None of the other subgroups exhibited significant associations of glycemic control intensity with all-cause mortality (Additional file 1: Table S3).

Discussion

In this secondary analysis of the ACCORD and VADT trials, we found heterogeneous treatment effects of intensive glycemic control on MACE. The most influential variables for identifying HTE are factors known to be associated with either cardiovascular disease or diabetes-related outcomes. A summary causal tree using the top variables from causal forests applied to pooled data from both trials defined eight HTE subgroups. Four subgroups (45% of the combined ACCORD and VADT sample) had associations of intensive glycemic control with cardiovascular benefit in pooled data, of which one (Subgroup 4, 10% of the combined sample) showed a consistent benefit in ACCORD and VADT separately. Two subgroups (34% of the combined ACCORD and VADT sample) demonstrated worse cardiovascular outcomes associated with intensive glycemic control in pooled data, but not consistently in ACCORD and VADT separately.

We did not observe a consistent pattern of cardiovascular benefit or harm of intensive glycemic control in relation to cardiovascular risk in the subgroups. One subgroup (Subgroup 4) demonstrated lower MACE associated with intensive glycemic control in the pooled sample, the ACCORD trial, and the VADT trial samples. Aside from a BMI ≥ 28 kg/m2, this was a relatively healthy subgroup of trial participants: age < 61 years, glucose ≤ 228 mg/dL, eGFR ≥ 69 mL/min/1.73m2, and low HGI bounded between − 0.3 and 0.84. Consistent with these clinical characteristics, this subgroup had the lowest rate of MACE in the pooled sample of the ACCORD and VADT studies. Moreover, intensive glycemic control was not associated with all-cause mortality in this subgroup across both trials, providing reassurance that cardiovascular disease risk reduction was not offset by higher non-cardiovascular mortality. The identification of this subgroup lends supportive evidence to current treatment guidelines, which suggest that intensive glycemic control targets may be considered in diabetes patients who are younger with few medical comorbidities [3, 4].

On the other hand, our analysis also identified lower MACE associated with intensive glycemic control in Subgroup 1, the subgroup with the highest risk of cardiovascular events. We observed directional consistency of the HTE in Subgroup 1 in the ACCORD and VADT study samples separately, though with wide confidence intervals that include the null in VADT. Similarly, Subgroups 6, 7, and 8, which all exhibited higher MACE associated with intensive glycemic control in the pooled analysis of VADT and ACCORD, were at low-, intermediate-, and high-risk of MACE, respectively, in the pooled study sample. While HTE are often correlated with underlying risk of the outcome, forming the basis for risk-based treatment recommendations in many clinical contexts [21, 22], the approach used here identified HTE of intensive glycemic control that appear independent of MACE risk. Thus, machine learning approaches may provide complementary information to cardiovascular risk estimation to guide diabetes treatment individualization across the spectrum of cardiovascular disease risk.

Our finding that HGI was the most highly ranked variable in the analysis of pooled data from the ACCORD and VADT trials extends prior work describing associations of glycemic variability with cardiovascular risk in diabetes patients. As mentioned above, HGI is the difference between measured HbA1c and the HbA1c that would be predicted on the basis of a concomitant fasting plasma glucose measurement. A high HGI, therefore, would indicate that an individual’s HbA1c is higher than would be predicted from fasting plasma glucose, potentially implying high glucose variability not reflected in the fasting glucose measurement. The identification of high HGI as an important determinant of adverse cardiovascular effects of intensive glycemic control (Subgroup 8) supports prior work that has found associations of glucose and HbA1c variability with microvascular and macrovascular complications and hypoglycemia in diabetes patients, including in the VADT and ACCORD trials [10, 23,24,25,26,27,28,29,30,31,32]. While calculating HGI in routine care may be impractical as it is derived by regressing HbA1c on glucose in a population, our findings and the prior work on glycemic variability suggest that discordance between glycemia measured by HbA1c and fasting glucose may be a useful adverse prognostic indicator. When assessing risk of microvascular complications, initial work from McCarter et al. concluded that HGI was an independent predictor of risk [33]. A follow-up study by Lachin et al., however, argued that HGI was highly correlated with the HbA1c level and that it is not an independent predictor of the risk of microvascular complications [34]. Their conclusion was that the effect of the glycation index on risk can be explained by the associated level of HbA1c. While others have explored comparative risk of outcomes related to HbA1c and HGI, the prioritization of HGI more highly than HbA1c when including both variables in our analysis may suggest that HGI captures treatment-related risk in ways that are not redundant with HbA1c.

Although we found one subgroup, Subgroup 4, in which intensive glycemic control was associated with fewer cardiovascular events, we would not interpret this result as advocating for treating similar real-world patients to an HbA1c target < 6%. That a similar benefit was observed in the VADT study and in the ACCORD study—which had very different HbA1c targets—suggests that a more intensive glycemic control strategy may be beneficial for certain patients without targeting specific HbA1c thresholds. Given mounting evidence of the efficacy of glucagon-like peptide-1 receptor agonists and sodium glucose co-transporter-2 inhibitors for improving cardiovascular outcomes in patients with diabetes independent of effects on glycemia [35,36,37,38,39,40], optimal cardiovascular disease prevention through diabetes treatment may depend on both individualized glycemic control goals and medication choice. Future work examining HTE of the new cardioprotective classes of diabetes medications may identify evidence-based strategies for tailoring diabetes treatment based on more than just underlying cardiovascular risk.

Identifying subpopulations of type 2 diabetes patients has been of great interest over the last several years with analyses focused on understanding predictors of disease progression and treatment response. Our analysis complements prior hypothesis-free, data-driven analyses [41, 42] and hypothesis-based analysis using clinical features [43] to define diabetes patient subgroups with differential response to treatment. Prior secondary analyses of the ACCORD study suggest that features of on-trial HbA1c may be associated with trial outcomes [42, 44], an approach that contrasts with our study focused on baseline characteristics in which HbA1c was not used in the summary decision tree. In addition to highlighting the urgent need to understand how diabetes patient heterogeneity might inform better tailoring of treatment, our study draws attention to important data and methodological gaps in advancing diabetes precision medicine. First, causal forests applied to each of the trials separately yielded only modest correlation in variable importance rankings, highlighting the value of pooling individual-level data from multiple studies when examining HTE using machine learning and the potential sensitivity of these methods to between-study heterogeneity in study populations and intervention design [19]. Differences in the trial designs of ACCORD (factorial) [10] and VADT (parallel treatment RCT) [11] may also have contributed to differences in results when analyzing the trials separately. To make the most of trial data, methods that can flexibly accommodate differences in trial design are needed. Second, there is a lack of consensus on how best to translate causal forests results to interpretable, mutually exclusive subgroups of real-world patients. In this study we used the most highly prioritized variables from causal forests to generate a summary decision tree, an approach that can improve interpretability of the causal forests output but can also lead to overfitting. Given the substantial heterogeneity of type 2 diabetes patients, methodological advances in translating machine learning methods to clinical decision-making may prove impactful to individualized diabetes care.

There are several limitations to acknowledge in this study. First, the interpretation of results should remain limited to the populations represented in the ACCORD and VADT studies. Though pooling data from the two studies broadens the general population representation in our analysis, the resulting pooled sample is derived from select randomized trial samples and does not necessarily represent the real-world diabetes patient population. Additional evaluation is needed to assess the performance of the HTE subgroups in the summary causal tree in diabetes patients drawn from the general population and the VA health system. Second, the summary causal tree presented here is difficult to interpret from the perspective of relating variable cut-points to physiology and clinical outcomes and would be difficult to institute into clinical practice in its current form. Third, the VADT study was considerably smaller than the ACCORD study, limiting statistical power for subgroup analyses in the VADT study—reflected by wide confidence intervals for the treatment effects—and granting greater weight to the ACCORD study in pooled analyses. Finally, we evaluated only one machine-learning algorithm for HTE detection—causal forests—in our analysis. Assessing whether similar variables are prioritized for HTE identification across disparate algorithms exceeds the scope of this manuscript but may be a valuable next step in translating results of machine learning subgroup analyses to clinical care.

Conclusions

In sum, using data from two randomized trials of intensive glycemic control in type 2 diabetes patients, we found subgroups defined by combinations of HGI, eGFR, serum glucose, BMI, and age that exhibited different associations of intensive glycemic control with major adverse cardiovascular events. This hypothesis-free, data-driven approach identified a subset of patients consisting of younger trial participants with overweight/obesity, low HGI, preserved renal function, and lower serum glucose levels that may benefit from intensive glycemic control to lower MACE, consistent with contemporary guidelines for the care of diabetes patients. We also highlight that potential benefit of intensive glycemic control to lower MACE was not clearly correlated with underlying risk of MACE in subgroups, suggesting that clinical decision-making for diabetes treatment intensity based primarily on cardiovascular risk estimation may miss patient subgroups at high cardiovascular risk who might benefit from intensive glycemic control.