A total of 205 publications (100 full-length articles and 105 abstracts) were identified in EMBASE, and two duplicates were removed. Two full-length articles and one conference abstract that were not identified in this search but were known by the authors to meet eligibility criteria for the analysis were included [25,26,27]; one article was published in a journal that is not indexed in EMBASE, and one article was published after the initial search, while the conference abstract was not captured in the search for unknown reasons. A total of 206 full-length manuscripts and conference abstracts identified in the searches were manually reviewed for relevance based on abstracts and titles. Of these, 175 manuscripts and abstracts were excluded as they did not report T2DM modeling applications that included cardiovascular outcomes data. Full texts of the remaining eight articles and 23 abstracts were further reviewed to determine relevance; three articles and seven abstracts were deemed not to meet the study eligibility criteria. This yielded a final set of five full-length articles and 16 conference abstracts, of which 11 had full-length posters available (Fig. 1). An overview of the included studies is provided in Table 1, and a complete summary of extracted data is available in the Supplementary Excel workbook (with separate tabs for full-length manuscripts and for conference abstracts and conference proceedings).
The full-length articles were relatively well documented, providing confidence that we understood the methods used to model cardioprotection. The assessment of study relevance and credibility using the ISPOR–AMCP–NPC questionnaire found that the full-length publications were generally deemed credible (see Excel workbook in the Supplementary Material). Naturally, there was less information content in the posters and especially the abstracts, and as such, we have less confidence that we fully understood the methodologies.
The full-length articles used diverse approaches to model the impact of cardioprotection. Iannazzo et al.  and Gourzoulidis et al.  used a discrete-event simulation (DES) model driven by risk prediction equations for nonfatal myocardial infarction, nonfatal stroke, unstable angina, heart failure, transient ischemic attack, coronary revascularization, cardiovascular death, macroalbuminuria, acute kidney injury, and end-stage kidney disease (ESKD) estimated from patient-level EMPA-REG OUTCOME trial data. The cost-effectiveness of empagliflozin and SoC versus SoC alone was estimated for adult patients with T2DM and high cardiovascular risk in Italy and Greece, respectively. No additional benefit was assigned to empagliflozin on the basis of a better biomarker profile. The simulation model was used to extrapolate the 3-year EMPA-REG OUTCOME study (including treatment patterns and event risks) to lifetime as well as to the characteristics of the Italian target population (as opposed to the trial population). We categorized these studies as “2. AHA-specific HRs only” because treatment effects are expressed through HRs for the treatment assignment covariate only, and differences in surrogate biomarkers are not used.
Iannazzo et al. note that the target Italian patient population includes many sicker patients who would have been excluded from EMPA-REG OUTCOME, and that extrapolation of trial findings to a lifetime horizon introduces uncertainty. Gourzoulidis et al. point out that treatment sequences and pharmacy cost offsets were not captured in the model, and that “the model does not capture rare, but severe, T2DM-related complications, such as blindness and amputation, nor the more frequent, but less costly, urinary or genital tract infections.” To these, we would add further concerns: (1) the model design extends the treatment effects indefinitely without adjustment for treatment durability, (2) the need for treatment intensification is not considered, (3) microvascular endpoints typically included in health-economic models (e.g., retinopathy and neuropathy) are not captured, (4) some well-established cardiovascular risk factors (e.g., blood pressure, lipids) are absent, and (5) lack of external validation increases decision-making uncertainty. It should also be pointed out that the application was limited to a within-trial comparison of empagliflozin plus SoC versus SoC, so it does not consider relevant glucose-lowering comparators.
Arbel et al. performed a cost-effectiveness analysis (described by the authors as a cost-minimization analysis) of empagliflozin versus liraglutide to inform the choice of optimal strategy for reducing cardiovascular deaths in patients with T2DM and established atherosclerotic disease in the USA using the results of EMPA-REG OUTCOME and LEADER. The cost per cardiovascular death averted was obtained by dividing the total drug acquisition costs (estimated as if the placebo arms in the CVOTs had received empagliflozin or liraglutide, respectively) by the absolute number of cardiovascular deaths averted (adjusted for the 2:1 randomization in EMPA-REG OUTCOME). Results were presented separately for the EMPA-REG OUTCOME and LEADER patient populations, for both study drugs. We categorized this study as “2. AHA-specific HRs only” because treatment effects on cardiovascular death are included directly through observed events avoided in the studies, and biomarkers were not considered.
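The cost-per-death-averted arithmetic described above can be sketched as follows. This is a minimal illustration and not the authors' calculation: the function name, its arguments, and every number are hypothetical placeholders rather than trial data, and scaling the risk difference to the placebo-arm size is only one assumed way of handling 2:1 randomization.

```python
def cost_per_death_averted(deaths_active, n_active, deaths_placebo, n_placebo,
                           annual_drug_cost, years_on_treatment):
    """Drug acquisition cost per cardiovascular death averted.

    Assumption: the placebo arm is treated hypothetically, so the risk
    difference is applied to the placebo-arm size. Working with risks
    rather than raw counts handles unequal (e.g., 2:1) randomization.
    """
    risk_active = deaths_active / n_active
    risk_placebo = deaths_placebo / n_placebo
    deaths_averted = (risk_placebo - risk_active) * n_placebo
    if deaths_averted <= 0:
        raise ValueError("no deaths averted; the ratio is undefined")
    total_drug_cost = annual_drug_cost * years_on_treatment * n_placebo
    return total_drug_cost / deaths_averted


# Hypothetical inputs (NOT trial data): 2:1 randomization, 3 years on treatment.
result = cost_per_death_averted(
    deaths_active=80, n_active=4000,
    deaths_placebo=60, n_placebo=2000,
    annual_drug_cost=5000.0, years_on_treatment=3,
)
print(f"Cost per cardiovascular death averted: {result:,.0f}")
```

Because the calculation has no time dimension beyond the assumed treatment duration, it illustrates why such an analysis cannot reflect competing risks or treatment intensification.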
Arbel et al. acknowledged the absence of head-to-head trial evidence and the mismatch between the EMPA-REG OUTCOME and LEADER populations, and that the focus on the cardiovascular death endpoint alone ignores other relevant MACE outcomes. We would additionally highlight that (1) the absence of non-MACE outcomes ignores many potential clinical and economic impacts, (2) limitation to drug acquisition costs overlooks many economic impacts (e.g., cost offsets) of interest to decision-makers, (3) the absence of explicit time in the model, and of any extrapolation beyond the absolute number of cardiovascular events avoided during the trial (effectively ignoring differences in population characteristics, competing risks, and treatment intensification), limits generalizability, and (4) lack of external validation increases decision-making uncertainty.
Kamstra et al. modeled the direct medical costs associated with reductions in cardiovascular events in patients with T2DM reported in the CANVAS Program and EMPA-REG OUTCOME trials from a US managed care organization perspective. Applying study inclusion and exclusion criteria to National Health and Nutrition Examination Survey (NHANES) data, the authors estimated that 50.5% and 15.4% of a typical managed care organization’s T2DM patient population would be eligible for the CANVAS Program (primary and secondary cardiovascular disease prevention) and EMPA-REG OUTCOME (secondary cardiovascular disease prevention), respectively. Cost savings associated with the numbers of averted MACE (including cardiovascular death, nonfatal myocardial infarction, and nonfatal stroke) and hospitalization for heart failure events were calculated by multiplying the differences in event rates per patient-year by the unit costs for each event. Scaling up to the eligible proportions of the managed care organization population, results were presented as cost savings per member per month. This study includes cardioprotection through direct inputs of event rate differences, and biomarkers were not considered, so we categorized it as “2. AHA-specific HRs only.”
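The per-member-per-month calculation described above can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the published model: the function name, the dictionary keys, and all rates, unit costs, and plan sizes are hypothetical.

```python
def pmpm_savings(rate_diff_per_1000_py, unit_costs, eligible_fraction,
                 t2dm_members, total_members):
    """Cost savings per member per month from averted events.

    rate_diff_per_1000_py: events averted per 1000 patient-years, by event type
    unit_costs:            cost per event, by event type
    eligible_fraction:     share of T2DM members meeting trial criteria
    """
    # Savings per treated patient-year: rate difference times unit cost, summed.
    per_patient_year = sum(
        rate_diff_per_1000_py[event] / 1000.0 * unit_costs[event]
        for event in rate_diff_per_1000_py
    )
    # Scale to the eligible T2DM population, then spread over all plan members.
    annual_savings = per_patient_year * eligible_fraction * t2dm_members
    return annual_savings / total_members / 12.0


# Hypothetical inputs (NOT values from the CANVAS Program or EMPA-REG OUTCOME).
rate_diff = {"mace": 5.0, "hhf": 3.0}            # events averted per 1000 patient-years
costs = {"mace": 40_000.0, "hhf": 20_000.0}      # cost per event
savings = pmpm_savings(rate_diff, costs, eligible_fraction=0.5,
                       t2dm_members=100_000, total_members=1_000_000)
print(f"Savings per member per month: {savings:.2f}")
```

The sketch makes visible that the result depends only on first-event rate differences and unit costs; treatment costs and other cost components do not enter.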
Kamstra et al. note that their analysis was restricted to the first event of each type, as second and later events were not reported in the main clinical publications. They also point out differences between the CANVAS Program and EMPA-REG OUTCOME in terms of methods and measurement of outcomes. We would additionally point out that (1) the 1-year time horizon may be considered too short to inform some decisions, (2) the model was not designed to provide information on other cost components (such as treatment costs, costs of microvascular endpoints, and adverse events), and (3) lack of external validation increases decision-making uncertainty.
Nguyen et al. used a Markov model to estimate the cost-effectiveness of empagliflozin versus standard treatment for the prevention of cardiovascular morbidity and mortality in patients with T2DM and high cardiovascular risk from a US payer perspective. They constructed a lifetime Markov model with a constant 3-month cycle length and 10 health states. Event rates for myocardial infarction (including fatal, nonfatal, and silent myocardial infarction), heart failure (including heart failure severity and hospitalization for heart failure, as well as death from heart failure), transient ischemic attack, stroke (including fatal and nonfatal events classed as major, minor, or reversible ischemic neurological deficit), severe hypoglycemia, ESKD (including death from ESKD), and all-cause mortality were obtained for the empagliflozin and placebo study arms from EMPA-REG OUTCOME and assumed constant. Additionally, constant event rates for unstable angina hospitalization, vascular disease, and death from vascular disease taken from the literature were assumed to be common to both study arms. Unit costs and QALY inputs were sourced from the literature. Because biomarkers were not included and cardioprotection was captured by direct inputs on event rates, we categorized this study as “2. AHA-specific HRs only.”
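To illustrate how constant arm-specific event rates drive a Markov cohort model with 3-month cycles, the sketch below uses a deliberately reduced three-state structure (the published model has 10 health states). The state definitions, function names, and all rates are hypothetical assumptions chosen for illustration only.

```python
import math

CYCLE_YEARS = 0.25  # 3-month cycle length


def rate_to_prob(annual_rate, cycle_years=CYCLE_YEARS):
    """Convert a constant annual event rate to a per-cycle probability."""
    return 1.0 - math.exp(-annual_rate * cycle_years)


def run_cohort(event_rate, death_rate, post_event_death_rate, n_cycles):
    """Cohort trace over three states: no event, post-event, dead.

    Within each cycle, deaths are applied first; survivors without a
    prior event may then experience one. Returns one tuple per cycle,
    starting with the initial distribution.
    """
    p_event = rate_to_prob(event_rate)
    p_death = rate_to_prob(death_rate)
    p_death_post = rate_to_prob(post_event_death_rate)
    state = (1.0, 0.0, 0.0)  # (no_event, post_event, dead)
    trace = [state]
    for _ in range(n_cycles):
        no_event, post_event, dead = state
        surv_no_event = no_event * (1.0 - p_death)
        surv_post = post_event * (1.0 - p_death_post)
        new_events = surv_no_event * p_event
        state = (
            surv_no_event - new_events,
            surv_post + new_events,
            dead + no_event * p_death + post_event * p_death_post,
        )
        trace.append(state)
    return trace


def life_years(trace, cycle_years=CYCLE_YEARS):
    """Undiscounted life-years, without half-cycle correction."""
    return sum((s[0] + s[1]) * cycle_years for s in trace[1:])


# Hypothetical constant annual rates (NOT EMPA-REG OUTCOME values).
N_CYCLES = 160  # 40 years of 3-month cycles
trace_placebo = run_cohort(0.04, 0.02, 0.08, N_CYCLES)
trace_treated = run_cohort(0.03, 0.02, 0.08, N_CYCLES)
ly_placebo = life_years(trace_placebo)
ly_treated = life_years(trace_treated)
print(f"Life-years gained: {ly_treated - ly_placebo:.3f}")
```

Because the rates are held constant over the full horizon, the treatment benefit in this sketch never wanes, which is the extrapolation assumption questioned for lifetime analyses.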
Nguyen et al.  note that their analysis was limited by the lack of reporting on heart failure in the EMPA-REG OUTCOME study (initiation of loop diuretics was used as a proxy for heart failure), that extrapolation from the median 3.1 years of observation in EMPA-REG OUTCOME to lifetime is problematic, and that the analysis could be improved by inclusion of further microvascular endpoints. The authors chose not to directly compare empagliflozin to other AHAs because of differences in methodologies, populations, and study designs in the clinical trials of these agents. To these limitations we would add that, for an assessment of cost-effectiveness over a lifetime horizon, treatment intensification over time should have been considered, and that lack of external validation increases decision-making uncertainty.
Most of the conference abstracts compared empagliflozin plus SoC to SoC alone [27, 31,32,33,34,35,36,37,38,39,40,41,42], and the same EMPA-REG OUTCOME DES cost-effectiveness model reviewed earlier was often used. One study employed an EMPA-REG OUTCOME budget impact model, with largely similar characteristics to the cost-effectiveness model. This same budget impact model was probably used in two other abstracts [31, 35]. Wilson et al. estimated the cost offsets to a health plan’s budget for patients with T2DM and established cardiovascular disease using empagliflozin plus SoC versus SoC alone; although details of the methods remained opaque, cardioprotection was captured by direct inputs of annualized event rates obtained from the EMPA-REG OUTCOME study. Kragh et al. estimated the effects on clinical and economic outcomes in Canada when adding liraglutide or placebo to SoC over a range of up to 25 years. They used a state-transition model and employed multivariate causal relationships to capture event risks and treatment effects observed in the LEADER trial on the endpoints myocardial infarction, stroke, hospitalization for heart failure, ischemic heart disease, retinopathy, nephropathy, and severe hypoglycemia. We were unable to evaluate details of their model, but given that biomarkers were not included, this model captures cardioprotection through direct HRs only. Because cardioprotection is captured in all of these studies without the use of treatment effects on surrogate biomarkers, we categorized these abstracts and posters as “2. AHA-specific HRs only.”
Two conference abstracts used economic models commonly seen in economic analyses of T2DM: Willis et al.  used ECHO-T2DM to evaluate results from the CANVAS Program on canagliflozin, and Evans et al.  used the IQVIA Core Diabetes Model to evaluate results from the SUSTAIN-6 study on semaglutide. Both studies aimed to evaluate the extent to which these models could capture cardioprotection through treatment effects on surrogate biomarkers (e.g., HbA1c, systolic blood pressure, and lipids) and whether the full magnitudes of cardioprotective effects could be modeled by using HRs to include the remaining effect not mediated through these traditional biomarkers. We classified these studies as “3. AHA-specific HRs plus treatment effects mediated through known risk factors.”
Table 2 provides an overview of advantages and disadvantages for the possible categories to capture cardioprotection in economic modeling. The qualifying studies fell into just two of these categories. All five of the manuscripts and 15 of the 17 conference abstracts and posters incorporated cardioprotection with treatment effects constrained to HRs from a CVOT (in some cases, equivalently specified using study arm-specific event rates) [25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44]. Many of these applications appear to use the same model. The remaining two posters incorporated cardioprotection using HRs from a CVOT plus treatment effects mediated through known risk factors [19, 20].
While not picked up by design in the literature search, limiting cardioprotection to treatment effects mediated by known biomarkers only (category 1) was the status quo until at least the publication of EMPA-REG OUTCOME results. There is broad experience with this approach in the modeling community, as well as with health care decision-makers. Moreover, a wealth of evidence on head-to-head differences in biomarker changes to inform economic analysis exists. However, when well-established cardiovascular risk prediction equations such as those from UKPDS are used, this approach has been shown to capture only part of the cardiovascular benefit [17, 19, 20]. When this approach is used to perform economic assessments comparing agents with and without demonstrated cardiovascular benefits, results will be biased.
The most common empirical approach to modeling cardioprotection is with treatment effects constrained to HRs from a CVOT (category 2), with empirical examples identified in the literature review ranging in scope from limited (e.g., cost per cardiovascular death averted) to broad (e.g., cost-effectiveness modeling including many endpoints over a lifetime horizon [25, 28]). This approach is commonly used in other disease areas (e.g., cancer) and is conceptually simple. Moreover, using HRs obtained from CVOTs directly (without resorting to indirect differences in surrogate biomarkers run through external risk prediction equations) can match trial results (internal validity).
While relatively easy to communicate, using CVOT results directly has several major limitations: an incomplete set of endpoints (especially microvascular complications), between-arm differences in background SoC associated with glycemic equipoise, and restriction to comparisons with placebo. Unlike for biomarkers, the opportunities to populate indirect comparisons for HRs are limited by the small number of CVOTs and substantial heterogeneity. In particular, CVOTs differ by study population (e.g., primary vs. secondary cardiovascular prevention), set of endpoints included and study definitions, and study follow-up. Comparing HRs across trials without adjustment can be misleading. For example, EMPA-REG OUTCOME recruited patients with T2DM and established cardiovascular disease, whereas the CANVAS Program recruited patients with T2DM not only with established cardiovascular disease (secondary cardiovascular prevention) but also with high cardiovascular risk (primary cardiovascular prevention). Comparing HRs directly (like Kansal et al.) thus risks an apples-to-oranges comparison. While patient-level data could be used to make a balanced comparison, simultaneous access to patient-level data from CVOTs of competing agents is extremely limited. With access to patient-level data from one study, however, matching-adjusted indirect comparison can improve the comparison by matching to the other study’s summary patient population characteristics (for methodological details, see the description by NICE in the UK). There are, as yet, no published matching-adjusted indirect comparisons in this space. In addition, extrapolation of CVOT results to long (lifetime) horizons can be hampered by concerns about the durability of treatment effects and by assumptions regarding cardiovascular risk beyond the CVOT follow-up period (including treatment intensification).
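The reweighting step at the core of a matching-adjusted indirect comparison, as mentioned above, can be sketched for a single covariate. This is a minimal method-of-moments illustration under strong simplifying assumptions (one matching variable, a plain Newton iteration, made-up ages); the function names and data are hypothetical and do not correspond to any published analysis.

```python
import math


def maic_weights(x, target_mean, iters=50):
    """Method-of-moments weights w_i = exp(a * (x_i - target_mean)).

    The scalar a is chosen so that the weighted mean of x in the
    patient-level study equals the mean reported for the comparator
    study. It minimizes the convex objective sum(exp(a * c_i)), solved
    here with Newton's method.
    """
    centered = [xi - target_mean for xi in x]
    a = 0.0
    for _ in range(iters):
        w = [math.exp(a * c) for c in centered]
        grad = sum(wi * c for wi, c in zip(w, centered))
        hess = sum(wi * c * c for wi, c in zip(w, centered))
        a -= grad / hess
    return [math.exp(a * c) for c in centered]


def effective_sample_size(weights):
    """Approximate effective sample size after weighting."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical patient-level ages and a hypothetical comparator-trial mean age.
ages = [50.0, 55.0, 60.0, 65.0, 70.0]
weights = maic_weights(ages, target_mean=63.0)
weighted_mean = sum(w * a for w, a in zip(weights, ages)) / sum(weights)
ess = effective_sample_size(weights)
print(f"Weighted mean age: {weighted_mean:.2f}, effective sample size: {ess:.2f}")
```

In practice several characteristics would be matched at once and the weighted outcomes would then be compared with the published comparator results; the loss of effective sample size shown here is one reason such comparisons carry added uncertainty.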
A less common empirical approach [19, 20] involves modeling cardioprotection using both HRs and changes in known cardiovascular risk factors simultaneously (category 3), potentially combining the advantages of categories 1 and 2. Like category 1, including differences in cardiovascular risk factors potentially enables more accurate extrapolation of biomarker-mediated cardioprotection over long time horizons, particularly when treatment intensification is important and when comparisons are performed versus active comparators (which may have different biomarker profiles). Like category 2, cardioprotection can be replicated accurately. This approach can be implemented in many existing economic models with minimal modification.
Naively using the HRs together with effects on biomarkers can lead to double-counting of benefits, however. HRs that have been adjusted to remove cardiovascular benefits mediated through improvements in other cardiovascular risk factors can be generated in auxiliary modeling simulations [19, 20]. These adjustments naturally depend on the risk prediction equations in the economic model, and ideally one would use regression techniques on the patient-level data to obtain the adjusted HRs. Moreover, indirect comparison is required both for biomarker differences (category 1) and HRs (category 2) to support non-placebo comparisons. Additionally, performing an indirect comparison for these adjusted HRs involves even more complexity (and sources of uncertainty) than indirect comparison for HRs (see above).
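The idea of calibrating an adjusted HR so that biomarker-mediated and residual effects together reproduce the trial HR can be sketched with a toy model. Everything here is an assumption made for illustration: the log-linear risk equation, its coefficients, the biomarker changes, and the trial HR are hypothetical and are not taken from UKPDS, any CVOT, or the cited posters.

```python
import math

# Hypothetical log-hazard coefficients per unit change in each risk factor.
BETAS = {"hba1c_pct": 0.10, "sbp_mmhg": 0.01}


def biomarker_hr(deltas, betas=BETAS):
    """HR implied by biomarker changes alone under a log-linear risk equation."""
    return math.exp(sum(betas[k] * deltas[k] for k in deltas))


def modeled_hr(deltas, residual_hr):
    """Total HR produced by the toy model: biomarker effect times residual HR."""
    return biomarker_hr(deltas) * residual_hr


def calibrate_residual_hr(deltas, target_hr, lo=0.01, hi=2.0, tol=1e-10):
    """Bisection search for the residual HR that makes the model match the trial HR."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if modeled_hr(deltas, mid) > target_hr:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


# Hypothetical treatment effects: HbA1c -0.5 percentage points, SBP -4 mmHg.
deltas = {"hba1c_pct": -0.5, "sbp_mmhg": -4.0}
trial_hr = 0.80  # hypothetical observed HR

adjusted_hr = calibrate_residual_hr(deltas, trial_hr)
naive_total = modeled_hr(deltas, trial_hr)         # double counts the biomarker effect
calibrated_total = modeled_hr(deltas, adjusted_hr)  # reproduces the trial HR
print(f"Adjusted HR: {adjusted_hr:.4f}; naive total HR: {naive_total:.4f}")
```

In this toy case the answer also has a closed form (the trial HR divided by the biomarker-implied HR); the search loop stands in for the auxiliary simulations that would be needed when the economic model does not combine effects multiplicatively.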
While the literature review did not identify any category 4 studies, the benefits of estimating new risk prediction equations using CVOT or other recent data including AHAs with cardioprotection are obvious. Like category 1, risk prediction equations that capture cardioprotection via cardiovascular risk factors (both known and potentially novel) would improve the modeling of diabetes, its treatment and complications, and facilitate head-to-head comparisons using registration trial data. Using CVOT data poses some important challenges to the estimation of such risk prediction equations, including follow-up limited to years rather than decades (like the UKPDS), focus on primary composite outcomes (potentially too few events for individual complications and lack of power), and potential confounding related to treatment intensification (glycemic equipoise). Economic modeling of mediated cardioprotection requires that covariates included in the risk prediction equations have a causal interpretation, so care must be taken with covariate selection (especially when novel risk factors are considered).
The literature review did not identify any studies that used a hybrid approach (category 5). Hybrid approaches can combine the advantages of other approaches in novel ways, potentially avoiding some of the disadvantages in the process. An example not including cardioprotection is the UKPDS approach to estimating the cost-effectiveness of intensive versus conventional diabetes management, in which a trial-based analysis was supplemented with a model-based analysis of post-trial benefits and costs to approximate a lifetime horizon.
In summary, the modeling efforts to date have largely been limited to direct application of HRs, which have the potential to double count benefits when biomarker improvements are also simulated. The two applications that considered both HRs and biomarkers used a simple calibration approach to mitigate double-counting. Future work leveraging individual-level data is needed to assess the validity of this approach. In addition, per good modeling guidance, models should be validated. Currently, validating the predictive accuracy of models in this area (as in other disease areas where a sufficient number of events for analysis only occurs over relatively long time periods) is challenging owing to the novelty of agents offering direct cardioprotection.
Given the emerging evidence from renal outcomes trials, similar considerations, covering both glucose-lowering effects and potential mechanistic pathways through which SGLT2 inhibition acts directly, may become applicable to the modeling of renoprotection.
Some limitations of this analysis should be acknowledged. First, the scope of this study considered only how cardioprotection has been simulated using diabetes models; other model features (e.g., absolute risks and disease scope) were excluded from analysis. Second, the categories we created may have been too broad; in particular, there was considerable diversity in the methods used in studies classified in category 2. Third, given the relatively short time period in which cardioprotection modeling in T2DM has had to evolve, much of the literature thus far is limited to harder-to-find conference proceedings, so there may be more types of methods in use than we were able to identify. Similarly, because we searched only the EMBASE database, we may have missed studies offering additional methodologies.