Key Points for Decision Makers

Sodium–glucose cotransporter 2 inhibitors (SGLT2i) have been recommended in diabetes guidelines and shown to be cost effective. However, existing model-based cost-effectiveness analyses of SGLT2i have limitations regarding their representativeness, the integration of all treatment effects and the lack of real-world evidence.

The Dutch reimbursement criteria for SGLT2i apply to 16% of individuals with diabetes, and their characteristics differ significantly from those of SGLT2i trial populations. For individuals who qualified for reimbursement, SGLT2i were cost effective at €5440/QALY compared with care-as-usual. Several pragmatic scenarios were tested, and the cost-effectiveness estimate remained favorable. The MICADO model captured the benefit of SGLT2i well, with a mean absolute percentage error of less than 25% in predicting the incidence of complications over the trial’s follow-up period.

The MICADO model was a useful tool to simulate the disease progression of individuals with diabetes, keep track of costs and quality-adjusted life years, and support decision-making. Although the reimbursement criteria will result in a different target group compared with trials, SGLT2i can be considered cost effective in a routine care population.

1 Introduction

Global health expenditures due to diabetes mellitus have increased from $232 billion in 2007 to $966 billion in 2021 for adults aged 20–79 years, representing a 316% increase over 15 years and accounting for 11.5% of total global health spending in 2021 [1]. Cardiovascular complications, accounting for nearly half of the total mortality rate in type 2 diabetes (T2D), are the main drivers of the diabetes-related economic burden [2], and around 30% of diabetes expenditure is related to treatments [3, 4]. Therefore, treatments that reduce cardiovascular complication rates have great potential to be more cost effective than current treatment by reducing healthcare expenditures and increasing quality-adjusted life years (QALYs).

Sodium-glucose cotransporter-2 inhibitors (SGLT2i) are a new class of oral antidiabetic medication that, among other mechanisms not yet fully known, inhibit glucose reabsorption in the kidneys, leading to increased glucose excretion in the urine [5]. SGLT2i, including empagliflozin, dapagliflozin, and canagliflozin, have been studied in several randomized clinical trials (RCTs), such as the Cardiovascular Outcome Event Trial in patients with Type 2 Diabetes Mellitus (EMPA-REG) for empagliflozin [6], the CANagliflozin cardioVascular Assessment Study (CANVAS) for canagliflozin [7], and the Multicenter Trial to Evaluate the Effect of Dapagliflozin on the Incidence of Cardiovascular Events (DECLARE-TIMI58) for dapagliflozin [8]. Based on these RCTs, SGLT2i are proven effective in terms of reducing risks for cardiovascular disease, including heart failure or atherosclerotic cardiovascular disease (ASCVD), in addition to their effect as oral antidiabetic agents [9, 10]. The cost and effectiveness of the three SGLT2i were comparable [11].

SGLT2i are now widely approved as antihyperglycemic therapies for individuals with T2D [12] and are recommended for individuals with T2D and established ASCVD in American [13] and European guidelines [14] in almost all cases, except when cost is a major concern [13]. Compared with conventional glucose-lowering treatments such as metformin and sulfonylurea, SGLT2i are expensive. In 2021 in The Netherlands, they cost on average €256.2 per patient per year, more than seven times as much as metformin or sulfonylurea [15]. Nevertheless, they have consistently been shown to be cost effective in several healthcare systems (e.g., the USA, the UK, and Greece) [16].

However, three main challenges limit the generalizability of the current cost-effectiveness analyses. First, the inclusion criteria of the trial-recruited study population limit their generalizability to other populations. Although using trial-based evidence on treatment effects to support new reimbursement requests is conventional, trial-based economic evaluation analysis might not represent the cost-effectiveness of a different target population [17]. This general problem with trials holds for the SGLT2i studies because their study populations were quite specific [17]. For example, EMPA-REG included individuals with established ASCVD [6], which represents only 21% of the European T2D population [18].

Second, within diabetes models, cardiovascular risks are commonly estimated from published risk equations (e.g., the UKPDS risk engine [19] or QRISK3 [20]), and treatment effects are conveyed through their impact on risk factors. However, the cardiovascular benefits of SGLT2i extend beyond the modification of HbA1c and weight [14, 21]. Thus, the cardiovascular effects cannot be fully captured by treatment-induced changes in risk-factor levels in prediction models [22, 23]. It is therefore essential to investigate the impact on cost effectiveness in a model-based health economic evaluation in which treatment effects are modeled not only through changes in risk factors but also through effects that are independent of those changes.

Third, a recently published review showed discrepancies in the effectiveness of SGLT2i between RCTs and real-world evidence (RWE). RCTs did not show preventive effects on major adverse cardiovascular events, while RWE did [24]. RCTs provide the best evidence for efficacy and causality, while RWE gives insights into effectiveness in the target population. The question is how these discrepancies might affect the cost-effectiveness results.

Therefore, the current study explores the cost effectiveness of SGLT2i for a routine care population that reflects current Dutch reimbursement criteria (version 2022 [25]) and compares these individuals to populations that satisfy the selection criteria of the trials, considering RWE and hybrid treatment effects (i.e., both risk-factor level changes and effects that are independent from the risk-factor changes). To ensure the proper extrapolation, the health economic model, i.e., the Modeling Integrated Care for Diabetes based on Observational data (MICADO) model, will be validated over the trial follow-up periods.

2 Methods

We reported the economic evaluation following the Consolidated Health Economic Evaluation Reporting Standard 2022 [26], and reported the model input and characteristics based on the Mount Hood Diabetes Challenge Network’s Checklist [27] to ensure the transparency of this simulation (Supplementary Appendix 1).

An overview of the study is presented as a flowchart in Fig. 1. The MICADO model [28,29,30], which simulates the Dutch population of individuals with T2D, was used to evaluate the lifetime cost effectiveness of different scenarios. Patient characteristics in the scenarios were informed by real-world data from a Dutch routine care cohort [Hoorn Diabetes Care System (DCS)] [31], while information on the effectiveness of the medication was taken from the SGLT2i trials [32,33,34] and the review of RWE [24, 35]. For the effect on risk-factor levels, the trajectories of treatment-induced differences between treatment and placebo over the trial follow-up time were extracted. For the effect on the incidence of complications, hazard ratios of complication incidences were extracted and corrected for double counting within the model, so that they capture only treatment effects unrelated to changes in risk-factor levels.

Fig. 1

The flowchart of the study. We filtered the DCS population by the selection criteria of the respective SGLT2i trials or Dutch reimbursement criteria (full selection criteria are listed in Supplementary Appendix 2), and compared the baseline characteristics of filtered cohorts. The MICADO model uses the baseline characteristics of filtered cohorts and the treatment effects observed and estimated in each trial (i.e., CANVAS, DECLARE, and EMPA) to simulate the incidence and progression of complications, risk factor progression (i.e., the progression of HbA1c and cholesterol etc.), QALYs, and costs. For the filtered cohorts fulfilling the reimbursement criteria, the weighted average of all trials was used for the effectiveness and compared with using RWE-based treatment effect estimates [24]. The model was then validated for the placebo and treatment groups of the trials by investigating the model’s ability to predict the cumulative incidence of complications and relative risks between treatment and control groups over the trial follow-up time, using each trial’s filtered cohort. Finally, the cost effectiveness for each filtered group was calculated from lifetime simulations (40 years) of QALYs and costs. In summary, this study compared the baseline characteristics of trial-filtered routine care individuals, validated the MICADO model, and evaluated the cost effectiveness of SGLT2i for different routine care filtered cohorts. 
ACR albumin: creatinine ratio, CANVAS CANagliflozin cardioVascular Assessment Study, CI confidence interval, CVD cardiovascular disease, DECLARE Multicenter Trial to Evaluate the Effect of Dapagliflozin on the Incidence of Cardiovascular Events, eGFR estimated glomerular filtration rate, EMPA Cardiovascular Outcome Event Trial in Type 2 Diabetes Mellitus Patient, DCS The Hoorn Diabetes Care System, HbA1c hemoglobin A1c, IHD ischemic heart disease, MI myocardial infarction, PAD peripheral arterial disease, PVD peripheral vascular disease, QALY quality-adjusted life year, RMSE root mean square error, SBP systolic blood pressure, SGLT2i sodium–glucose cotransporter-2 inhibitors

2.1 Study Population

The DCS is a dynamic prospective cohort study of individuals with T2D treated by 103 general practitioners (GPs) in the West Friesland region of The Netherlands [31]. The DCS cohort started in 1998 and currently includes over 15,000 individuals with T2D. Detailed laboratory measurements have been described in previous studies [31]. The study has been approved by the Ethical Review Committee of the Vrije Universiteit University Medical Center, Amsterdam. Written informed consent was obtained.

To analyze the generalizability of the cost-effectiveness results and the impact of selection criteria, we filtered the DCS population (called DCS-Overall) by the selection criteria of the respective SGLT2i trials, including CANVAS [7], DECLARE-TIMI58 [8], and EMPA-REG [6], and by the indication criteria of the Dutch reimbursement criteria (version October 2022 [25]). This resulted in four DCS-based filtered cohorts satisfying these selection criteria, called DCS-CANVAS, DCS-DECLARE, DCS-EMPA, and DCS-ZIN, respectively (selection process listed in Supplementary Appendix 2).

2.2 Statistical Analysis

We compared the baseline characteristics [e.g., age, HbA1c, and body mass index (BMI)] among the four filtered cohorts and the overall DCS population (DCS-CANVAS, DCS-EMPA, DCS-DECLARE, DCS-ZIN, and DCS-Overall) using the chi-squared test and pairwise Games–Howell tests with false discovery rate correction for multiple comparisons [36]. We also tested for differences in baseline characteristics between each trial-filtered DCS-based cohort and its corresponding trial cohort (based on the means and standard deviations of risk factors in published evidence) using two-sample t tests. We omitted missing values because their proportion was sufficiently low [37, 38] (1.41% on average, and less than 5% of observations were missing for each variable in DCS; see details in Supplementary Table 2.5).
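
The comparison with published trial characteristics relies only on means, standard deviations, and sample sizes. As a minimal sketch of how a two-sample t statistic can be obtained from such summary statistics, the following Python snippet uses Welch's unequal-variance form. The choice of Welch's variant, the function name, and the example values are assumptions for illustration and are not taken from the study.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return Welch's t statistic and its approximate degrees of freedom.

    Inputs are the mean, standard deviation, and sample size of two groups.
    """
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, dof


# Hypothetical HbA1c summaries (%): a filtered cohort versus a trial cohort.
t_stat, dof = welch_t_from_summary(7.6, 1.0, 400, 8.1, 0.8, 7000)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A p value would follow from the t distribution with the returned degrees of freedom, which requires a statistics library and is omitted here.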

2.3 The MICADO Diabetes Model

The MICADO diabetes model is a state-transition simulation model based on the multistate life table method using an annual cycle and based on the Dutch Chronic Disease model developed by the Dutch National Institute for Public Health and the Environment [28,29,30]. The MICADO model simulates microvascular complications (e.g., diabetic foot, nephropathy, and retinopathy) and macrovascular complications [e.g., acute myocardial infarction (AMI), cerebrovascular disease (CVA), and congestive heart failure (CHF)] in relation to their risk factors [e.g., category of age, sex, HbA1c, smoking status, BMI, systolic blood pressure (SBP), and total cholesterol]. The MICADO model has been validated both internally and externally [29, 39] and was cross validated in several Mount Hood Challenges [27, 40]. The current model is an update and revision of the 2010 version [29], which described the change of marginal distributions, whereas the current version of the model describes joint distributions.
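
To illustrate the general idea of a cohort-level state-transition model with an annual cycle, the following Python sketch propagates a cohort through three states and accumulates QALYs and costs. It is not the MICADO implementation: the states, transition probabilities, utility weights, and costs are all hypothetical, and discounting is omitted for brevity.

```python
# Hypothetical three-state cohort model with an annual cycle.
STATES = ["no_complication", "complication", "dead"]

# Row = current state, column = next state; each row sums to 1.
# All values are illustrative only.
TRANSITIONS = [
    [0.93, 0.05, 0.02],
    [0.00, 0.92, 0.08],
    [0.00, 0.00, 1.00],
]
UTILITY = [0.85, 0.70, 0.0]          # hypothetical QALY weight per year in each state
ANNUAL_COST = [1000.0, 4000.0, 0.0]  # hypothetical cost (euro) per year in each state


def step(dist, matrix):
    """Advance the state distribution by one annual cycle."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def run_cohort(start, matrix, utility, cost, years):
    """Accumulate undiscounted QALYs and costs over the given number of cycles."""
    dist = list(start)
    qalys = 0.0
    costs = 0.0
    for _ in range(years):
        qalys += sum(p * u for p, u in zip(dist, utility))
        costs += sum(p * c for p, c in zip(dist, cost))
        dist = step(dist, matrix)
    return dist, qalys, costs


final_dist, total_qalys, total_costs = run_cohort(
    [1.0, 0.0, 0.0], TRANSITIONS, UTILITY, ANNUAL_COST, years=40
)
print(final_dist, round(total_qalys, 2), round(total_costs))
```

A treatment arm could be represented by a second transition matrix, with the difference in accumulated QALYs and costs between the two runs feeding the cost-effectiveness calculation.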

2.4 Model Input and Outcomes

Key assumptions and inputs are presented in Table 1. Other input parameters including quality of life estimates and costs are listed in Supplementary Appendix 3. The baseline characteristics of each DCS-based cohort were entered into the MICADO model to predict future outcomes.

Table 1 Key assumptions

Outcomes are the lifetime incidence of complications (e.g., AMI, CVA, and CHF), costs, QALYs, and incremental cost-effectiveness ratio (ICERs). SGLT2i are considered cost effective if the ICER is less than the Dutch willingness-to-pay threshold of €20,000/QALY or €50,000/QALY when considering the burden of disease for diabetes using the proportional shortfall method [41, 42].
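
The decision rule can be written out as a short calculation. The Python sketch below computes an ICER as incremental cost divided by incremental QALYs and compares it with the two thresholds named above. The function names are illustrative. The example inputs are the average incremental cost and QALYs reported in Sect. 3.3, so the resulting ratio of averages need not match any individual ICER in Table 3.

```python
def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    if delta_qaly <= 0:
        raise ValueError("ICER is only interpreted here for a positive QALY gain")
    return delta_cost / delta_qaly


def is_cost_effective(icer_value, threshold):
    """Cost effective if the ICER is below the willingness-to-pay threshold."""
    return icer_value < threshold


THRESHOLDS_EUR_PER_QALY = (20_000, 50_000)  # thresholds named in the text

# Average incremental cost (euro) and QALYs versus care-as-usual (Sect. 3.3).
example_icer = icer(7015.0, 1.36)
for threshold in THRESHOLDS_EUR_PER_QALY:
    print(threshold, is_cost_effective(example_icer, threshold))
```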

2.5 Model Validation

Model validation was performed by comparing MICADO predictions with observed cumulative incidences at the end of each trial’s follow-up, that is, “calibration-in-the-large” [43]. Differences between observed and predicted cumulative incidence were quantified by two measures: (1) the absolute difference and (2) the mean absolute percentage error (MAPE; the average of the error in percentage terms) [43]. MAPE is a relative measure, and values closer to zero indicate better accuracy [43]. We generated plots to visualize predicted and observed cumulative incidences of events and the relative risks of events in the treatment group compared with the control group. Statistical validity was assessed in two ways: (1) whether the prediction fell within the 95% confidence interval (CI) of the observation [44] and (2) whether the intercept and slope of the calibration line (prediction against observation) deviated significantly from their ideal values of 0 and 1, respectively, with a non-significant test indicating good calibration [45].
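
As a minimal sketch of these validation measures, the Python snippet below computes the mean absolute difference, the MAPE, and the intercept and slope of an ordinary least-squares calibration line (prediction regressed on observation). The function names and the example incidences are hypothetical, and the significance test on the intercept and slope is not shown.

```python
def mean_absolute_difference(observed, predicted):
    """Average absolute gap between predictions and observations."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observations must be non-zero)."""
    return 100.0 * sum(abs(p - o) / o for o, p in zip(observed, predicted)) / len(observed)


def calibration_line(observed, predicted):
    """Least-squares fit of predicted = intercept + slope * observed."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical cumulative incidences per 1000 patient-years.
observed = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 44.0]
print(mean_absolute_difference(observed, predicted))
print(mape(observed, predicted))
print(calibration_line(observed, predicted))
```

An intercept near 0 and a slope near 1 correspond to the ideal calibration line described above.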

2.6 Scenario and Sensitivity Analyses

To account for different scenarios of diminishing treatment effects over time, for treatment effects estimated from both RCT and RWE, we evaluated four scenarios (see Supplementary Appendix 4), including (1) a base case scenario—assuming treatment effects diminish gradually over time (based on the estimated trajectory), (2) a worst-case scenario—assuming treatment effects diminish immediately after the trial period, (3) a best-case scenario—assuming treatment effects remain present relatively long (based on the longest estimated duration), and (4) a scenario ignoring all direct treatment effects on hazard ratios—that is, the treatment effect is only conveyed by risk-factor level changes and is assumed not to have an effect on disease incidences directly, reflecting the approach in the majority of published model-based evaluations [17].
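
The four scenarios differ in how long a direct treatment effect persists. The Python sketch below shows one way to express this as a time-varying hazard ratio that fades toward 1 after the trial period. The linear fading, the waning durations, and the example hazard ratio are assumptions for illustration; the trajectories actually used are described in Supplementary Appendix 4.

```python
def hazard_ratio_at(year, trial_hr, trial_years, waning_years):
    """Hazard ratio applied in a given model year.

    During the trial period the trial hazard ratio applies. Afterwards the
    effect fades linearly to 1 over `waning_years`; 0 means immediate loss.
    """
    if year <= trial_years:
        return trial_hr
    if waning_years == 0:
        return 1.0
    faded = min(1.0, (year - trial_years) / waning_years)
    return trial_hr + (1.0 - trial_hr) * faded


# Hypothetical settings: hazard ratio 0.70 observed over a 3-year trial.
SCENARIOS = {
    "base_case": {"trial_hr": 0.70, "waning_years": 10},   # gradual waning
    "worst_case": {"trial_hr": 0.70, "waning_years": 0},   # effect lost after trial
    "best_case": {"trial_hr": 0.70, "waning_years": 20},   # longer-lasting effect
    "risk_factors_only": {"trial_hr": 1.00, "waning_years": 0},  # no direct effect
}

for name, s in SCENARIOS.items():
    path = [round(hazard_ratio_at(y, s["trial_hr"], 3, s["waning_years"]), 3)
            for y in (1, 4, 8, 13, 23)]
    print(name, path)
```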

A subgroup analysis reflecting GP practice was also performed. Because not everyone who meets the reimbursement criteria will ultimately use SGLT2i, a subgroup of the reimbursed population was defined on the basis of current Dutch GP practice, in which SGLT2i are prescribed only to individuals with a remaining life expectancy of more than 5 years and an HbA1c above 7% (53 mmol/mol) [46].

Deterministic sensitivity analysis (DSA) was performed for RWE on the risk-factor level changes of HbA1c, cholesterol, SBP, and BMI, and on the cost of SGLT2i with ± 20% relative changes. Hazard ratios of cardiovascular events were varied using upper or lower bound value of their 95% CI. The results were summarized in a tornado diagram, ranking parameters from high to low according to their effect on the ICER.
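
A one-way deterministic sensitivity analysis of this kind can be sketched as follows: each parameter is set to its lower and upper value in turn while the others stay at base values, and parameters are then ranked by the resulting swing in the ICER, which is the ordering used in a tornado diagram. In the Python snippet below, `toy_icer` is a hypothetical stand-in for a full model run, and all parameter values except the annual drug cost quoted in the Introduction are invented, so the ranking it produces is a property of the toy function and not a study result.

```python
def one_way_sensitivity(model, base, ranges):
    """Vary one parameter at a time and rank parameters by ICER swing."""
    results = []
    for name, (low, high) in ranges.items():
        low_out = model({**base, name: low})
        high_out = model({**base, name: high})
        results.append((name, low_out, high_out, abs(high_out - low_out)))
    # Largest swing first, as in a tornado diagram.
    return sorted(results, key=lambda row: row[3], reverse=True)


def toy_icer(p):
    """Hypothetical ICER: net lifetime cost divided by QALY gain."""
    years_on_treatment = 20.0  # hypothetical fixed treatment duration
    return (p["drug_cost"] * years_on_treatment - p["savings"]) / p["qaly_gain"]


def plus_minus_20(value):
    """Lower and upper values for a 20% relative change."""
    return (0.8 * value, 1.2 * value)


BASE = {"drug_cost": 256.2, "savings": 1000.0, "qaly_gain": 1.0}
RANGES = {name: plus_minus_20(value) for name, value in BASE.items()}

ranked = one_way_sensitivity(toy_icer, BASE, RANGES)
for name, low_out, high_out, swing in ranked:
    print(f"{name}: {low_out:.0f} to {high_out:.0f} (swing {swing:.0f})")
```

Because `ranges` accepts any (low, high) pair, bounds taken from a 95% CI can be passed in the same way as the ± 20% changes.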

Following publications calling for greater focus on short- and intermediate-term outcomes in economic evaluations [47], different time horizons from 1 year to 39 years were applied to investigate the impact of the time horizon.

All analyses were performed in R (version 4.1.0: https://www.r-project.org/) and RStudio (version 1.4.1717: https://www.rstudio.com/).

3 Results

3.1 Baseline Characteristics

Table 2, Supplementary Table 3.2, and Supplementary Figs. 3.1–3.3 show the baseline characteristics of the DCS populations filtered for each trial. We found that 8.17%, 36.98%, 2.67%, and 15.79% of the individuals in the DCS cohort meet the selection criteria of CANVAS, DECLARE-TIMI58, EMPA-REG, and Dutch reimbursement, respectively. When comparing the three trial-filtered cohorts, we found significant differences in the baseline characteristics of all risk factors, except LDL cholesterol (Supplementary Fig. 3.1). Comparing the reimbursement criteria-filtered cohort (DCS-ZIN) with the trial-filtered cohorts (Supplementary Fig. 3.2), the DCS-ZIN cohort had significantly lower HbA1c and higher age than the other three trial-filtered cohorts.

Table 2 Baseline characteristics

Comparing each trial’s published baseline characteristics, we also found significant differences in the baseline characteristics between the filtered cohorts and original study populations (Supplementary Figs. 3.4–3.6). Filtered cohorts had a significantly higher baseline SBP and age but significantly lower HbA1c, estimated glomerular filtration rate (eGFR), and BMI compared to their corresponding trials.

3.2 Model Validation

Figure 2 demonstrates good alignment between the simulated and observed relative risks along the 45-degree perfect calibration line. In most cases, the simulation fell within the 95% confidence interval of the observation, indicating proper calibration. Overall, the weighted average bias was 1.90 and 2.05 per 1000 patient-years, and MAPE was around 20% for both the placebo and treatment arms (Supplementary Table 5.1). Of note, AMI was overestimated in both the placebo and treatment arms, with an average bias of 6.09 and 5.72 per 1000 patient-years, respectively (Supplementary Table 5.2). The calibration intercept and slope were not significantly different from 0 and 1 when the estimation of AMI was excluded (Supplementary Fig. 5.1).

Fig. 2

Validation of relative risks between the treatment and control groups. The black line at a 45-degree angle indicates perfect calibration, where the simulations match the observations. The dashed grey line and ribbon indicate the calibration line with its 95% confidence interval. The closer the calibration line is to the 45-degree perfect calibration line, the better the validation. Estimated equations and the squared linear correlation coefficient of the calibration (R²) are listed in the top-left of both graphs. Points and error bars indicate the respective simulation and the observation with its 95% confidence interval. More error bars crossing the perfect calibration line (i.e., the simulation is located within the 95% confidence interval of the observation) indicate better validation. AMI acute myocardial infarction, CVA cerebrovascular disease, CHF congestive heart failure, CANVAS CANagliflozin cardioVascular Assessment Study, DECLARE Multicenter Trial to Evaluate the Effect of Dapagliflozin on the Incidence of Cardiovascular Events, EMPA Cardiovascular Outcome Event Trial in Type 2 Diabetes Mellitus Patient

3.3 Scenario Analysis of Cost-effectiveness

Table 3 shows the cost-effectiveness results of SGLT2i compared with the current standard of care for the DCS-filtered cohorts. Risk-factor level changes for each scenario are presented in Supplementary Figs. 3.7–3.9. On average, the incremental QALYs and cost of SGLT2i compared with care-as-usual were 1.36 and €7015, respectively. In all cases, SGLT2i were cost effective. In the base case, the ICER was €5382, €5001, and €7822 per QALY for CANVAS, DECLARE-TIMI58, and EMPA-REG, respectively.

Table 3 Cost-effectiveness of each DCS sub-cohort (2021 euros, Dutch unit costs, Dutch setting, 40 years)

On average, individuals who qualified for the Dutch reimbursement criteria of SGLT2i showed an 11% higher QALY gain and a 7% lower ICER than the trial-filtered cohorts. For reimbursed individuals, the ICER was €5440/QALY using the trial-weighted average effectiveness, or €5495/QALY for canagliflozin (CANVAS), €5476/QALY for dapagliflozin (DECLARE-TIMI58), and €5320/QALY for empagliflozin (EMPA-REG). Of note, using RWE-based effectiveness estimates resulted in the lowest ICER (€4873/QALY), since the effects were largest.

The ICERs of the worst-case scenario (i.e., immediately diminishing treatment effect) and the best-case scenario (i.e., long-lasting treatment effect) did not substantially differ from the base case (less than 20% difference on both sides on average). Not incorporating hazard ratios of events (scenario 4) resulted in an ICER that was on average 41% higher. Nevertheless, SGLT2i remained cost effective. The subgroup analysis of reimbursed individuals who might be prescribed SGLT2i based on GP practice resulted in a lower ICER (€4530/QALY for the trial-weighted average and €4098/QALY for RWE) than for the reimbursed population in general.

3.4 Sensitivity Analysis of Cost Effectiveness

Supplementary Fig. 5.2 presents the tornado diagram (95% CI or ± 20% of input values) and the line graph of cost effectiveness for multiple time horizons as the DSA results. Drug cost had the most significant impact on the ICER estimation. In both the tornado diagram and line graph, all ICERs remained under the €20,000/QALY threshold. Thus, our conclusion that SGLT2i are cost effective compared with care-as-usual for the Dutch routine care population is robust for all parameters’ estimates. This conclusion holds not only in the period of a lifetime, but also in the short term (e.g., 5 years) and intermediate term (e.g., 10 years).

4 Discussion

Although the reimbursement criteria resulted in a target group with significantly different characteristics than the populations used in the clinical trials, SGLT2i could be considered cost effective at €5440/QALY, using effectiveness estimates from RCTs, from a third-party payer perspective using a lifetime horizon. Results were robust in various sensitivity analyses.

Inclusion criteria were strictest in the EMPA-REG trial, which required established ASCVD and represented only 3% of the Dutch diabetes population. This proportion is lower than in previous findings [18], mainly because DCS is a well-managed diabetes routine care cohort with extra annual assessments of diabetes-related risk factors following the Dutch College of GP’s treatment guidelines (e.g., targeting HbA1c < 7%), while the trial excluded individuals with HbA1c < 7%. The inclusion criteria are broader in the CANVAS trial, allowing the inclusion of individuals with high cardiovascular risk (indicated by past events) and representing 8% of the Dutch routine care population. DECLARE-TIMI58 has the broadest criteria, allowing the inclusion of individuals with multiple risk factors for cardiovascular disease (e.g., dyslipidemia, hypertension, and tobacco use) and representing 37% of the Dutch routine care population. This confirms the previous finding that, of the three trials, DECLARE-TIMI58 was the most representative of general T2D individuals in The Netherlands [18]. The current Dutch reimbursement criteria allow individuals with high cardiovascular risk (indicated by past events and eGFR), qualify 16% of individuals in routine care, and as such fall between CANVAS and DECLARE-TIMI58.

Even when the same selection criteria were used, we found significant differences in the baseline characteristics between trial-filtered routine care cohorts and trial study populations. Routine care practice, as reflected in the (centrally organized) DCS cohort, tends to perform better in the management of HbA1c and BMI at baseline compared with trials, despite an older population, supporting the fact that trials frequently exclude older individuals [48]. We found QALY gains were higher and ICERs were lower for the older DCS-ZIN population than for the trial-filtered cohorts, offering economic support of a previous claim that SGLT2i are good therapeutic options for older individuals with diabetes [49].

Differences in baseline characteristics partly explain the discrepancies we found in ICERs compared with previously published evidence (e.g., €5476/QALY and €5502/QALY for dapagliflozin in our study and published evidence [50], respectively, both in a Dutch setting). Although a similar cost-effectiveness conclusion was found, our study indicated that using baseline characteristics of the patient cohort who qualified for the reimbursement criteria in routine care might estimate an average of 7% lower ICER compared with trial-filtered cohort. This finding deserves further attention when conducting economic evaluation for reimbursement purposes, not only for SGLT2i, but also for other new drugs which apply trial evidence for a target reimbursement audience [51]. Decision makers need to be aware of this difference between trial-based and reimbursement-based cost effectiveness for price negotiation.

In 2007, the International Society for Pharmacoeconomics and Outcomes Research Real-World Data Task Force published a statement supporting the use of RWE for coverage and reimbursement decision-making [52]. However, the majority of current cost-effectiveness analyses of SGLT2i have been based on RCTs rather than RWE, because of the lack of real-world effectiveness evidence [16]. For SGLT2i as well, evidence on effectiveness from RWE is rare, with only two reviews of RWE [35, 53] identified in an umbrella review published in 2022 [24]. Those two reviews of RWE included 14 studies (3,157,259 patients) and 8 studies (1,536,339 patients), respectively. This scarce RWE seemed to show a greater benefit of SGLT2i than RCTs did [24, 35, 54], which explains our finding that incorporating RWE leads to greater health benefit and cost effectiveness (i.e., 156% higher QALY gains and 10% lower ICER in the base case). However, these results should be interpreted with caution. It may seem counterintuitive to find RWE performing better than RCTs. One explanation is that although propensity score-based analytic approaches are extensively used in RWE studies to form comparable groups, they cannot eliminate possible confounding due to uncontrolled and unmeasured factors [24]. For example, it is possible that in RWE only individuals with a good prognosis received the treatment, and as a result the treatment effects evaluated in RWE were larger than the RCT effects [54]. Our subgroup analysis of individuals with a good prognosis (defined as those with a remaining life expectancy of more than 5 years and an HbA1c greater than 7% according to current Dutch GP practice [46]) supports this explanation. We found that SGLT2i provided greater health benefits (i.e., 18% higher incremental QALY on average for all scenarios) for individuals with a good prognosis than for the reimbursed population in general. These results suggest that the observed benefit in RWE may be partially attributable to selection bias.

The cost-effectiveness conclusion is nevertheless robust across all scenarios and sensitivity analyses. We found that drug cost has the largest influence on the ICER of SGLT2i, which is consistent with a previous finding [55]. Inflation of drug costs over time or renegotiation of drug prices might change the conclusion on cost effectiveness, but as long as the annual price is lower than €4105 in the real-world setting (Supplementary Fig. 5.3), our conclusion that SGLT2i are cost effective will not be affected.

Previously published model-based cost-effectiveness analyses of SGLT2i mainly applied patient-level simulation models [56], such as the IQVIA/CORE model [57] and UKPDS-OM2 [19]. Our study indicated that the MICADO model, as a cohort-level model, also effectively simulated the placebo and treatment arm of SGLT2i in EMPA-REG, CANVAS, and DECLARE-TIMI58 and predicted their beneficial effect (e.g., reduced CHF). Specifically, the MICADO model overestimated the incidence of AMI, but it showed a good fit for CVA and CHF, confirming the results of its previous validation research [39].

Only a few cost-effectiveness analyses or diabetes models have incorporated direct evidence (e.g., hazard ratios) of treatment effects on event rates [17]. Most analyses modeled treatment effects only through risk-factor level changes (e.g., HbA1c and BMI) [17]. Our study used a hybrid approach to capture benefits that might be independent of changes in HbA1c and other risk factors. We found that treatment scenarios without incorporating hazard ratios in either RWE or RCT might lead to an ICER that is on average 41% higher compared with the base case. This finding highlights the necessity of incorporating hazard ratios for future decision-makers or modelers, especially regarding treatments that show a special ability to reduce cardiovascular risk.

The cost-effectiveness analysis of SGLT2i in a real-world setting is important because SGLT2i have been shown to be potential preferred agents for T2D owing to their cardiovascular and renal benefits. However, SGLT2i are underused in current routine practice in many countries, including The Netherlands [55, 58, 59], partly because of their high cost and the lack of RWE [55]. Our findings confirmed that SGLT2i were cost effective compared with care-as-usual in routine care individuals, which is consistent with earlier papers [16, 55]. In contrast to those papers, we applied real-world reimbursement criteria, evaluated the lifetime outcomes of a routine care cohort, and attempted to incorporate RWE to the greatest extent possible. Moreover, Yoshida’s and Rahman’s reviews indicated that the majority of studies were conducted in UK or US settings, and only one study evaluated the cost effectiveness of adding dapagliflozin to insulin in The Netherlands [16, 50, 55]. Our study filled the knowledge gap on the cost effectiveness of adding SGLT2i compared with care-as-usual in The Netherlands by focusing on a Dutch routine care cohort and using a Dutch model. Because the characteristics of the Dutch population, such as a higher average age [60] and SBP [61] and milder obesity [62] than in the USA and UK, are more similar to those of other European populations, our study might provide cost-effectiveness evidence that is more generalizable to European countries.

There are several limitations to the current study. First, our study attempted to include RWE, but the evidence is scarce and insufficient (e.g., it lacks details on risk-factor trajectories). The larger effectiveness observed in RWE than in RCTs was somewhat surprising. We evaluated one possible cause, i.e., treatment indication on good prognosis, in a subgroup analysis, but these findings might also be caused by publication bias or by the lower quantity and quality of RWE-based studies [24]. The accuracy of cost-effectiveness analysis using RWE is therefore limited. Rather than relying on previous publications, future studies could evaluate RWE-based treatment effects when more SGLT2i users and follow-up data are available (only 0.3% of individuals in DCS were users until 2019). Second, we omitted some inclusion and exclusion criteria for filtering because of a lack of information (e.g., pregnancy). This likely did not affect our results, since we included the most important risk factors for stratifying individuals with diabetes, such as HbA1c and cardiovascular risk. Third, the potential side effects of SGLT2i (e.g., ketoacidosis, genital infection, and volume depletion) [63], and the possibility that individuals stop taking SGLT2i when side effects occur, were not considered because the MICADO model did not include the corresponding health states. This might lead to an underestimation of the ICER; however, individuals who stop taking SGLT2i will no longer incur costs, and hence this will not affect the ICER to a large degree. Furthermore, our study used a third-party payer perspective, although the Dutch guideline recommends a societal perspective [64]. Since we did not consider the impact of side effects, the only additional relevant outcomes and costs would be gains in work productivity. These potential gains would result in additional savings, and therefore the chosen perspective does not influence our conclusion that SGLT2i were cost effective.
Fourth, we assumed treatment effects in the reimbursement cohort were the same as those observed in RWE or RCTs, and this might be biased because we found significant differences in baseline characteristics between trial-filtered DCS cohorts and original trial cohorts. However, we aimed to compare the difference in ICER between trial-like and reimbursement-like populations of Dutch individuals with diabetes, and the best estimator of treatment effects for a trial-like population is the trial-based effectiveness. Previous studies [2, 65] can be consulted if trial-based cost effectiveness (i.e., same baseline characteristics as trials) is of interest. Furthermore, we did not include a probabilistic sensitivity analysis because of the unavailability of the distributions of model parameters, especially regarding the prevalence and incidence of diabetes events and their risks, and because of the computational burden. Our current study was not designed to support specific decision making, but rather to compare current Dutch reimbursement criteria with trial populations and to consider the influence of RWE. Previous studies [2, 5, 66, 67] have already shown, more elaborately, the decision uncertainty regarding the cost effectiveness of SGLT2i compared with usual care. It is worth noting that these studies often conduct probabilistic sensitivity analysis by assuming that cost parameters follow a gamma distribution and utility parameters follow a beta distribution [2, 5]. However, they may leave out some other important model parameters, such as the incidence of events, in their probabilistic sensitivity analysis [2, 5]. Finally, the current version of the MICADO model could not directly capture the renal benefits of SGLT2i. However, some of these effects have been included via effects on HbA1c. Hence, we might somewhat underestimate the incremental QALYs and cost savings. Nevertheless, this limitation does not affect our conclusion that SGLT2i are cost effective.

5 Conclusions

The MICADO model, developed from Dutch general practice registries, is capable of capturing and predicting the benefit of SGLT2i with satisfactory accuracy. The Dutch reimbursement criteria for SGLT2i result in a target group that tends to be older but has better-controlled HbA1c and BMI than trial populations. SGLT2i can be considered cost effective in a routine care population in The Netherlands that reflects current Dutch reimbursement criteria.