Key Points for Decision Makers
There is a consistent relationship between treatment-effect assumptions on glycosylated haemoglobin and simulated outcomes in type 2 diabetes mellitus cost-effectiveness studies.
A 1% decrease in glycosylated haemoglobin from an intervention is associated with increases in lifetime quality-adjusted life-years and life expectancy of 0.371 and 0.642, respectively.
This relationship can be used as a benchmark to identify studies deviating from others, and generate preliminary long-term effectiveness predictions when insufficient resources are available to use a simulation model.

Introduction

Simulation modelling is a useful tool in health economic evaluation, especially for interventions targeted at treating chronic diseases such as diabetes mellitus. Clinical trials of new therapies and behavioural interventions in type 2 diabetes often involve estimation of changes in intermediate risk factors such as glycosylated haemoglobin (HbA1c) [1, 2]. For these interventions to be evaluated using common health economic outcomes such as quality-adjusted life-years (QALYs), the potential benefits of observed improvements in metabolic control need to be extrapolated over longer time periods to capture impacts on the rates of complications of diabetes as well as mortality.

There is a long history in the use of simulation models to evaluate interventions for people with type 2 diabetes. The first simulation model for type 2 diabetes was built in 1997 by Eastman et al. [3, 4], which was based on clinical data from both type 2 and type 1 diabetes. Many models have been developed since then [5], and while many are based on the United Kingdom Prospective Diabetes Study (UKPDS) Outcomes Model [6], there are differences in terms of model structure, particularly when they incorporate additional epidemiological and trial evidence to capture additional complications (e.g. impact of hypoglycaemia). A common feature of all these models is that they extrapolate changes in intermediate outcomes such as HbA1c to metrics such as QALYs or life expectancy (LE), which are most commonly used to capture outcomes in economic evaluations.

While most diabetes simulation models have been developed independently, the field has benefited from the diabetes simulation modelling conference ‘Mount Hood Diabetes Challenge’, which has been regularly held since 2000 to compare and contrast the outputs of models in a set series of simulations [7]. The last Mount Hood Diabetes Challenge meeting was held in 2014 and developers of 11 type 2 diabetes models participated, highlighting the development of simulation modelling in type 2 diabetes in recent years [8]. Diabetes simulation models are now widely used in cost-effectiveness studies and play an important role in defining clinical guidelines and the evaluation of new drugs [9].

Previous reviews of type 2 diabetes cost-effectiveness studies have focused on summarising and describing the characteristics of different models [5, 10]. Currently, there are no studies that use quantitative techniques to evaluate the relationship between treatment-effect assumptions (e.g. a reduction in the level of HbA1c) and simulated outcomes (e.g. a gain in QALYs or LE). In this article, we systematically review cost-effectiveness studies of glycaemic control interventions that use type 2 diabetes simulation models to estimate long-term outcomes of QALYs or LE. Using data extracted from those studies, we use regression analysis to estimate the relationship between initial changes in HbA1c following the commencement of an intervention and model outputs. The objective of this research is to explore whether there is a consistent relationship between a widely reported intermediate outcome that is often used as an input in simulation models and model outputs across type 2 diabetes cost-effectiveness studies, and across different models.

Methods

Data Sources and Searches

We reviewed studies that involved use of a type 2 diabetes simulation model to inform a cost-effectiveness analysis (CEA) or cost-utility analysis (CUA) of blood glucose-lowering interventions that measured a change in HbA1c. All ten type 2 diabetes models that participated in the fourth (2004) or fifth (2010) Mount Hood Diabetes Challenge were considered to be eligible for inclusion in this study [11, 12], including the UKPDS Outcomes Model, IMS CORE Diabetes Model, Cardiff Model, Sheffield Diabetes Model, EAGLE Model, CDC-RTI Diabetes Cost-effectiveness Model, Archimedes Model, Michigan Model, ECHO-T2DM and the Evidence-Based Medicine Integrator Simulator. The UKPDS risk engine participated in the fourth Mount Hood Diabetes Challenge but was not included in this study as it does not quantify lifetime outcomes in terms of QALYs or LE [13]. Descriptions and further details of these models can be found in the Mount Hood Diabetes Challenge reports [11, 12].

The Preferred Reporting Items for Systematic Reviews and Meta-analyses recommendations and checklist were followed to conduct the systematic review [14]. Studies were identified by searching electronic databases, supplemented by scanning citations of the original publication of the ten targeted models and finally by contacting the model groups individually. The search was applied in two electronic databases, MEDLINE (1946 to present) and EMBASE (1947 to present) on Ovid. The subject heading ‘Diabetes Mellitus, Type2’ and other search terms including T2DM, cost effective*, cost utilit*, long term outcome*, long term consequence*, health economic*, health evaluation, economic evaluation, model*, simula*, QALY*, life year* and LE were used (see Supplementary Material 1 for full details on the search strategy).

Study Selection

The search was completed on 1 June, 2015 and included all published studies prior to that date. The inclusion criteria were as follows:

  • language in English;

  • reported outcomes were part of a CEA or CUA;

  • the study simulated outcomes based on one of the ten diabetes models mentioned above;

  • the population simulated in the model involved only people with type 2 diabetes;

  • the intervention focused on improvements in blood glucose control;

  • the primary treatment effect being modelled was the difference in HbA1c;

  • the study reported long-term (≥20 years) outcomes in either QALYs or LE.

Studies were excluded if:

  • language was not English;

  • it did not conduct CEA or CUA;

  • it used other established models or self-built models;

  • it included other populations (type 1 diabetes, pre-diabetes);

  • the intervention did not focus on improvements in blood glucose control or was a multifactorial intervention;

  • HbA1c was not used as the treatment effect or there was no difference in the reduction of HbA1c levels between the intervention and control group;

  • it did not report QALYs or LE as outcomes;

  • the outcomes were estimated only in the short term;

  • no discount rate was reported for the outcomes;

  • the study did not report the data necessary for this research (difference in HbA1c and incremental QALYs/LE).

Two reviewers (XH and LS) separately reviewed the studies by reading their abstracts and full texts. A third reviewer’s (PC) opinion was considered when there was a conflict of opinion between the first two. All but one of the Mount Hood modelling groups were contacted through email with a request for publications or a list of studies fitting our criteria (the EAGLE model group was excluded as no current contact details are available for its developers). A published model description paper was found for eight models (UKPDS [6], CORE [15], Cardiff [16], EAGLE [17], CDC [18], Archimedes [19], Michigan [20] and ECHO [21]). Studies that cited the aforementioned papers were scanned in Google Scholar, Web of Science and PubMed and added if they met our inclusion criteria.

Data Extraction

The intervention’s treatment effects on risk factors were extracted from each included study. Risk factors include HbA1c (%), body mass index (BMI), systolic blood pressure (mm Hg), total cholesterol (mmol/L) and hypoglycaemic events (per patient-year). The treatment effect is the difference in effect between the two groups. Some studies report this difference directly; others report the change in risk factors separately for each group. In the latter situation, we calculated the treatment effect as the difference between the two change values. Values were converted when risk factors were not reported in the required unit (1 mmol/L = 38.6 mg/dL for total cholesterol). For eight studies [22–29] that reported the change in weight but not BMI as a treatment effect, the change in BMI was imputed using the cohort baseline height, or 1.67 m as the default height (the average height for patients in the UKPDS [30]) if the cohort height was not reported in the study.
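The extraction rules above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the BMI imputation assumes that height is constant, so that the change in BMI equals the change in weight divided by height squared. The text does not spell out the imputation formula.

```python
MG_DL_PER_MMOL_L = 38.6  # total cholesterol conversion factor stated in the text
DEFAULT_HEIGHT_M = 1.67  # UKPDS average height, used when cohort height is missing


def treatment_effect(change_intervention, change_control):
    """Difference between the changes reported for each group (hypothetical helper)."""
    return change_intervention - change_control


def cholesterol_mg_dl_to_mmol_l(value_mg_dl):
    """Convert total cholesterol from mg/dL to mmol/L."""
    return value_mg_dl / MG_DL_PER_MMOL_L


def bmi_change_from_weight(weight_change_kg, height_m=None):
    """Impute a BMI change (kg/m^2) from a weight change (kg).

    Assumes height does not change, so delta BMI = delta weight / height^2.
    """
    height = DEFAULT_HEIGHT_M if height_m is None else height_m
    return weight_change_kg / height ** 2
```

For example, a 2 kg weight loss with no reported cohort height would be imputed as `bmi_change_from_weight(-2.0)`, which uses the 1.67 m default.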

The model outcomes of interest in this study are QALYs and LE. Differences in QALYs (∆QALYs) and LE (∆LE) between the two groups in the base case were extracted from each study, as well as the discount rate used for the base case. Differences in undiscounted outcomes were also collected if they were available in the sensitivity analysis. We also extracted summary statistics of the study cohort, including the year of the study, comparators, cohort baseline age, diabetes duration and post-treatment HbA1c level in the control group. For studies that evaluated multiple comparisons, data of all these comparisons were included and collected. Data extraction was conducted independently by two reviewers (XH and LS). Disagreements were resolved through discussion.

Data Synthesis and Analysis

We conducted a basic descriptive analysis on the type of interventions, comparators and risk factors involved to summarise the identified studies. All analyses were conducted on undiscounted outcomes. Where studies only reported discounted outcomes, estimates of the undiscounted outcomes were imputed based on an algorithm that was developed from studies that reported both discounted and undiscounted outcomes (see Supplementary Material 4 for further description).
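A minimal sketch of one way such an imputation could work is given below. The actual algorithm is described in Supplementary Material 4 and is not reproduced here. The sketch assumes a mean ratio of discounted to undiscounted outcomes is computed per discount rate from studies reporting both, and that a discounted value is then divided by that ratio. The function names and the example values are hypothetical and are not data from the review.

```python
from collections import defaultdict


def mean_ratio_by_rate(paired):
    """Mean discounted/undiscounted ratio for each discount rate.

    paired: iterable of (discount_rate, discounted, undiscounted) tuples taken
    from studies that report both values.
    """
    ratios = defaultdict(list)
    for rate, discounted, undiscounted in paired:
        ratios[rate].append(discounted / undiscounted)
    return {rate: sum(values) / len(values) for rate, values in ratios.items()}


def impute_undiscounted(discounted, rate, ratio_by_rate):
    """Infer an undiscounted outcome from a discounted one (assumed approach)."""
    return discounted / ratio_by_rate[rate]


# Hypothetical values for illustration only; not data from the review.
paired = [(0.03, 0.30, 0.40), (0.03, 0.15, 0.20), (0.05, 0.12, 0.20)]
ratios = mean_ratio_by_rate(paired)
estimate = impute_undiscounted(0.30, 0.03, ratios)
```

Whether the ratio is averaged across studies or derived in another way is an assumption of this sketch.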

The relationship between ∆HbA1c and the difference in outcomes for comparators was examined using scatterplot and linear regression. Univariate regression was conducted initially for studies using different models separately. Then, a multivariable regression analysis was undertaken for all the studies. We used eight independent variables in the multivariable regression: difference in HbA1c, BMI, systolic blood pressure, total cholesterol, hypoglycaemic events; cohort baseline age; diabetes duration and post-treatment HbA1c level in the control group. Studies lacking information on these variables were excluded from the multivariable regression. Linearity of explanatory variables was checked and confirmed using multivariable fractional polynomials [31]. Interactions of each explanatory variable with HbA1c were checked by adding a multiplicative term into the regressions and no interactions were found. Statistical analysis took into account clustering by using the clustered robust standard errors method, owing to multiple comparisons coming from the same study. Before pooling studies that used different models together, the interaction effect between model type and ∆HbA1c on ∆QALYs and ∆LE was tested.
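The univariate step with clustered standard errors can be illustrated with a short Python sketch. The analysis itself was run in STATA. The code below is a stand-in written for illustration, with a hypothetical function name. It fits an ordinary least squares line with an intercept and computes a cluster-robust (sandwich) standard error for the slope, using a common finite-sample correction of G/(G − 1) × (N − 1)/(N − K) with K = 2.

```python
import math
from collections import defaultdict


def ols_cluster_robust(x, y, clusters):
    """Return (intercept, slope, cluster-robust SE of slope) for y = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det

    # Sum the score contributions (residual times regressors) within each cluster.
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, g in zip(x, y, clusters):
        u = yi - intercept - slope * xi
        scores[g][0] += u
        scores[g][1] += u * xi

    # Sandwich variance for the slope; the slope row of (X'X)^-1 is (-sx, n) / det.
    meat = sum(((-sx * s0 + n * s1) / det) ** 2 for s0, s1 in scores.values())
    n_clusters = len(scores)
    correction = n_clusters / (n_clusters - 1) * (n - 1) / (n - 2)
    return intercept, slope, math.sqrt(correction * meat)
```

Here `x` would hold the ∆HbA1c values, `y` the ∆QALYs or ∆LE values, and `clusters` a study identifier for each comparison, so that comparisons drawn from the same study are not treated as independent.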

To compare the coefficients of ∆HbA1c across the multivariable regressions for QALYs and LE, the equations were jointly estimated using the STATA command MVREG (StataCorp LP, College Station, TX, USA) [32] and a Chi-square test was performed. A significant difference between the two coefficients was confirmed. The ratio between ∆QALYs and ∆LE was then calculated and a scatterplot of the ratio against ∆HbA1c was built to explore the relationship between the two. A logarithmic transformation of the ratio was applied to fit the scatterplot. All statistical analyses were conducted using STATA 13.1 IC (StataCorp LP, College Station, TX, USA).

Results

Summary of the Studies Included in the Analysis

Two hundred and eighty-eight studies were identified after applying our search strategy to MEDLINE and EMBASE; after abstract and full-text review, 65 publications were included in our study, as shown in Fig. 1. Our e-mail request resulted in ten extra studies (one using the ECHO model and nine using the CORE model), and one additional study using the UKPDS Outcomes Model was identified through citation scanning. In total, 76 studies were included in this research, resulting in 124 pairs of comparators [18, 22, 23, 25–29, 35–42, 44–103].

Fig. 1 Flow diagram of publications included and excluded from the review. CEA cost-effectiveness analysis, CUA cost-utility analysis, HbA1c glycosylated haemoglobin, LE life expectancy, QALY quality-adjusted life-year, UKPDS United Kingdom Prospective Diabetes Study

A summary of these studies can be found in Supplementary Material 2. Of the included studies, 43.4% evaluated the cost effectiveness of oral therapy drugs, 29.0% for insulin, 23.7% for management interventions and 3.9% (three studies) were not intervention specific. In addition, 22.4% of all included studies used HbA1c as the only treatment effect of the intervention. In recent years, other treatment effects such as BMI, blood pressure, hypoglycaemic events and lipid levels have increasingly been used, mainly in studies that evaluate cost effectiveness of new oral therapies; BMI and hypoglycaemic events are the two common effects besides HbA1c in insulin evaluation studies (see Figures S1 and S2 in Supplementary Material 3).

Relationship between Difference in HbA1c and the Outcomes

Of the 76 studies included in this analysis, 75 reported QALYs as a model outcome and 59 reported LE. Forty-five studies (59.2%) reported undiscounted QALYs in their sensitivity analysis. The ratio values used for different discount rates to calculate undiscounted outcomes for the remaining 31 studies (59/124 data points) can be found in Supplementary Material 4. One study [33] with four comparators was excluded from the QALYs regression analysis because it involved a large utility change in year 1 (0.152–0.312) as a direct treatment effect stemming from the intervention. This makes the relationship between HbA1c and QALY gain in this study not comparable to others, as other studies generally assumed changes in utility were mediated through changes in other risk factors such as BMI.

As only 19 studies used models other than the CORE model, those studies were combined together for the following regression analysis. Figure 2 depicts the scatterplots of differences in HbA1c and the difference in QALYs or LE in studies that use the CORE model and other models. A linear relationship could be found in these scatterplots.

Fig. 2 Relationships between ∆HbA1c and ∆QALYs or ∆LE, scatter and fitted linear regression. HbA1c glycosylated haemoglobin, LE life expectancy, QALY quality-adjusted life-year

The mean difference in HbA1c is 0.51%, while the mean increments in QALYs and LE are 0.409 and 0.389, respectively. The univariate regression results (Table 1) showed that for studies that used the CORE model, every 1% decrease in HbA1c from the intervention resulted in increases of 0.455 in QALYs and 0.808 in LE. For studies that used the other models, every 1% decrease in HbA1c from the intervention resulted in a 0.352 increase in QALYs and a 0.696 increase in LE. No interaction effect was found between the models and a change in HbA1c (p = 0.557 for QALYs; p = 0.234 for LE). After pooling all studies together, every 1% decrease in HbA1c from the intervention resulted in a 0.434 increase in QALYs and a 0.794 increase in LE. All the coefficients for HbA1c are significant at the 5% level.

Table 1 Results of univariate regression between ∆HbA1c and ∆QALY as well as ∆LE

Six studies were excluded from the multivariable regression because of a lack of information on age, diabetes duration or post-treatment HbA1c level in the control group [18, 38–42]. After controlling for all five risk factors as well as age, diabetes duration and post-treatment HbA1c level in the control group, and pooling all studies together, ∆HbA1c, ∆BMI, ∆blood pressure and ∆hypoglycaemic events were the four variables with significant coefficients for ∆QALYs. Every 1% decrease in HbA1c from the intervention resulted in a 0.371 increase in QALYs. ∆HbA1c, ∆blood pressure and ∆total cholesterol had significant coefficients in the multivariable regression for LE. Every 1% decrease in HbA1c from the intervention resulted in a 0.642 increase in LE (Table 2).

Table 2 Results of multivariable regression

The relationship between ∆HbA1c and the ratio of ∆QALYs and ∆LE can be found in Fig. 3. When the difference in HbA1c is small, the ratio between ∆QALYs and ∆LE increases dramatically. By transforming the dependent variable into ln(y), an inverse exponential relationship was found and fitted to the scatter graph.
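One way to write down a log-transformed fit of this kind is sketched below. The exact functional form is not stated in the text, so the specification and the symbols α, β and ε are assumptions made for illustration. If ∆HbA1c is expressed as the size of the reduction, a negative β means the ratio falls as the difference grows.

```latex
\[
\ln\!\left(\frac{\Delta\mathrm{QALYs}_i}{\Delta\mathrm{LE}_i}\right)
  = \alpha + \beta\,\Delta\mathrm{HbA1c}_i + \varepsilon_i
\quad\Longrightarrow\quad
\frac{\Delta\mathrm{QALYs}_i}{\Delta\mathrm{LE}_i}
  \approx e^{\alpha}\, e^{\beta\,\Delta\mathrm{HbA1c}_i}
\]
```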

Fig. 3 Relationship between ∆HbA1c and the ratio of ∆QALYs and ∆LE, scatter and fitted regression. HbA1c glycosylated haemoglobin, LE life expectancy, QALY quality-adjusted life-year

Discussion

This study used regression analysis to estimate the association between changes in risk factors (e.g. HbA1c), which are common inputs for simulation models, and the estimated outcomes. The analysis is based on data from 76 studies obtained through a systematic review of published cost-effectiveness studies of blood glucose-lowering interventions for people with type 2 diabetes. Based on multiple linear regression that adjusted for a variety of metabolic risk factors, we found that a 1% decrease in HbA1c was associated with lifetime increases in QALYs and LE of 0.371 and 0.642, respectively. There was no evidence of heterogeneity between models. Studies reporting small differences in HbA1c tend to report larger gains in QALYs than in LE, which implies that when the treatment effect on HbA1c is limited, the increase in QALYs mainly comes from utility gain rather than from additional life-years.

In this study, we mainly focus on the treatment effect of changes in HbA1c, as it is used in almost all analyses of blood glucose control interventions and is an input for all diabetes simulation models. Our review suggests that recent economic evaluations of blood glucose-lowering interventions now use a wider variety of risk factors. For example, hypoglycaemic events, which have been incorporated into several models (CORE, Cardiff and ECHO) as an outcome of interest, were not captured in economic evaluations prior to 2006, but influence the outcomes of over 60% of the model simulations of blood glucose lowering in studies published during 2013–15 (Figure S1 in Supplementary Material 3). We included these treatment effects in our multivariable regressions and found a significant relationship between them and model outputs as well. Furthermore, the relationship between the ratio of the increases in QALYs and LE and the difference in HbA1c could be explained by the treatment effect on BMI and hypoglycaemic events. In addition to the impact on complications and death, a decrease in BMI and the avoidance of adverse events could themselves increase people’s quality of life [34], and they appear to play a significant role in the outcomes of some recent economic evaluations of blood glucose control therapies [35, 36].

There are several potential practical applications of the relationship between a change in initial HbA1c and model outcomes found in this study. First, it can be used as a diagnostic tool or benchmark for decision makers, enabling them to identify analyses that deviate from the general trend and investigate whether there are other factors that may have led to the discrepancy and whether they are reasonable. Second, with limited information and resources to run a diabetes simulation model, the regression estimated in this study can be used to give a rough prediction of the long-term effectiveness that could be expected from an intervention in its early stages. In addition, beyond the specific results, this study provides a potential methodology for a meta-analytic approach to combining the results of cost-effectiveness studies based on simulated outcomes in the future.
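As a rough illustration of the first two applications, the sketch below applies the multivariable HbA1c coefficients reported in the text (0.371 QALYs and 0.642 LE per 1% decrease). It captures only the marginal contribution of HbA1c, holding other risk factors constant, and omits the intercept and the remaining coefficients from Table 2. The function names and the 50% tolerance used to flag a study are hypothetical choices, not part of the published analysis.

```python
QALY_PER_PCT_HBA1C = 0.371  # multivariable estimate reported in the text
LE_PER_PCT_HBA1C = 0.642    # multivariable estimate reported in the text


def hba1c_contribution(hba1c_reduction_pct):
    """Marginal lifetime gains associated with an HbA1c reduction (percentage points)."""
    return {
        "qalys": QALY_PER_PCT_HBA1C * hba1c_reduction_pct,
        "le": LE_PER_PCT_HBA1C * hba1c_reduction_pct,
    }


def deviates_from_benchmark(reported_qalys, hba1c_reduction_pct, tolerance=0.5):
    """Flag a reported QALY gain that differs from the benchmark by more than
    `tolerance` (an arbitrary, hypothetical threshold) as a share of the benchmark."""
    expected = QALY_PER_PCT_HBA1C * hba1c_reduction_pct
    return abs(reported_qalys - expected) > tolerance * abs(expected)


# Example: a 0.5% HbA1c reduction, close to the mean difference in the review.
gains = hba1c_contribution(0.5)
flagged = deviates_from_benchmark(0.9, 0.5)
```

A flagged study is not necessarily wrong; the flag only prompts a check for other treatment effects, such as BMI or hypoglycaemic events, that could explain the gap.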

This study has also highlighted inconsistencies in the reporting of assumptions regarding the treatment effect and in the results of model simulations. To make cost-effectiveness simulation results transparent, the effect of treatment on all major risk factors should be reported over time. Currently, there is no standard way of reporting assumptions regarding the duration of an intervention effect on risk factors (e.g. some studies report an annual decay [37], while others report a change at some future time [26]). The lack of consistent reporting has made it hard for us to incorporate this information into our regressions at this stage; thus, we have been limited to using the initial change in HbA1c and other risk factors in our regression models. Three studies were excluded because the initial change in risk factors was not clearly reported. However, the main issue with the reporting of outcomes was that 40% of studies included in this analysis did not report their undiscounted results. Studies from different countries usually use different discount rates for their base case results, which makes comparison between studies difficult. To address this issue, we calculated an average ratio between discounted and undiscounted ∆QALYs and used this to infer undiscounted outcomes when these were not reported. Basic cohort characteristics (age, duration of diabetes, proportion of males, ethnicity and other baseline risk factors) and model assumptions (time horizon) were also sometimes not reported. In this analysis, six studies were excluded from the multivariable regression because of a lack of information on age, diabetes duration or post-treatment HbA1c level in the control group [18, 38–42]. There is a clear need for general reporting standards for diabetes cost-effectiveness studies to be developed to promote transparency and facilitate future model comparisons. A starting point for this is the use of the Consolidated Health Economic Evaluation Reporting Standards [43].

This study is subject to a number of limitations. First, the difference in HbA1c collected in this study is the initial difference between two simulated cohorts, which is often the only measure of glycaemia that simulation modelling studies include. While we note that some simulation models make additional assumptions about the relative trajectory of HbA1c, most economic evaluations of diabetes therapies do not report these in a uniform manner or in sufficient detail to be incorporated into the current analysis. While there is a strong association between initial HbA1c and long-term outcomes, the consistency of the regression estimates depends on the independence of the initial HbA1c and the error term. It would be useful to re-examine this assumption in future work, particularly as the transparency of reporting of simulation models improves over time.

Second, a lack of statistical power (124 pairs of comparators) meant that we were unable to include many variables in the multivariable regression. Although other factors such as sex distribution, ethnicity and baseline values for other risk factors were also collected, they were not included. Further, the limited number of studies using simulation models other than CORE meant we were unable to investigate the consistency between models separately. We found no evidence of heterogeneity between models by comparing studies that used the CORE model and studies that used any of the other type 2 diabetes simulation models. Again, there is scope for future work to include more control variables and further explore the consistency between models when more studies are available.

Finally, we have not taken account of reported uncertainty surrounding model estimates because of the limitations in the reporting of these measures in published studies. A future analysis could focus on comparisons of uncertainty by different simulation models.

Conclusion

We found a linear relationship between the difference in HbA1c and the differences in QALYs and LE based on published studies using type 2 diabetes models. There was no evidence of heterogeneity among models. When the difference in HbA1c is small, the gain in QALYs largely exceeds the gain in LE, suggesting that when the treatment effect on HbA1c is limited, the increase in QALYs mainly comes from utility gain rather than from additional life-years. Our study provides a benchmark for decision makers to identify studies deviating from others, and potentially to generate preliminary long-term effectiveness predictions when insufficient resources are available to use a simulation model.