Introduction

Diabetes is a major health issue that has reached alarming levels. Today, nearly half a billion people are living with diabetes worldwide. In 2017, it was estimated that 425 million people had diabetes (types 1 and 2 combined), increasing to 463 million in 2019, and this number is projected to reach 578 million by 2030 [1]. The Global Burden of Disease Study showed that, owing to population growth and aging, all-age disability-adjusted life-years (DALYs) of people with diabetes were 57,233.7 in 2016, an increase of 24.4% from 1990 to 2016 [2]. To decrease the high disease burden [3,4,5], efficient prevention and treatment of diabetes and its complications are major tasks for health policy. In this context, disease decision models play a vital role in supporting decision-making by evaluating the long-term health and economic outcomes of interventions in the public and private health sectors [6].

Disease decision models are logical mathematical frameworks that synthesize the available data (e.g., short-run clinical trial outcomes, risk equations, and progression rates) and known physiologic relationships into a coherent internally consistent framework that can be extrapolated over time [7, 8]. Many models have been developed and validated for type 2 diabetes mellitus (T2DM) populations and used in a variety of ways, such as estimating long-term clinical outcomes and costs of a clinical trial and aiding decision makers in choosing between available interventions in these populations [9,10,11,12]. For instance, the Centers for Disease Control (CDC) Diabetes Cost-effectiveness Group used the Diabetes Cost-Effectiveness Model (DCEM) to estimate the incremental cost-effectiveness of intensive glycemic control (relative to conventional control), intensified hypertension control, and reduction in serum cholesterol levels in patients with T2DM [12]. From a modeling standpoint, T2DM ranks among the most challenging disease areas because of its impact on multiple interrelated organ systems and multiple treatment goals (including blood glucose, blood pressure, and blood lipids) [13]. However, unlike models in type 1 diabetes mellitus (T1DM) and prediabetes [14, 15], there are few comprehensive summaries and assessments of the existing decision models for T2DM.

Our research provides an overview of the characteristics and capabilities of published decision models in T2DM. We also discuss which models are more suitable for different study demands.

Methods

Search strategy and selection criteria

This systematic review was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [16].

Four databases (PubMed, Web of Science, Embase, and the Cochrane Library) were electronically searched for papers published from inception to August 2020. The following search terms/MeSH terms were used: “Diabetes Mellitus”, “Type 2”, “cost-utility”, “quality of life”, and “decision model”. The full search strategy is provided in Appendix 1. We also manually searched the reference lists of the included studies. References were managed using EndNote X9 (Clarivate, Philadelphia, PA). Studies were eligible for inclusion if they met the following predefined criteria:

  1. Population: Patients with T2DM; modeling studies conducted in a mixed population (T1DM and T2DM) were included only if the model adaptation for T2DM patients was reported separately in the full-text publication;

  2. Intervention and comparators: No restrictions;

  3. Outcomes: Studies with decision models in T2DM that reported health economics outcomes such as costs, (quality-adjusted) life expectancy, and diabetes-related complications;

  4. Study design: All modeling studies capable of performing a full economic evaluation were included.

The exclusion criteria were as follows:

  1. Population: T1DM only, or gestational diabetes or maturity-onset diabetes of the young (MODY);

  2. Outcomes: Modeling studies with a limited focus on particular sub-components of T2DM (e.g., only one complication of T2DM), or modeling application studies with a time horizon of ≤ 5 years;

  3. Study design: Abstracts or full-text unavailable.

Two reviewers (L.J. and C.X.) independently screened the titles and abstracts according to the inclusion criteria. If there was insufficient information to include or exclude a study, then a full-text version was sought. A consensus between both reviewers was required. Full-text versions of all the relevant studies were also obtained and read by two independent reviewers (L.J. and B.Y.) to ensure that the inclusion criteria were met. Any disagreement between the two reviewers was resolved through assessment by a third reviewer. If there was insufficient information to include a study, then the authors were contacted when possible.

Quality assessment

Two reviewers (L.J. and B.Y.) independently assessed the quality of all the included studies by using the Philips et al. [17] checklist, which assesses the quality of reporting of decision models and model-based economic evaluations, as recommended in the Cochrane Handbook for Systematic Reviews of Interventions [18]. Any disagreement between the two reviewers was resolved through assessment by a third reviewer. The checklist by Philips et al. evaluates three domains of a model: (1) structure, (2) data, and (3) consistency.

Data extraction and analysis

If a decision model was found to be associated with multiple studies, these studies were assessed as sharing the same parent model: only the primary study (the study that described the model in greatest detail) for each model was considered for the review, while supplementary and subsequent studies were documented as secondary studies. Data from secondary studies were not extracted. Data from the identified studies included in the review were extracted into data extraction grids (supplementary material Appendix 2) by two independent reviewers (L.J. and B.Y.). The extracted information included basic information, study details, population characteristics, basic modeling methodologies, model structure, data inputs for the included applications, model outcomes, model validation, and uncertainty.

Results

A total of 25,995 related studies were identified by the search in this systematic review; 10,102 duplicates were removed, and 15,893 studies were excluded based on first-pass screening using the title and abstract. Following the full-text review, 140 studies involving 14 decision models in T2DM were included. Figure 1 shows the flow of studies throughout the review. Among the 140 identified studies, 79 used the CORE Diabetes Model (CDM), 17 used the Cardiff model, 13 used the United Kingdom Prospective Diabetes Study Outcomes Model 1 (UKPDS-OM1), 5 used the Archimedes model, 4 used the UKPDS-OM2, 4 used the Swedish Institute of Health Economics Cohort Model of Type 2 Diabetes (IHE), 3 used the Economic and Health Outcomes Model for T2DM (ECHO), 3 used the Michigan model, 3 used the Diabetes Cost-Effectiveness Model (DCEM), 2 used the Chinese Outcomes Model for T2DM (COMT), 2 used the Non-Insulin-Dependent Diabetes Mellitus model (NIDDM), 2 used the Sheffield model, 2 used the Ontario Diabetes Economic Model (ODEM), and 1 used the Cornerstone Diabetes Simulation model (CDS). For each model, only the primary studies that described the model in greater detail were considered for review, and supplementary and subsequent studies were documented as secondary studies. The list of secondary studies is summarized in supplementary material Appendix 3. Models were set in the USA (n = 3) [9, 19, 20], UK (n = 3) [10, 21, 22], Sweden (n = 2) [23, 24], Canada (n = 2) [11, 25], China (n = 1) [26], Switzerland (n = 1) [27], Australia (n = 1) [28], and in multiple countries (n = 1) [12]. Four models [9, 12, 20, 27] solely utilized Markov chains, seven models [11, 19, 21, 22, 25, 26, 28] solely utilized risk equations, and three models [10, 23, 24] utilized both. Except for the Archimedes model, all models (n = 13) implemented an annual cycle length. The time horizon of most models is flexible, up to the course of a lifetime. Almost all models involved cost-utility or cost-effectiveness analysis. An overview of each model is outlined in Tables 1 and 2, sorted by year of publication.

Fig. 1 Flow diagram of literature search

Table 1 Overview of characteristics of decision models in type 2 diabetes (sorted by year of publication)
Table 2 Overview of characteristics of decision models in type 2 diabetes (sorted by year of publication)

Model structure

Tables 1 and 2 show aspects of model structures. Eight model structures [10,11,12, 22, 23, 25, 26, 28] were constructed in reference to pre-existing models. Models had certain differences in how health states were divided (Tables 3 and 4). The DCEM model placed greater emphasis on macrovascular complications, whereas the NIDDM and Michigan models placed greater emphasis on microvascular complications. Other models, apart from the Archimedes model, emphasized both macrovascular and microvascular complications (CDM, UKPDS OM1/2, IHE, ODEM, Cardiff, Sheffield, CDS, COMT, ECHO). The Archimedes model has no clear-cut health states, as it is continuous in time, with no discrete time steps, and any event could occur at any time. The IHE model included numerous health states for complications and used two parallel Markov chains. The first chain consisted of 120 different microvascular health states, and the second chain was made up of 100 different macrovascular health states. Six models [19, 22,23,24, 26, 27] included adverse events. Almost all these models classified them as treatment outcomes, not as independent health states. However, the CDM model incorporated adverse events into the model as independent health states. All models included death as a health state, while each model had different levels of detail in this state.

Table 3 Summary of model health states and adverse events
Table 4 Summary of model health states and adverse events

Eleven identified models were patient-level simulation models, while cohorts were used in the DCEM and IHE models. Either patient- or cohort-level simulation can be used in the CDM model. Except for the Archimedes model and the ECHO model, the models stated the model perspective in their primary citations. Ten models considered a healthcare-related perspective in the base case (7 models [9,10,11,12, 21, 26, 28] used a healthcare-system perspective, 2 models [23, 25] used a healthcare decision-maker perspective, and 1 model [27] used a healthcare-payer perspective), while the NIDDM and Sheffield models considered a patient perspective and a social perspective, respectively.

Thirteen models used an annual cycle length, while the Archimedes model was continuous in time. Three models [21, 26, 27] did not use an annual cycle length for specific health states. The time horizon of 9 models [9,10,11, 19, 20, 23,24,25, 27] was defined by users, up to one’s lifetime, while the time horizon of 5 models [12, 21, 22, 26, 28] was set to one’s lifetime. The handling of transition probabilities varied in complexity across models. Risk equations were applied in most models to derive transition probabilities from the epidemiology of T2DM, the risk factors, the incidence and prevalence of diabetic complications, and comorbidities.
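The annual-cycle mechanism described above can be illustrated with a minimal cohort sketch. The three health states and all transition probabilities below are hypothetical and are not taken from any of the reviewed models; the sketch only shows how a cohort is redistributed across discrete states once per yearly cycle.

```python
# Hypothetical three-state cohort model with an annual cycle.
# States and probabilities are illustrative only, not from any reviewed model.
STATES = ["no_complication", "complication", "dead"]

# Row i holds the annual probabilities of moving from state i to each state.
TRANSITIONS = [
    [0.90, 0.07, 0.03],
    [0.00, 0.88, 0.12],
    [0.00, 0.00, 1.00],
]


def run_cohort(start, transitions, cycles):
    """Return the state distribution at cycle 0, 1, ..., cycles."""
    trace = [list(start)]
    for _ in range(cycles):
        current = trace[-1]
        nxt = [
            sum(current[i] * transitions[i][j] for i in range(len(current)))
            for j in range(len(current))
        ]
        trace.append(nxt)
    return trace


trace = run_cohort([1.0, 0.0, 0.0], TRANSITIONS, cycles=40)
for year in (0, 1, 10, 40):
    shares = ", ".join(f"{s}={p:.3f}" for s, p in zip(STATES, trace[year]))
    print(f"year {year}: {shares}")
```

A user-defined time horizon corresponds to the `cycles` argument; a lifetime horizon would run until nearly the whole cohort has reached the absorbing state.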

Incorporation of risk factors

Eleven models [10, 11, 20,21,22,23,24,25,26,27,28] simulated annual changes in risk factors such as body mass index (BMI), glycemia, HbA1c, blood pressure (systolic and/or diastolic), and lipids (total cholesterol and/or high-density lipoprotein) (Table 2). The simulated trajectory of risk factors could affect the subsequent occurrence or development of diabetes and its complications. The DCEM and COMT models precisely controlled risk factors to reduce the onset and development of diabetes and its complications.
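As a rough illustration of how a simulated risk-factor trajectory can feed into complication risk, the sketch below passes a drifting HbA1c value through a proportional-hazards style risk equation and converts the resulting hazard into a one-year probability. The functional form, the coefficients, and the drift rate are all assumptions chosen for illustration; they are not the UKPDS equations or those of any other reviewed model.

```python
import math

# All constants are hypothetical and for illustration only.
BASELINE_HAZARD = 0.02     # annual event rate at the reference HbA1c
LOG_HR_PER_HBA1C = 0.15    # log hazard ratio per 1 percentage point of HbA1c
REFERENCE_HBA1C = 7.0      # reference HbA1c (%)
ANNUAL_HBA1C_DRIFT = 0.1   # upward drift per year (percentage points)


def annual_event_probability(hba1c):
    """Turn the hazard from a proportional-hazards equation into a one-year probability."""
    hazard = BASELINE_HAZARD * math.exp(LOG_HR_PER_HBA1C * (hba1c - REFERENCE_HBA1C))
    return 1.0 - math.exp(-hazard)


def simulate_trajectory(start_hba1c, years):
    """Return (year, HbA1c, annual event probability) for each simulated year."""
    rows = []
    hba1c = start_hba1c
    for year in range(1, years + 1):
        rows.append((year, hba1c, annual_event_probability(hba1c)))
        hba1c += ANNUAL_HBA1C_DRIFT
    return rows


for year, hba1c, prob in simulate_trajectory(start_hba1c=7.5, years=5):
    print(f"year {year}: HbA1c={hba1c:.1f}%, event probability={prob:.4f}")
```

An intervention that lowers the starting HbA1c or slows the drift would lower every subsequent annual probability, which is how risk-factor control translates into fewer simulated complications.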

Model outcomes

The major model outcomes are summarized as follows (Table 5):

Table 5 Summary of model outcomes

Twelve models [11, 12, 19,20,21,22,23,24,25,26,27,28] reported life-years (LYs), ten models [11, 12, 19, 20, 22,23,24,25,26,27] reported incremental cost-effectiveness ratios (ICERs), and thirteen models [10,11,12, 19,20,21,22,23,24,25,26,27,28] reported quality-adjusted life years (QALYs). The ECHO and IHE models also reported net monetary benefits (NMBs). Some models [9, 10, 12, 19, 22, 24, 26, 27] also reported other outcomes.
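For readers less familiar with these outcome measures, the following sketch computes an ICER and an incremental NMB using their standard definitions (ICER = difference in costs divided by difference in effects; NMB = willingness-to-pay threshold multiplied by effect, minus cost). The cost, QALY, and threshold values are hypothetical and do not come from any of the reviewed studies.

```python
def icer(cost_new, effect_new, cost_old, effect_old):
    """Incremental cost per unit of effect gained (e.g. cost per QALY)."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the effects are equal")
    return (cost_new - cost_old) / delta_effect


def net_monetary_benefit(cost, effect, willingness_to_pay):
    """NMB = willingness-to-pay threshold * effect - cost."""
    return willingness_to_pay * effect - cost


# Hypothetical lifetime results per patient.
intervention = {"cost": 52_000.0, "qalys": 9.4}
comparator = {"cost": 45_000.0, "qalys": 9.0}
threshold = 30_000.0  # hypothetical willingness to pay per QALY

ratio = icer(intervention["cost"], intervention["qalys"],
             comparator["cost"], comparator["qalys"])
incremental_nmb = (
    net_monetary_benefit(intervention["cost"], intervention["qalys"], threshold)
    - net_monetary_benefit(comparator["cost"], comparator["qalys"], threshold)
)

print(f"ICER: {ratio:,.0f} per QALY gained")
print(f"Incremental NMB at threshold {threshold:,.0f}: {incremental_nmb:,.0f}")
```

When the intervention gains effect, a positive incremental NMB is equivalent to an ICER below the chosen threshold, which is why the two measures lead to the same decision.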

Cost

All models reported costs, albeit at different levels of detail. Eleven models [9, 11, 12, 19, 20, 22,23,24,25,26,27] reported direct costs, whereas the CDM and IHE models reported both direct and indirect costs. The outcomes of three models (UKPDS OM1/2 and the Michigan model) included costs, but these were not described in detail, and none of the included studies classified them into direct and indirect costs.

Health utility

All models reported utility values as outcomes. Thus, subsequent cost-utility analyses (CUAs) could be performed. Each health state in a model had a corresponding utility value. Utility values for complications were obtained with the EQ-5D health status questionnaire [10, 21, 28] and the Quality of Well Being–Self-Administered questionnaire (QWB-SA) [9]. Most CUAs were performed by calculating QALYs. Some models [11, 12, 19, 20, 22,23,24,25,26,27] also took ICERs into account and thus could perform incremental analyses.
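The link between state-specific utility values and QALYs can be sketched as follows: in each annual cycle, the share of the cohort in each health state is weighted by that state's utility, and the yearly totals are discounted and summed. The state names, utility values, cohort shares, and 3% discount rate are hypothetical, and the sketch omits refinements such as half-cycle correction.

```python
def discounted_qalys(trace, utilities, discount_rate=0.03):
    """Sum utility-weighted state occupancy over annual cycles, discounted to present value."""
    total = 0.0
    for year, distribution in enumerate(trace):
        cycle_qalys = sum(share * utilities[state] for state, share in distribution.items())
        total += cycle_qalys / (1.0 + discount_rate) ** year
    return total


# Hypothetical utility value per health state (dead = 0 by convention).
UTILITIES = {"no_complication": 0.80, "complication": 0.65, "dead": 0.0}

# Hypothetical share of the cohort in each state at the start of years 0, 1, 2.
TRACE = [
    {"no_complication": 1.00, "complication": 0.0000, "dead": 0.0000},
    {"no_complication": 0.90, "complication": 0.0700, "dead": 0.0300},
    {"no_complication": 0.81, "complication": 0.1246, "dead": 0.0654},
]

undiscounted = discounted_qalys(TRACE, UTILITIES, discount_rate=0.0)
discounted = discounted_qalys(TRACE, UTILITIES)
print(f"QALYs per patient over 3 cycles: {undiscounted:.3f} undiscounted, "
      f"{discounted:.3f} discounted")
```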

Main data sources for complications

All models reported some main data sources used to develop the health states of complications. The data commonly used to develop macrovascular complications included the Framingham datasets [20, 27] and the UKPDS [9, 10, 12, 19, 21,22,23, 27, 28]. For microvascular complications, the data sources were more varied, and the commonly used sources were the Wisconsin Epidemiological Study of Diabetic Retinopathy (WESDR) [20, 27] and the UKPDS [27]. More than half of the models applied multiple data sources for each complication, while the remaining models only contained one or two data sources (Table 6).

Table 6 Summary of main data sources for diabetic complications

Model validation

Eleven of fourteen primary studies reported that one or more validation checks had been performed. Four studies [10, 24, 26, 28] presented model face validation, eleven studies [9, 10, 19,20,21, 23,24,25,26,27,28] presented internal validation, ten studies [10, 19,20,21, 23,24,25,26,27,28] presented external validation, while cross-validation was conducted by three studies [24, 25, 28]. However, none of the 14 studies demonstrated predictive validation. Primary studies using the DCEM, ODEM, and Sheffield models did not report aspects of model validation (Table 7).

Table 7 Summary of model validation (data only extracted from 14 primary citations: for baseline cases)

Model uncertainty

Eleven models [9,10,11,12, 20,21,22,23, 25, 27, 28] were able to deal with model uncertainty, which was described in varying levels of detail in the primary studies. One-way sensitivity analysis was run in the Cardiff, DCEM, ODEM, and UKPDS-OM2 models. Based on the 14 primary studies, none of the models reported a multi-way sensitivity analysis. Probabilistic sensitivity analysis (PSA) capabilities were reported by 9 models (NIDDM, DCEM, CDM, UKPDS-OM1/2, Michigan, Sheffield, IHE, COMT). Five models [9, 20, 25, 27, 28] used the Monte Carlo technique for PSA, while three models [12, 21, 27] used the nonparametric bootstrap method. Only 3 models [23, 27, 28] clearly indicated whether first-order or second-order uncertainty was addressed (Table 8).

Table 8 Summary of model uncertainty (data only extracted from 14 primary citations: for baseline cases)

Model quality

In accordance with the checklist from Philips et al. [17], the percentage of fulfilled criteria was unequally distributed across studies and dimensions of quality (model structure, data, and consistency). Overall, 45% of the criteria were met, 26% were not met, and 29% were not applicable in the 14 primary studies. Figure 2 shows that on average across all included studies, model structure ranked the highest, with 65% of criteria for quality being met, followed by model consistency (43%) and model data (32%) (Tables 9, 10, and 11).

Fig. 2 Quality of modeling studies according to the Philips checklist. Legend: A “Yes” answer was assigned if a criterion was fulfilled. A “No” answer was assigned to criteria that were not fulfilled. NA indicates not applicable

Table 9 Philips checklist results
Table 10 Philips checklist results
Table 11 Philips checklist results

Discussion

Our systematic review included 140 studies describing 14 decision models in T2DM. We extracted data from the primary studies for each model, and the remaining 126 studies were identified as secondary studies (Supplementary material Appendix 3). We found that there were fairly mature modeling technologies and relatively fixed model structures for existing decision models for T2DM. Overall, the 13 identified models (except for the Archimedes model) divided the disease into discrete health states, followed by establishing Markov chains or risk equations to simulate the lifelong course of the disease. However, the review of these studies showed that the existing T2DM models still had certain limitations in terms of quality and extrapolation.

Previous systematic reviews of T2DM models [29,30,31,32] have focused more on model outputs than on their capabilities. However, the primary focus of this systematic review was the capabilities of these models. Based on the characteristics of each model, we briefly summarized the more suitable models for different study demands as follows:

  1. If the study focuses on simulating the trajectory of T2DM and/or diabetic macrovascular complications (e.g., cardiovascular disease, angina, myocardial infarction, or cardiac arrest), the best choice is the DCEM model.

  2. If the study focuses on simulating the trajectory of T2DM and/or diabetic microvascular complications (e.g., retinopathy and/or nephropathy), the best choices are the NIDDM model or the Michigan model. It is worth noting that the NIDDM model was the first diabetes model and is rarely used now, but it is still of great value in the development of diabetes models. Many current models were constructed based on the NIDDM model.

  3. If the objective is to conduct a comprehensive study of the trajectory of T2DM and its various complications, the best choices are the CDM model, the UKPDS OM1/2 model, the IHE model, the ODEM model, the Cardiff model, the Sheffield model, the CDS model, the COMT model, or the ECHO model.

  4. If the objective is to simulate a continuous trajectory of diabetes and its complications, the Archimedes model is the best choice.

  5. If the study is aimed at Chinese and Asian populations, it is recommended to use the COMT model.

  6. If the study focuses on risk factors, the UKPDS-OM1 or UKPDS-OM2 models can be considered for simulation.

  7. To evaluate T2DM interventions where hundreds of simulations are routinely required (e.g., given multiple indications and treatment comparators and the need for extensive sensitivity analysis), the IHE model can be considered first, because the run times for the IHE model were short when compared to most T2DM microsimulation models.

In this systematic review, the 14 identified models were rather heterogeneous in terms of model structures, the main data sources used by models, and model uncertainty.

We observed that most model structures were composed of discrete health states, and each discrete state was simulated annually through transition probabilities. However, the Archimedes model applied a comprehensive approach to model structure by simulating the disease at the organ level; it has no clear-cut health states. The level of detail in the classification of health states differed between models, and not all models clearly defined each health state they contained. However, the desired level of complexity must be balanced with the required transparency. Despite variations in model structure and scope, there should be a reasonably clear consensus on what broad categories of health states should be considered in the same type of T2DM models.

Many of the data sources used in model development are older data sets, such as the UKPDS and Framingham datasets; this limitation also exists in T1DM models. Although this limitation is well known, these data sources are currently recognized as the best available sources for modeling. This review also found that most of the data inputted to models were based on European populations; only 1 of the 14 models was developed based on Asian population data (the COMT model). However, in the era of real-world evidence, with an increasing availability of registry data from clinical practice settings, model validation incorporating modern T2DM epidemiological data into disease progression equations for simulation will be important. The development of this technology may resolve the impacts of limitations on model simulation.

The level of description of model uncertainty varied among the included studies, and there is a lack of standardized terminology regarding model uncertainty in these studies. This may hinder the understanding of what has actually been carried out. For example, in studies conducting Monte Carlo simulation or PSA, it was not always clear whether the report considered first- or second-order uncertainty. This should be noted because many health technology assessment (HTA) agencies demand that second-order uncertainty be captured in PSA. However, capturing second-order uncertainty through PSA of microsimulation models requires numerous and complex computations. This may be why some studies have not clearly stated which type of uncertainty they addressed.
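The distinction between the two orders of uncertainty, and the computational cost of combining them, can be sketched with a nested Monte Carlo loop. In this illustration the inner loop simulates individual patients with fixed parameters (first-order uncertainty), while the outer loop redraws the parameter itself from a distribution (second-order uncertainty). The single event probability, its Beta(20, 980) distribution, and the sample sizes are hypothetical and are not drawn from any reviewed model.

```python
import random
import statistics


def simulate_patient(p_event, years, rng):
    """First-order uncertainty: random event timing for one patient, parameters fixed."""
    for year in range(1, years + 1):
        if rng.random() < p_event:
            return year   # year in which the event occurs
    return None           # event-free over the horizon


def event_rate(p_event, n_patients, years, rng):
    """Share of simulated patients who experience the event within the horizon."""
    events = sum(
        simulate_patient(p_event, years, rng) is not None for _ in range(n_patients)
    )
    return events / n_patients


def probabilistic_sensitivity_analysis(n_draws, n_patients, years, seed=0):
    """Second-order uncertainty: redraw the parameter for every outer iteration."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_draws):
        # Hypothetical Beta(20, 980): mean annual event probability of 0.02.
        p_event = rng.betavariate(20, 980)
        results.append(event_rate(p_event, n_patients, years, rng))
    return results


results = probabilistic_sensitivity_analysis(n_draws=200, n_patients=500, years=10)
print(f"mean 10-year event rate: {statistics.mean(results):.3f}")
print(f"spread across parameter draws (SD): {statistics.stdev(results):.3f}")
```

The number of patient simulations grows as `n_draws` multiplied by `n_patients`, which illustrates why second-order PSA is demanding for microsimulation models.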

Although a rigorous systematic review was undertaken to identify all relevant studies of decision models in T2DM, some limitations of this review should be acknowledged. First, the data were extracted mainly through the primary study for each model, rather than the latest study, which may cause some of the latest views on models to be ignored. In general, ICERs were also obtained when calculating QALYs to perform CUA. However, in model outcomes, 13 models reported QALYs, and only 10 of these models reported ICERs. This may be due to the lack of data from secondary studies. A similar review should be conducted on secondary studies of each model to provide a more comprehensive evaluation of the included models. Second, models with a limited focus on particular sub-components of T2DM were excluded. Models focused on particular sub-components of T2DM may provide a more meticulous and complex simulation method. However, these models only involved specific components of T2DM, which may lead to failure to consider the connection of the various components of diabetes in modeling. Finally, the assessment of study quality may be biased, as some studies were not described in full detail because of word limits for publications.

Conclusion

We conducted a comprehensive systematic review focusing on the capabilities of the existing decision models for T2DM, and briefly summarized the more suitable models for different study demands. It is necessary to use decision models to simulate the lifelong course of diseases, especially chronic diseases, to evaluate whether new technologies or interventions have value. A general conclusion from the review is that the existing decision models for T2DM were rather heterogeneous in the level of detail in the classification of health states. Thus, more attention should be focused on balancing the desired level of complexity against the required level of transparency in the development of T2DM decision models. Furthermore, we should consider including secondary studies for a more comprehensive systematic review.

Registration

This systematic review was registered in the PROSPERO database (CRD42020171838). https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42020171838