Background

Diabetes mellitus (DM), characterised by an increased blood glucose concentration, is a major health care challenge: according to the latest estimates of the International Diabetes Federation (IDF), more than 415 million adults (20–79 years old), or 8.8% of that age group, were living with this chronic condition in 2015. Even by conservative projections, this figure will reach 642 million in 2040 [1]. While a spectrum of metabolic disorders falls under the label ‘diabetes’, the majority of cases (90 to 95% [2]) may be classified as type 2 diabetes mellitus (T2DM).

T2DM is a chronic condition that may remain undiagnosed for several years. Medical conditions related to T2DM, such as retinopathy, often emerge before the clinical diagnosis of T2DM [3, 4]. Moreover, end-stage complications such as blindness, limb amputation, renal failure, stroke or myocardial infarction result in high health care expenditures and a loss of healthy life years, placing an enormous burden on patients and on the health care system in general [5].

Computer simulation models are often employed to evaluate the clinical and economic effectiveness of interventions that are being considered for implementation or have already been implemented in a health care system [6,7,8]. Computer simulation models make it possible to examine the characteristics of different patient populations under multiple independent conditions and treatments. Furthermore, models can simulate health outcomes over periods of more than 5–10 years, which is useful for chronic conditions when considering the long-term clinical and economic impact of different intervention scenarios [9]. Modelling assists clinicians and policy makers in decision making by synthesizing the existing evidence base in a way that is transparent with regard to the complexity, variability and uncertainty of health and disease progression [9].

Diabetes modelling is a relatively recent technique in health economic analysis. The first fully integrated (i.e. including the full range of complications) economic model of type 1 diabetes mellitus (T1DM) was published in 1996 [10]. As shown in systematic reviews on this topic [11], the first influential work on T2DM modelling was the article ‘Model of Complications of NIDDM: I. Model construction and assumptions’ by Eastman et al., published in 1997 [12]. Computer simulation is now recognized as a well-established method to harmonize and personalize the abundance of evidence on the long-term effects of diabetes, which helps to answer ‘what if’ questions about treatment effects. Researchers primarily use the models to analyse the clinical and cost-effectiveness of different, mostly pharmacological, interventions [11, 13, 14].

Therefore, it is of utmost importance that models can be trusted to provide an accurate reflection of disease progression in real life [15, 16]. This is particularly true for chronic diseases such as T2DM, which develop over a long period of time and are associated with significant morbidity and mortality and a substantial burden to the health care system and to society.

Several ways of assessing and controlling the quality of these models are in use, of which validation is a crucial part [15,16,17]. According to the recommendations in the report of the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Good Research Practice Task Force, model validation is categorized into three main groups: internal validation, between-model validation, and external validation [18]. The American Diabetes Association (ADA) released guidelines [9] to standardize the description and validation of diabetes models. They defined criteria that, if followed, would build confidence that a model accurately performs its intended function: steps that model developers can take to ensure that others can reproduce the results and to build confidence that the models are accurate, useful and reliable.

‘External validation’ refers to the ability of the model to accurately predict or replicate the results of studies that were not used to build the model [18, 19]. The ability to replicate external studies is regarded as evidence that a model fulfils its intended purpose [9]. Eddy et al. [19] described best practices for a formal external validation process. They suggested that modellers should make a description of the external validation process and its results available on request. Moreover, modellers should identify parts of the model that cannot be validated given a lack of suitable sources and describe how the related uncertainty was addressed.

The Mount Hood Challenge meetings are organized to promote validity and reliability in diabetes modelling [16]. The first meeting of health economic simulation modelling groups focusing on diabetes took place in 1999 and challenged only two simulation models [20]. Several of the later meetings focused on the external validation of models, but only for models developed by researchers attending the meeting [15, 16]. However, the organisers stated that there was still no clear consensus on what model validation precisely means [15]: appropriate statistical approaches should be defined to assess the correlation between model and clinical trial outcomes, and limits could be predefined for model accuracy and precision.

Several systematic literature reviews comparing and assessing the quality of diabetes models are available. In 2010, two reviews included DM models published before 2008 and focused on health-economic aspects of diabetes models, specifically, their use for pharmacological treatment evaluations [11, 13]. Three years later, a review by Charokopou et al. updated results by including studies published between 2008 and 2013 [14]. Moreover, a systematic health economic assessment of three models (the US Centers for Disease Control and Prevention (CDC-RTI) Diabetes Cost-effectiveness Model [21], the Quintiles IMS CORE Diabetes Model (CORE) [22], and the Archimedes model [23]) was conducted by Becker et al. in 2011 [24]. In 2016, Henriksson et al. [25] focused on economic models of T1DM and provided an overview of the characteristics and capabilities of available models. Also, Kirsch published a systematic review of Markov models evaluating multicomponent disease management programs for DM in 2015 [26].

In four of the six literature reviews (Yi et al. [13], Becker et al. [24], Tarride et al. [11], Charokopou et al. [14]), the authors evaluated whether internal and external validation of the models was reported. Only in one review [24] did the authors appraise the validation approaches by applying the model quality criteria recommended by the ADA panel [9]. Becker et al. [24] found that while extensive validations were reported for all of the assessed models, the results were not directly comparable due to the different outcomes, studies or populations used. Because these reviews focused on the use of DM models in practice, they may not have identified all published models in the field of T2DM per se, nor all evidence associated with the validation of the identified models.

Despite the broad consensus in the literature about the general importance of model validation, there are, to our knowledge, no systematic reviews comparing the practice of validating diabetes models. Such a systematic review might serve to understand, improve and critically appraise current practices, particularly in the context of external validation.

The lack of extensive reviews on model validation can be explained by specific challenges in the field. The models are diverse in their structure and underlying assumptions, and they use heterogeneous methods to conduct the validation and to report their results and methodology. Furthermore, most diabetes models are built on the same data sources (e.g. the United Kingdom Prospective Diabetes Study (UKPDS)) [11, 14, 25], or the availability of data sources is limited.

Study objective and rationale

The main objective of this systematic literature review is to identify and appraise the quality of approaches that are being used for the external validation of existing computer models covering the development and progression of T2DM in human populations.

We will review identified models with regard to the validation efforts to provide an overview of current practices reported in the literature and to find out if these practices are in line with the recommendations given elsewhere [9, 15,16,17].

We will summarise the findings of the studies for each model identified. The information from articles and reports on each model will be aggregated and compared across models.

Methods

Protocol

This protocol adheres to the Preferred Reporting Items in Systematic Reviews and Meta-analyses (PRISMA) statement [27] and PRISMA for systematic review protocols (PRISMA-P) statement [28, 29]. The PRISMA-P checklist is given in Table 3 in the Appendix. The protocol is registered in the International Prospective Register of Systematic Reviews (PROSPERO) CRD42017069983.

Eligibility criteria

We will include studies reporting and describing the use of a computer simulation model with the characteristics given in Table 1.

Table 1 Eligibility criteria in study selection

We will exclude studies published before 1995 from our review because computer simulation of diabetes is a relatively new concept. Two previous systematic literature reviews [25, 26] also restricted the time frame of their searches to the year 1995: the authors justified the limitation by the fact that the first two relevant publications were published in 1996 [10] and in 1997 [12], respectively.

Information sources

We will search the following literature databases: MEDLINE (via NLM, PubMed), CENTRAL (via Wiley), EMBASE (via Ovid SP), EconLit (via EBSCOhost), Web of Science (via Thomson Reuters), PsycINFO (via Ovid SP), Scopus (via Elsevier) and NHS Economic Evaluation Database (NHS EED) (via Wiley).

To identify potential existing systematic reviews with regard to our research topic, we will also search the Cochrane Database of Systematic Reviews (CDSR) (via Wiley) and the Database of Abstracts of Reviews of Effects (DARE) (via Wiley).

We will also include grey literature databases in our search: ProQuest Dissertations & Theses Database (PQDT) (via ProQuest), System for Information on Grey Literature in Europe (OpenGrey) (via INIST/CNRS), The Directory of Open Access Repositories (OpenDOAR) (via CRC) and CINAHL (via EBSCOhost). In PQDT, the search will be limited to titles only since abstracts in this database are very unspecific. In OpenGrey and OpenDOAR, the results will be limited to the first 100 hits due to the Google Custom Search-based procedure. The grey literature will be included to widen the search scope and to capture models that were not highlighted in previous systematic literature reviews.

Besides references identified through databases and systematic reviews, we will include models published in the Mount Hood Challenge meetings proceedings [15, 16].

Search strategy

We composed the ‘primary’ bibliographic search strategy according to MEDLINE search rules; it will later be translated into the syntaxes of the other databases. We applied the search strategy to the relevant references of studies found in the previous literature reviews and/or known in the research field to check the search strategy’s sensitivity. This primary search strategy has been peer-reviewed using the structured checklist given in the PRESS Peer Review of Electronic Search Strategies: 2015 Guideline Statement [30]. The peer review assessment is given in Additional file 1.

The details of this search strategy are given in Table 2.

Table 2 Search strategy for MEDLINE (via PubMed)

Technical tools

The references will be collected and stored in the reference manager Mendeley using its cloud-based platform [31]. The abstract and full text screening will be carried out using the web application Rayyan [32]. During the data extraction, we will fill in Google Docs spreadsheet forms stored on Google’s cloud storage [33]. All information will be shared and made available within the working group; access will be restricted to a private mode.

Study selection

We will select studies in several steps. First, we will identify all relevant studies by conducting the search in the scientific publication databases listed in the ‘Information sources’ section. We will then remove duplicates to reduce the reviewers’ workload in the subsequent steps.
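As an illustration only (this sketch is not part of the protocol), records exported from several databases are commonly matched on a normalised title/year key during de-duplication; a minimal Python sketch, assuming each reference is a plain dictionary:

```python
import re

def normalise(title):
    """Lowercase a title and collapse punctuation/whitespace to build a match key."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(references):
    """Keep the first occurrence of each (normalised title, year) pair."""
    seen, unique = set(), []
    for ref in references:  # ref: dict with 'title' and 'year'
        key = (normalise(ref["title"]), ref.get("year"))
        if key not in seen:
            seen.add(key)
            unique.append(ref)
    return unique

# Hypothetical exports of the same article from two databases:
refs = [
    {"title": "Model of Complications of NIDDM: I.", "year": 1997},
    {"title": "Model of complications of NIDDM: I", "year": 1997},
]
print(len(deduplicate(refs)))  # → 1
```

Matching on a normalised key rather than the raw title is what makes records survive the small formatting differences (capitalisation, trailing punctuation) between database exports.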

Next, we will screen the titles and abstracts of all references selected so far. Two tandems of researchers will independently review and include or exclude references guided by the eligibility criteria mentioned above. The title and abstract screening will be piloted first within the tandems independently and then between the tandems to adjust the procedure. Any disagreement or inconsistency in the final decisions will be recorded and resolved through discussion among all researchers involved in the screening.

Full text screening will be the final step of our study selection process. Two independent researchers will review the full texts of the studies selected in the previous step to compile the final list of studies for data extraction. The decision on inclusion or exclusion will be made on the basis of the eligibility criteria. The full text screening will be piloted first to adjust the procedure. Any discrepancies arising during the full text screening will be resolved by a third, independent reviewer.

The screening process will be reported accurately and in sufficient detail—including the title and abstract screening and the full text screening—to complete a PRISMA flowchart [27]. The number of the excluded studies (with reasons for exclusion for those excluded from the full text screening step) will be recorded. Tables for ‘Characteristics of excluded studies’ and ‘Characteristics of included studies’ will be provided.
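The bookkeeping behind a PRISMA flowchart reduces to simple subtraction at each screening stage; the following sketch is purely illustrative (all counts and exclusion reasons are invented, not protocol data):

```python
def prisma_flow(identified, duplicates, excluded_title_abstract, excluded_full_text):
    """Compute the record counts at each PRISMA stage from the recorded exclusions."""
    screened = identified - duplicates                       # after de-duplication
    full_text = screened - excluded_title_abstract           # after title/abstract screening
    included = full_text - sum(excluded_full_text.values())  # after full text screening
    return {"screened": screened, "full_text": full_text, "included": included}

# Hypothetical numbers; reasons for full-text exclusion are reported per PRISMA.
flow = prisma_flow(
    identified=1200,
    duplicates=300,
    excluded_title_abstract=780,
    excluded_full_text={"no external validation": 60, "not T2DM": 40},
)
print(flow)  # → {'screened': 900, 'full_text': 120, 'included': 20}
```

Recording the per-reason exclusion counts at the full text stage, as in the dictionary above, is what allows the flowchart's "excluded, with reasons" box to be filled in directly.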

Additional data source selection

During the title/abstract screening, we will collect systematic literature reviews addressing diabetes modelling as a potential reference source of interest.

After the full text screening, we will export the reference lists of all publications identified for the data extraction. Moreover, we will check forward citations of these publications by tracking them in different citation databases (Scopus, Web of Science). We will also apply PubMed’s ‘related articles’ feature to the same papers and collect the first 20 hits. The number of hits is deliberately restricted to limit the workload. All references gathered in this step will be subject to the same steps as described in the ‘Study selection’ section.
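PubMed’s ‘related articles’ ranking is also exposed programmatically through the NCBI E-utilities ELink endpoint with `linkname=pubmed_pubmed`. The sketch below only builds the query URL; the PMID is hypothetical, `fetch_json` is a stand-in for an HTTP call, and since ELink returns the full relevance-ranked list, the first 20 IDs would be kept client-side:

```python
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def related_articles_url(pmid):
    """Build an ELink query for PubMed's related-articles list for one PMID."""
    params = {
        "dbfrom": "pubmed",
        "db": "pubmed",
        "linkname": "pubmed_pubmed",  # the 'related articles' link type
        "id": pmid,
        "retmode": "json",
    }
    return f"{BASE}?{urlencode(params)}"

# The JSON response lists related PMIDs in relevance order; keep only the first 20:
# related_pmids = fetch_json(related_articles_url("12345678"))[
#     "linksets"][0]["linksetdbs"][0]["links"][:20]
print(related_articles_url("12345678"))
```

Fetching the ranked list once per included paper and truncating it client-side mirrors the protocol’s choice to cap the workload at 20 hits per paper.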

All researchers who conducted the eligible studies or built the selected models will be contacted by e-mail for information on unpublished or ongoing studies related to the objectives of the systematic review.

We will send a written request to experts in the field through open and private communication channels (i.e. emails, subscription newsletters and professional boards) to check whether a pre-final list of the identified models is complete. If previously undetected models are mentioned, we will include them in our review.

Data extraction

Two reviewers will independently extract data from all included studies after the full text review by filling in a predefined data extraction spreadsheet, which has been piloted. The data extraction table is provided in addition to the paper documents (Additional file 2).

Data to be extracted will be arranged into the following categories:

  1. Model characteristics;

  2. Types of validation involved;

  3. External validation: definition, description and use;

  4. Data sources used in the external validation: description and use;

  5. Results of the external validation: methods and reporting formats;

  6. Miscellaneous (conflict of interest and factors not considered).


Authors of studies included after the full text review whose articles do not provide sufficient information on the external validation will be contacted for further information.

Quality appraisal

The data extraction sheet contains questions and categories related to the quality appraisal of the validation. These parameters are based on the collaborative guidance of ISPOR, the Academy of Managed Care Pharmacy (AMCP) and the National Pharmaceutical Council (NPC) (ISPOR-AMCP-NPC) on assessing the relevance and credibility of modelling studies [17], the ADA Guidelines for Computer Modelling of Diabetes and Its Complications [9], the Mount Hood Challenge reports [15, 16] and other materials discussing the validation of computer models in health care research. In line with our main objective, the assessment of studies will be limited to the credibility section of the questionnaire. If information is not provided, we will assign ‘Not Applicable’, ‘Not Reported’, ‘Not Enough Information’ or ‘Not Enough Training’ to the corresponding questions/categories.

Data synthesis

Since the models differ and the external validation approaches are largely diverse, we will condense the results in a narrative descriptive analysis summarizing the definitions and approaches used in the different models: the data sources, techniques and methods used. Finally, we will provide a summary of what has been done to date in the external validation of diabetes models.

Discussion

This protocol describes the objectives, methods and steps of an upcoming systematic review that will include studies on the external validation of simulation-based computer models designed to represent T2DM incidence and progression in humans. To the best of our knowledge, this contribution is novel, since no publications specifically target the external validation of T2DM models in a systematic way. The objectives are to identify and define the approaches used for the external validation of T2DM models, to appraise the quality of the validation procedures and their reporting, and to summarize the findings to provide a broad overview of current practice in the field.