Key Points

Model users can accept health-economic (HE) decision models as valid without further examination, which may reduce their confidence in the model, or they can validate the models themselves, which implies overlap with the validation efforts of the modelling team. Existing modelling and validation guidelines offer little guidance on setting validation priorities, nor do they address the issue of overlapping work by model developers and users.

Assessment of the Validation Status of Health-Economic decision models (AdViSHE) allows model developers to provide model users with structured information regarding the validation status of their HE decision model. Its main purpose is to avoid some of the current overlap in validation efforts and to provide information on a list of priority validation items selected by a Delphi consensus process. AdViSHE can be used to reproduce stated results and to guide complementary validation efforts, which is expected to increase model users’ understanding of, and confidence in, the model and its outcomes.

1 Introduction

The use of health-economic (HE) decision models can have extensive consequences for payers, patients, and practitioners alike. Since HE models have become a fixed part of the modern decision-making process in healthcare policy [1], they should be validated before they are used. This is commonly done by the modelling team, sometimes extensively so. Since the cost of model validation can be significant, both in time and money [2], model users, that is, people using the outcomes of the model, such as reimbursement decision makers, could simply presume that the models are valid without further examination. However, this unquestioning acceptance may reduce the users’ overall confidence in the model, especially when the modelling team has an economic interest in favourable outcomes [3]. Model users therefore often validate models themselves, leading to a possibly improved validation status of the model, but also to an overlap of work between the modelling team and model users.

We are thus presented with a trade-off between building confidence in the model and the use of scarce resources. Several guidelines and publications address model validity and quality assessment, both for simulation models in general [4–6] and for HE decision models in particular [3, 7–11]. However, they do not address the trade-off referred to above or support modellers in setting validation priorities. A prioritized list of validation efforts, broadly supported by the research community, may reduce the waste of resources while improving the overall validation status of HE models.

The aim of this study was therefore to create a practical tool for model developers to fill in during or shortly after model development. The tool provides model users with structured insight into the validation status of the model, according to a consensus on what good model validation entails, and may also provide guidance towards additional validation. This tool, called Assessment of the Validation Status of Health-Economic decision models (AdViSHE), may, for example, be part of dossiers sent to the (national) decision maker when applying for reimbursement, or it may be appended to manuscripts on modelling applications to support peer reviewers.

2 Methods

We defined validation as the act of evaluating whether a model is a proper and sufficient representation of the system it is intended to represent, in view of a specific application. Here, “proper” means that the model is in accordance with what is known about the system, and “sufficient” means that the results can serve as a solid basis for decision making [12].

2.1 Initial List

A literature search generated an initial gross list of validation techniques. Explicit attention was given to the inclusion of validation practices from outside the HE literature. Precise definitions were formulated to avoid confusion between terms that may be used interchangeably in daily practice.

2.2 Expert Input

In five e-mail rounds, HE experts commented on the initial list and drafts of AdViSHE. The setup of these rounds was based on the Delphi method, a structured communication technique in which experts answer questions in two or more rounds. The key element is that experts are encouraged to revise their earlier answers in light of the replies of other members of their panel in order to reach consensus [13–15]. The design of each round was not fixed beforehand, but was based on the outcomes of the previous round. A summary of the commentary from previous rounds was provided and every participant was actively encouraged to comment and provide suggestions for additions; all experts were allowed to refine or change their opinion. Steps were taken to include a wide variety of nationalities, work environments, and expertise (Table 1). In between rounds, new experts were approached to enhance international diversity and to counter attrition.

Table 1 Source of contact information of health-economic experts

Comments on an early draft of AdViSHE were solicited from employees of Zorginstituut Nederland (the Dutch Healthcare Institute), the primary advisory council for the Dutch Ministry of Health regarding reimbursement. Zorginstituut Nederland is representative of the field of policy decision makers for whom AdViSHE might be useful.

A conference workshop was organized in Montreal, Canada, where attendees discussed the first full draft of the tool amongst themselves. Three of the authors (PV, GVV, ICR) actively approached groups of discussants. All participants were encouraged to comment using a questionnaire. All comments made during this workshop were collected and incorporated in the final draft, which was sent out to the Delphi panel in a final round of comments. It was then edited for language, after which the project group agreed on the final version of AdViSHE.

2.3 Case Studies

The applicability of AdViSHE was tested by applying it to two case studies. Both were HE decision models in which the study authors were involved. The first model was built specifically for a Diagnostic Assessment Report commissioned by the National Institute for Health Research Health Technology Assessment Programme [16]. It was programmed in Microsoft Excel and assessed devices used to assist with the diagnosis, management, and monitoring of haemostasis disorders. The second one, with a multi-use design and programmed in Wolfram Mathematica, is a dynamic Dutch population model of chronic obstructive pulmonary disease (COPD) progression [17]. AdViSHE was filled in using knowledge of the models and their development. The focus of this exercise was to identify any problems with AdViSHE that a model developer might encounter when applying it to a model.

3 Results

3.1 Building AdViSHE

The process of building AdViSHE is depicted in Fig. 1 and additional information is given in the online supplementary appendix (see electronic supplementary material, online resource 1). The literature search yielded 35 validation techniques [4–6, 9, 10, 18–22], which were then divided into groups covering all aspects of model validation (Fig. 2). The Delphi panel ran from June 2013 until September 2014. Background information on the HE experts can be found in Table 2. The questions raised in each round are presented in Fig. 3.

Fig. 1

Building the validation-assessment tool. Grey boxes display work by the project team; white boxes display input from outside sources. ¹High non-response since the invitations were sent out to a very wide range of people with the aim of selecting a suitable panel; see Table 1. AdViSHE Assessment of the Validation Status of Health-Economic decision models

Fig. 2

Typology of validation techniques, based on [4]

Table 2 Background information of participants who answered during at least one of the five Delphi rounds
Fig. 3

HE expert questions. AdViSHE Assessment of the Validation Status of Health-Economic decision models, HE health-economic. ¹ISPOR International Society For Pharmacoeconomics and Outcomes Research

In the pilot round and the first round, every respondent could comment on the full initial list, but was asked to comment on at least the techniques in two of the four groups. Based on these rounds, nine techniques were dropped from the initial list, since they were deemed unimportant by the panel (Figs. 1, 3). Nine new techniques proposed by panellists were included. Several items were reformulated and combined.

To limit the burden on panellists, each panellist in the second and third rounds received a subset of five validation techniques to comment on, using a factorial design. The purpose was to improve the definitions of the techniques, make necessary clarifications, and hold an open discussion on the usefulness of each item. Contrary to the first round, no quantitative scoring was performed in these rounds. Based on the first draft of AdViSHE built after the third round, Zorginstituut Nederland suggested that the investigation of outliers, which had been excluded in a previous round due to an average “importance” score of 3.8 (below the threshold of 4), be explicitly mentioned in AdViSHE. The Delphi panel was asked to comment on the amended first draft.
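To illustrate the idea behind such an assignment, the minimal Python sketch below distributes techniques over panellists in a round-robin fashion so that each respondent receives only a small subset while every technique is still reviewed by several respondents. The panel size, the subset size of five, and the allocation mechanism shown here are illustrative assumptions; the exact design used in the study is not reported at this level of detail.

```python
import itertools

# Illustrative sketch only: 35 techniques (the number found in the literature
# search) and a hypothetical panel of 21 respondents; the study's actual
# allocation mechanism is not described in this detail.
techniques = [f"technique_{i:02d}" for i in range(1, 36)]
panellists = [f"panellist_{i:02d}" for i in range(1, 22)]
SUBSET_SIZE = 5

# Round-robin over the techniques: each panellist takes the next five,
# so every technique ends up with (21 * 5) / 35 = 3 reviewers.
cycle = itertools.cycle(techniques)
assignment = {p: [next(cycle) for _ in range(SUBSET_SIZE)] for p in panellists}

# Sanity check: each technique is reviewed by several respondents.
for t in techniques:
    reviewers = [p for p, subset in assignment.items() if t in subset]
    assert len(reviewers) == 3, t
```

A design of this kind keeps the per-respondent workload low while preserving multiple independent views on every technique.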

The conference workshop also discussed the first draft. It was attended by approximately 50 participants; 19 filled-in questionnaires were returned. Three workshop participants indicated that they were also members of the Delphi panel.

The second draft was based on comments from the fourth round and the workshop. Based on the workshop, a final question was added, asking whether modellers have performed any validation techniques not covered in AdViSHE. The fifth Delphi round yielded no further substantial comments. The project team finalized the tool in October 2014.

3.2 Final Version

The final version of AdViSHE consists of 13 questions (Fig. 4). The questions are grouped to cover the various aspects of a model: the conceptual model, the input data, the implemented software program, and the model outcomes (Fig. 2). The tool is designed to be filled in by modellers to report in a structured way on the efforts performed to improve the validation status of their HE decision model, and on the outcomes of these efforts. The information required to fill in AdViSHE is often available elsewhere, but is not collected systematically.

Fig. 4

AdViSHE: Assessment of the Validation Status of Health-Economic decision models. ¹ISPOR International Society For Pharmacoeconomics and Outcomes Research, SMDM Society for Medical Decision Making, CHEERS Consolidated Health Economic Evaluation Reporting Standards

3.3 Application to Case Studies

Filling in AdViSHE took a little over 1 h for each model. No further issues with the formulation, structure, or usability of AdViSHE were found. It was noted that filling in AdViSHE is best done during model development or soon afterwards.

4 Discussion

4.1 Application

The validation-assessment tool AdViSHE allows model developers to provide model users with structured information regarding the validation status of their HE decision model. Its main purpose is to reduce the workload of model users and avoid some of the current overlap in validation efforts, thus saving resources. It does so by reporting, in a structured way, which validation efforts have been undertaken and what their results were. AdViSHE is not intended to replace validation by model users, but rather to let them reproduce stated results and guide complementary validation efforts. In this way, model users are expected to gain a greater understanding of, and confidence in, the model and its outcomes.

AdViSHE can be particularly useful for decision makers who have to evaluate a reimbursement dossier. In that regard, the UK stands out internationally by providing independent experts with an 8-week window to validate HE decision models [23]; other jurisdictions have much shorter timelines. In the Netherlands, for instance, Zorginstituut Nederland has 3 weeks to comment on an HE model and its outcomes before the reimbursement submission is sent to the assessment committee (Wetenschappelijke adviesraad, WAR) [24]. Since manufacturer submissions rarely report on model validity, this often has to be assessed independently. Due to time and money constraints, this assessment is sometimes insufficient to establish confidence among users. Inclusion of AdViSHE in the reimbursement process could therefore improve this process, in particular because it reports on consensus-based validation criteria.

Participants in both the Delphi panel and the workshop specifically asked for a short, quick-scan version. AdViSHE could serve that purpose, using the answer options “Yes” or “No” for each of the questions. This checklist could be useful during the modelling process to ascertain whether all important validation efforts have been considered. It could also be useful during the process of research dissemination, by accompanying academic articles or conference presentations.
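As an illustration of this quick-scan idea, the minimal Python sketch below represents each AdViSHE aspect as a set of Yes/No items. The four groupings follow the typology in Fig. 2, but the item wording, example answers, and function name are hypothetical, not the official questionnaire text.

```python
# Hypothetical quick-scan sketch; the four aspects follow AdViSHE's grouping,
# but the items and answers below are illustrative placeholders.
quick_scan = {
    "conceptual model": {"face validity tested": True,
                         "cross validity tested": False},
    "input data": {"face validity tested": True,
                   "sources and quality checked": True},
    "computerized model": {"code verified/debugged": True,
                           "extreme-value tests run": False},
    "model outcomes": {"compared against empirical data": True,
                       "outliers investigated": False},
}

def open_items(scan: dict) -> list[tuple[str, str]]:
    """List the (aspect, item) pairs still answered 'No', i.e. validation
    efforts not yet performed or not yet reported."""
    return [(aspect, item)
            for aspect, items in scan.items()
            for item, done in items.items() if not done]

for aspect, item in open_items(quick_scan):
    print(f"Not yet covered: {aspect} -> {item}")
```

Used during model development, such a summary makes it easy to spot which of the priority validation efforts have not yet been considered.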

AdViSHE gives neither a validity score nor a threshold for one. There are several reasons for this choice: a model may receive a passing score and yet have a defect that needs to be corrected; the subjective nature of this approach tends to be hidden, so the assessment appears to be objective; scores may cause overconfidence in a model; and scores can be used to argue that one model is better than another [4]. Just as models must be tailored to an application, validation efforts must be tailored to a specific model. Therefore, no a priori “red flag” or “must do” labels have been defined for AdViSHE. A validation effort that a model user deems indispensable for one application may not be considered necessary for another.

4.2 Methodology

The methodology used in this study is not a Delphi panel in a strict sense. In a Delphi panel, a group of experts, usually small, is given one specific question to answer. Each participant is free to request additional data, which is then shared with the rest, along with their opinions [13–15]. In our study, no single question could be posed in a meaningful way. We therefore recruited a relatively large group of respondents, each of whom was asked to answer a subset of questions. In this way, we mimicked the Delphi method as closely as possible. In some rounds, we applied a factorial design to reduce the number of questions presented to each respondent while keeping several respondents for each question. The added value of interaction between respondents that the Delphi method provides was explicitly incorporated.

AdViSHE assumes that the modelling process is performed with generally accepted modelling and reporting techniques. This could mean that the model builders adhere to the International Society For Pharmacoeconomics and Outcomes Research (ISPOR)-Society for Medical Decision Making (SMDM) Modeling Good Research Practices and to the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) statement [1, 11]. AdViSHE evaluates neither the model itself nor the building process.

The intention of our study was to obtain a useful tool that improves the validation status of HE models by putting resources to good use and by providing a consensus basis for reporting on these efforts. Close to 100 people actively participated in discussing (parts of) AdViSHE. This large, diverse group of respondents was one of the biggest strengths of our study. Our Delphi panel included representatives of many different geographical regions and working fields relevant to HE modelling. Their input gave useful information of both a qualitative and a quantitative nature, and our design allowed for the suggestion of other methods not yet included in the list. The workload of the Delphi panel was relatively low due to the factorial design in the second and third rounds. Although this meant that participants did not comment on all validation techniques, it did keep participants interested.

One limitation of our study was that the path to consensus took more than a year, and filtering all the information after each round was labour intensive. Since this filtering was to some extent subjective, there is no complete certainty that consensus was unanimous, although the reactions to the full drafts were positive. A final limitation is that the original search for validation techniques was not based on a systematic review of the literature. We started with a list of ten guidelines from inside and outside the HE field and listed all techniques mentioned in these guidelines. By allowing the Delphi panel and the workshop participants to add techniques they considered useful, we used an alternative approach to help ensure that all relevant techniques were included in the list of items considered.

4.3 Comparison to Other Tools

Several tools published in the past few years deal with the quality assessment of HE decision models [3, 8]. Others deal with the quality of reporting of HE decision models [7, 9, 11]. Only one of these recent tools refers explicitly to validation, namely that of Caro et al., which briefly discusses validation as a part of the tool’s overall “credibility” [3]. The Drummond and Jefferson [7] and Consensus on Health Economic Criteria (CHEC) [8] checklists, and the Philips framework [9], were intended to be filled in by modellers to help them in the model development process, although it is implied that model users can fill them in to evaluate models [7, 9]. The CHEERS checklist was built to be used by both model developers and model users, in particular editors and peer reviewers evaluating the publication potential of economic evaluations [11]. The checklist by Caro et al. was specifically built to be filled in by model users [3]. Using these tools for their intended purpose will hence often add to the workload of model users and may overlap with work already done by the developers. In contrast to the checklists mentioned above, AdViSHE was specifically intended to be filled in by modellers, while its outcome is immediately useful to model users.

There are also several tools that deal with model validation for simulation models in general [4–6]. However, these present ideals rather than a priority list of feasible acceptability criteria. In addition, most recommendations are necessarily general and not geared towards validating HE models [4–6]. The limited set of validation techniques in AdViSHE reflects a compromise between what is feasible and what is necessary in HE modelling. For specific applications, additional items may of course be very important; these can be reported in the last part of AdViSHE. As a priority list, AdViSHE thus supplements existing tools and guidelines that serve different purposes.

Despite efforts to make the evidence as objective as possible, the judgment of model validity (and confidence) will ultimately be subjective. It is therefore of paramount importance that model users can make their own assessment. AdViSHE makes this possible in an efficient way: it asks not only which validation aspects were tested, but also how they were tested and where the outcomes are reported. Other tools provide only general suggestions as to which aspects should be discussed; the exception is the CHEERS checklist, which also asks specifically where certain items are reported [3, 7–9, 11].

4.4 Terminology

There is little, if any, consensus on terminology in the validation literature [25], even in the field of HE. The problem of ambiguity is exacerbated by the different meanings of the same terms in computer science and psychometrics. For example, conceptual model validation is sometimes called content validity [9, 18], but in psychometrics this term indicates whether a measure represents all facets of a given social construct. Computerized model validity is variously called verification, internal validity, internal consistency, technical validity, and debugging; moreover, all of these terms have additional and divergent meanings. Notably, internal validity was interpreted differently by several members of the Delphi panel.

In AdViSHE, we have attempted to steer clear of terminology that may be considered confusing. We present a lucid overview of possible techniques, with clear definitions, to be used in the validation of HE decision models. This explains the discrepancy between our terms and the classification of validation types by the recent ISPOR-SMDM Modeling Good Research Practices Task Force [10].

5 Conclusion

A validation-assessment tool for HE models called Assessment of the Validation Status of Health-Economic decision models (AdViSHE) has been developed to address the trade-off that model users potentially face between a loss of confidence resulting from lacking or unreported validation efforts and an inefficient use of resources resulting from overlapping validation efforts by the modelling team and model users. In addition, it presents a degree of consensus among model users and model developers on what good validation entails. The tool is tailored to the validation of HE models through the involvement of a large group of HE experts from many backgrounds and countries. In AdViSHE, model developers comment on the validation efforts performed while building the underlying HE decision model. This information can subsequently be used by model users, such as people involved in decision making or peer reviewers, to establish whether confidence in the model is warranted or additional validation efforts should be undertaken. The tool thus reduces the overlap between the validation efforts of model developers and those of model users without leading to a loss of confidence in the model or its outcomes.