Key Points for Decision Makers

Twenty-three models evaluating the cost effectiveness of treatments for venous leg ulcers were found within this systematic review.

The most common modelling approach was a Markov model, with studies predominantly taking an NHS/payer perspective and evaluating wound dressing technologies.

The reporting quality of models was appraised using the Philips Checklist. This found that most studies did not adequately report all aspects of the model used. In particular, limited information was given about the modelling techniques and structure.

1 Introduction

Venous leg ulcers (VLUs) are long-lasting wounds of the lower leg. They usually form after an injury, with slow healing due to increased blood pressure in the leg veins [1]. Typically, ulcers take 3–4 months to heal with appropriate treatment [2]. VLUs often cause pain, malodour and fluid leakage [3]. These symptoms affect mobility, sleep and daily living, impairing quality of life [3].

VLUs affect 2% of people aged over 80 years in the UK and account for 60–85% of all leg ulcers [4]. The incidence and prevalence are increasing due to ageing populations and the global trend of obesity [5,6,7]. With their high prevalence and prolonged healing time, VLUs were estimated to cost the UK NHS £921 million in the price year 2012/2013 [8].

Treatment for VLUs involves wound care and sustained graduated compression therapy [4], with most of the direct costs of treatment attributable to community nurse visits [9]. There are also newly emerging advanced treatment modalities for VLUs. These include bioengineered tissue, electric stimulation and a wide variety of dressing variations [10]. A recent Cochrane systematic review concluded that there is low-certainty evidence for the clinical efficacy of novel dressings and topical agents in VLUs [4, 7, 11].

However, due to limited healthcare budgets and the increasing demand for services, it is no longer enough to demonstrate that a treatment is clinically effective; it must also be shown to be value for money, or cost effective [12]. Economic evaluations seek to address this question by comparing alternative courses of action in terms of their costs and consequences, to determine which represents the most effective use of resources. Often such evaluations are conducted alongside clinical trials; however, the time horizons of such trials are usually dictated by the primary clinical outcome and therefore may end before all of the costs and outcomes of economic interest have been fully observed. Decision-analytic models can be used instead [13]; such models are not restricted in the time horizon over which they can model, can take into account multiple treatment options, can use data from multiple sources and, perhaps most importantly, can assess the uncertainty surrounding a decision.

There are several published studies that use decision-analytic models to conduct economic evaluations of treatments for VLUs. However, there have been no reviews of these studies to collate and critically appraise the methods used. Given this, and the likelihood that newly emerging treatment modalities will need to be evaluated in the future, this review aimed to identify, evaluate and critically appraise all published model-based economic evaluations relating to the treatment of VLUs. In doing so, it is hoped that the review will inform researchers developing future VLU models about the methodologies commonly used, as well as the areas that may require particular attention.

2 Methods

This systematic review was conducted and reported according to PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) guidelines [14]. The review has been registered on PROSPERO, the international prospective register of systematic reviews, under the registration number CRD42018102852.

2.1 Eligibility Criteria

Studies were considered eligible if they described a full economic evaluation using a decision-analytic model to evaluate any intervention for VLUs. A decision-analytic model was considered to be one that used mathematical techniques to define a series of possible consequences depending on alternative options [13, 15]. Where a study considered other wounds as well as VLUs, it was made a condition that the results for the VLU population be presented separately. Patients of all ages, geographical locations and sexes were considered eligible for inclusion. Notably, studies of interventions to prevent venous leg ulcers were excluded. In addition to the above constraints, only articles available in full text in English were considered, owing to resource limitations. All abstracts and reviews were excluded.

2.2 Search Strategy

A systematic search of electronic databases was conducted, covering the period from database inception until 21 May 2018. The search was applied to MEDLINE, Embase, Cumulative Index to Nursing and Allied Health Literature (CINAHL), Econlit, Web of Science and NHS Economic Evaluation Database (NHS EED). Database selection was based on the research and recommendations of Thielen et al. [16]. The search strategy itself was developed with the help of a biomedical information specialist, and was tailored to detect economic evaluations through the use of published search filters [17,18,19]. The full search strategy is available in the Electronic Supplementary Material 1 (Appendix 1), with search terms including ‘venous ulcer*’, ‘varicose ulcer*’ and ‘cost*’.

2.3 Study Selection

After duplicates were removed, study selection occurred in two stages and was performed independently by two reviewers (AL and EM). Initially, titles and abstracts were screened, with full texts then obtained for studies that appeared to meet the inclusion criteria. These full texts were then again screened against the inclusion criteria to assess eligibility for inclusion in the review, with reasons for exclusion noted down. All conflicts were resolved by discussion between the two reviewers. All eligible studies had their references screened to ensure that no papers had been missed from the search strategy.

2.4 Data Extraction

Data extraction was performed independently by two reviewers (AL and EM) using a standardised form. This form was piloted in an initial study to allow it to be refined. Data collected pertained to the study characteristics, details of the decision-analytic model and the results and conclusions of the study. Any disagreements were resolved by discussion. A template of the data extraction form can be found in the Electronic Supplementary Material 1 (Appendix 2), and the populated data extraction form is included within Electronic Supplementary Material 2.

2.5 Quality Assessment

Quality assessment was carried out using the Philips Checklist [20], an extensive checklist specifically designed for the assessment of modelling studies and recommended by both the National Institute for Health and Care Excellence (NICE) and the Cochrane Collaboration [21,22,23]. The checklist was completed independently by two reviewers (AL and EM), with any disagreements regarding answers resolved by discussion. Possible responses for the items on the checklist were ‘yes’, ‘no’, ‘not applicable’ (for items that were not relevant to the study) and ‘partial’ (for items that had multiple elements, of which only some were satisfactorily fulfilled by the study).

3 Results

The literature search returned a total of 2515 studies. After removal of duplicate papers and initial screening of titles and abstracts, 130 full texts were retrieved and assessed for eligibility. Twenty-three studies were included within the review. This study selection process is summarised in Fig. 1.

Fig. 1 PRISMA diagram of study selection process

3.1 Study Characteristics

Table 1 summarises the included studies.

Table 1 Summary of characteristics of included studies

The most common intervention type across the studies was dressings for VLUs, the subject of 11 studies (48%) [24,25,26,27,28,29,30,31,32,33,34]. There were also four studies each focusing on compression bandaging [35,36,37,38] and extracellular matrices [39,40,41,42]. The other interventions included in papers in the review were electric stimulation therapies [43, 44], barrier creams [45] and pentoxifylline oral medication [46]. Comparators varied greatly across studies depending on intervention type, with ten studies using more than one comparator [24, 25, 29, 32, 33, 36,37,38, 41, 45]. Eight studies used ‘standard care’ as a comparator, with all of these papers giving a definition of what ‘standard care’ involved at some point in the text [27, 34, 39, 41, 43,44,45,46].

The publication year of these studies ranged from 1999 to 2018. Ten of the studies (43%) were published in the Journal of Wound Care [24,25,26,27, 35,36,37, 39, 43, 45], with no other journal responsible for more than one paper. There were 11 cost effectiveness analyses (CEAs) [26, 28,29,30,31,32,33, 35, 40,41,42] and three cost utility analyses (CUAs) [25, 36, 38], with the remaining nine studies a combination of both CEA and CUA [24, 27, 34, 37, 39, 43,44,45,46]. For the CEAs, the most common clinical outcome related to percentage of patients healed (12 studies) [26, 27, 30, 32, 33, 37, 39, 41,42,43,44,45], with time taken for the ulcer to heal used as an outcome in ten studies [24, 25, 29, 31, 34,35,36, 41, 42, 46], and reduction in wound area size used twice [28, 29]. It should be noted that whilst these outcome measures may be clinically meaningful within VLUs, they do not have a willingness-to-pay threshold attached to them, unlike the quality-adjusted life-year (QALY), and they do not facilitate comparison with economic evaluations conducted within other disease areas.

For the CUAs, QALYs were the only measurement of utility used. Nine decision-analytic models sourced utilities for their QALY measurements from the 2007 study by Clegg and Guest [25, 27, 34, 36, 37, 39, 43,44,45]. This paper used standard gamble methodology on a sample of 200 participants, including both the general population and VLU patients, to allow for utilities to be assigned to different VLU states [44]. The remaining three QALY measurements obtained utility values from randomised controlled trials that used the EuroQol 5D and/or Short Form 6D questionnaires [24, 38, 46].

The payer perspective was taken in 18 papers [24,25,26,27,28, 30, 31, 34, 36,37,38,39, 41,42,43,44,45,46], with only two studies (9%) stating that they had taken a societal perspective [29, 40]. Three papers did not state the perspective used [32, 33, 35].

Markov models were the most common decision-analytical approach, being used in 12 (52%) studies [24, 28, 29, 34, 35, 38, 39, 41,42,43,44, 46]. The most common time cycle length used for these models was 1 week with a 1-year time horizon, found in five studies [24, 34, 35, 41, 46]. Other time horizons varied greatly, from 14 days [28] to 12 years [38]. Decision-tree models were used in five studies [27, 30, 31, 36, 45], and the remaining six studies did not explicitly state their model type [25, 26, 32, 33, 37, 40]. For these six studies, no figure was included to describe the model structure, making it difficult to appreciate the type of model used. Transition probabilities for models were described in 12 of the 23 studies (52%), with two studies stating that probabilities could be “obtained directly from the corresponding author” [39, 43]. Nine studies did not state any of the transition probabilities used in their models [25, 26, 31,32,33, 35,36,37, 45].

Of the 23 studies included in the review, 22 reported that the intervention was dominant. The one study without a dominant intervention, evaluating a skin substitute (Apligraf), had an incremental cost effectiveness ratio (ICER) of $14 per ulcer-day avoided [40]. Almost all studies (91%) in the review were funded by medical companies, other than the studies by Ashby et al. (HTA programme fund) [38] and by Iglesias and Claxton (funded by the University of York) [46].

3.2 Quality Assessment

The reporting and methodological quality of studies was evaluated using the Philips Checklist [20]. The responses given for each study are included in Electronic Supplementary Material 1 (Appendix 3). To allow for analysis of these results, a scoring system was used to calculate the percentage of criteria fulfilled. A yes (Y) response counted as one point, a no (N) response as no points, a partial (P) response as half of a point, and criteria given an ‘N/A’ response were excluded from the calculation. Although this scoring system could be criticised for giving equal weight to all criteria, it allows for an estimate of the proportion of items fulfilled. Scores ranged from 27% to 89%, with the score for each included study given in Electronic Supplementary Material 1 (Appendix 3).
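The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as stated in the text; the function name `checklist_score` and the example responses are hypothetical and are not taken from the review's data.

```python
from typing import Iterable

# Points per response, as described in the text. 'N/A' items are excluded.
POINTS = {"Y": 1.0, "P": 0.5, "N": 0.0}


def checklist_score(responses: Iterable[str]) -> float:
    """Return the percentage of applicable checklist criteria fulfilled."""
    applicable = [r for r in responses if r != "N/A"]
    if not applicable:
        raise ValueError("no applicable criteria to score")
    earned = sum(POINTS[r] for r in applicable)
    return 100.0 * earned / len(applicable)


# Hypothetical responses for one study (illustrative only).
example = ["Y", "Y", "P", "N", "N/A", "Y", "P", "N"]
print(f"{checklist_score(example):.1f}%")
```

Here seven criteria are applicable and four points are earned, so the score is 4/7. Because every criterion carries the same weight, two studies with the same score may still differ in which criteria they failed.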

The following discussion of study quality follows the three domains of the Philips Checklist: ‘model structure’, ‘data’ and ‘uncertainty and consistency’.

3.2.1 Model Structure

Selecting an appropriate model structure is key for decision-analytic models. Appropriate models should best reflect the disease process, relevant healthcare system and available evidence for the decision problem [47].

In the majority of studies, relatively little explanation was given for the selection of the type and structure of decision-analytical model. Only two studies (9%) gave full evidence as to how the model structure was formed [38, 39], whilst other possible model structures were considered by only one study [39]. Model structures should align with the biological and clinical theory of the condition studied, and should be driven by the study question rather than data availability [20, 48]. By not detailing the rationale for the model structure used, it cannot be determined whether the model was the best fit for the study objective, or whether it was chosen due to data limitations. This reduces the perceived quality of the study.

The Markov models used in the included studies often had a simplistic structure. Commonly, the following states were used: ‘healed’, ‘unhealed’ and ‘dead’ [38, 41, 46], whilst some studies included more detailed states such as ‘improved’, ‘worsened’ and ‘complications’ [24, 44]. The cycle lengths used in the Markov models ranged from 1 day to 1 month, with this length mostly determined by the frequency of outpatient appointments. This is despite it being recommended that cycle length is based on the natural history of disease and not local treatment patterns [49].
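To make the simple three-state structure concrete, the following is a minimal sketch of a Markov cohort model with ‘unhealed’, ‘healed’ and ‘dead’ states, run over 52 weekly cycles. All transition probabilities are hypothetical values chosen for illustration, and the names `P` and `run_cohort` are assumptions of this sketch; none of it reproduces a model from the included studies.

```python
STATES = ("unhealed", "healed", "dead")

# Hypothetical weekly transition probabilities (each row sums to 1).
P = {
    "unhealed": {"unhealed": 0.919, "healed": 0.080, "dead": 0.001},
    "healed": {"unhealed": 0.010, "healed": 0.989, "dead": 0.001},
    "dead": {"unhealed": 0.0, "healed": 0.0, "dead": 1.0},
}


def run_cohort(start, n_cycles):
    """Return the cohort trace: state occupancy at cycles 0..n_cycles."""
    trace = [dict(start)]
    for _ in range(n_cycles):
        prev = trace[-1]
        # Occupancy of each state is the inflow from every state in the
        # previous cycle, weighted by the transition probability.
        nxt = {s: sum(prev[r] * P[r][s] for r in STATES) for s in STATES}
        trace.append(nxt)
    return trace


# Whole cohort starts unhealed; 52 weekly cycles give a 1-year horizon.
trace = run_cohort({"unhealed": 1.0, "healed": 0.0, "dead": 0.0}, n_cycles=52)
print({s: round(p, 3) for s, p in trace[-1].items()})
```

Costs and utilities would then be attached to each state and summed over the trace. Changing the cycle length means re-deriving the transition probabilities, which is one reason the choice of cycle length should be justified.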

Thirteen studies (57%) were considered to have a time horizon that was insufficient to reflect important differences between the options [25,26,27, 29, 30, 32, 33, 36, 37, 40, 43,44,45]. Although most VLUs heal within 3–4 months, treatment can go on to take much longer [2]. The time horizon was often determined from available trial data [28, 29, 39, 42,43,44], with no attempts to extrapolate these data to a more suitable economic time horizon. Guidelines suggest that a lifetime horizon is considered optimal for all decision-analytic models to ensure that no associated costs or benefits are missed [13, 50]. No study within this systematic review used a lifetime horizon and only four papers (17%) fully justified using a shorter time horizon [28, 29, 38, 46].

Only two papers (9%) adopted a societal perspective [29, 40]. Societal perspectives are recommended, as narrower perspectives may lead to important costs and benefits being ignored [13, 50, 51]. Of these two studies, Scanlon et al. did not include any indirect costs despite reporting the use of a societal perspective, reasoning that “due to the relatively high average age of people with venous leg ulcers, no costs of lost workdays were applied” [29]. Sibbald et al., meanwhile, included loss of work as the only indirect cost [40], excluding other potential indirect costs such as patient transport or time lost from daily activities. This review therefore found an absence of a true societal perspective in evaluations of the cost effectiveness of VLU treatments.

Half-cycle corrections were not used by any papers using a Markov model within this review. Reasoning for their exclusion was only given by Nherera et al., who stated “for the study model, the cycle length was considered to be small enough not to require a half-cycle correction” [34]. Whilst half-cycle correction is a criterion on the Philips Checklist, the need for such a correction is contested [52, 53].
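As an illustration of what a half-cycle correction does, the sketch below compares time spent in a state with and without the correction. It assumes one common convention, in which the uncorrected total counts occupancy at the start of each cycle for the whole cycle and the corrected total gives the first and last time points half weight (the trapezoidal rule). The function name `state_time` and the occupancy values are hypothetical.

```python
def state_time(occupancy, half_cycle=False):
    """Total time in a state, in cycles, from occupancy at cycles 0..n.

    Without correction, occupancy at the start of each cycle is counted
    for the whole cycle. With correction, the first and last time points
    receive half weight (trapezoidal rule).
    """
    if len(occupancy) < 2:
        raise ValueError("need at least two time points")
    if half_cycle:
        return 0.5 * occupancy[0] + sum(occupancy[1:-1]) + 0.5 * occupancy[-1]
    return sum(occupancy[:-1])


# Hypothetical proportion of the cohort still unhealed at weeks 0..4.
unhealed = [1.0, 0.8, 0.6, 0.5, 0.4]

uncorrected = state_time(unhealed)
corrected = state_time(unhealed, half_cycle=True)
print(round(uncorrected, 2), round(corrected, 2))
```

Under this convention the two totals differ by half the change in occupancy between the first and last time points, multiplied by the cycle length. The difference therefore shrinks as cycles get shorter, which is consistent with the reasoning quoted from Nherera et al.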

3.2.2 Data

An important strength of model-based economic evaluations in comparison with trial-based evaluations is that they allow authors to source all relevant evidence to help answer the study objective [13, 50]. Despite this, only nine (39%) of the studies used systematic literature reviews to inform model parameters [24, 26, 29, 30, 32,33,34, 38, 41]. The details of what these reviews involved and the results from them were embedded within the studies, with no literature reviews published separately. The remaining studies used single sources of data, with nine models using randomised controlled trials for data inputs [28, 31, 35, 39, 40, 42,43,44, 46] and five using The Health Improvement Network (THIN) database [25, 27, 36, 37, 45]. Relying on a single source of data may reduce the generalisability of the results.

Expert panels were used in 12 (52%) of the studies [26, 29, 30, 32,33,34, 37, 39,40,41,42, 44]. These panels mostly consisted of healthcare professionals whose estimations were used to compensate for gaps in research data. Whilst the use of expert opinion is acceptable, it was adjudged by Coyle and Lee to be at the bottom of the hierarchy for data sources [54], and should only be used as a last resort when no other data can be obtained [55]. Two studies did not detail the methods used in eliciting expert opinion or who the experts were [34, 41].

3.2.3 Uncertainty and Consistency

Using decision-analytic models in economic evaluations allows authors to assess and analyse the uncertainty associated with decision making [13, 49, 50]. The Philips Checklist divides uncertainty into four dimensions: methodological, model structure, heterogeneity and parameter uncertainty.

Uncertainty regarding methodology, model structure and heterogeneity was poorly assessed by the studies included in the review. Methodological uncertainties were evaluated by only five studies (22%) [29, 35, 38, 42, 46]. These studies ran their sensitivity analyses with different weighting of data used [46], different time horizons [29] and with different clinical scenarios [35]. Structural uncertainties were not addressed by any studies in the review. Heterogeneity was assessed by only two studies (9%), which examined populations from a different country [30] and with a different type of VLU [27].

Parameter uncertainty was the most commonly addressed dimension, assessed by 16 studies (70%) [25, 27, 29,30,31, 34, 36,37,38,39, 41,42,43,44,45,46]. Sensitivity analyses were run on parameter estimates including resource use, unit costs and clinical outcomes. It is suggested that parameter uncertainty should be tested using probabilistic analysis [20], which was performed by six out of the 16 papers (38%) [27, 30, 34, 38, 41, 46]. The majority of studies instead conducted univariate deterministic analysis [25, 27, 29, 30, 36, 37, 39, 42, 44, 45]. Univariate analysis allows for the impact of individual parameters to be assessed but is not recommended for testing uncertainty and should instead be used in the model development process [50]. It is also recommended that when testing uncertainty, the range of values for a parameter should be well justified [20]; however, many models used arbitrary percentage variation without reasoning (e.g. ± 20% of the base case) in their sensitivity analysis [25, 27, 29, 30, 36, 37, 39, 42,43,44,45].
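The difference between probabilistic analysis and an arbitrary ± 20% univariate range can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, that healing probabilities follow beta distributions and mean costs follow gamma distributions, which is a common but not universal choice. Every number, and the function name `prob_dominant`, is hypothetical and does not come from any included study.

```python
import random


def prob_dominant(n_sims=5000, seed=1):
    """Share of simulations in which the new treatment is dominant.

    All parameters are drawn jointly from assumed distributions rather
    than varied one at a time over an arbitrary range.
    """
    rng = random.Random(seed)
    dominant = 0
    for _ in range(n_sims):
        # Beta(healed, not healed): hypothetical trial counts out of 100.
        p_new = rng.betavariate(60, 40)
        p_std = rng.betavariate(50, 50)
        # Gamma(shape, scale): hypothetical mean costs of 2000 and 2200.
        c_new = rng.gammavariate(100, 20.0)
        c_std = rng.gammavariate(100, 22.0)
        # Dominant means more patients healed at lower cost.
        if p_new > p_std and c_new < c_std:
            dominant += 1
    return dominant / n_sims


print(prob_dominant())
```

The spread of each distribution here comes from the assumed sample size or standard error, so the range explored is tied to the evidence rather than to a fixed percentage. A full analysis would usually report a cost-effectiveness acceptability curve rather than a single proportion.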

There were some flaws in how the results of the modelling studies were presented. Kerstein et al. [33] and Meaume and Gemmen [26] reported average cost-effectiveness ratios (ACERs) rather than ICERs, the latter being considered the standard summary measure of economic evaluations [7, 50]. ACERs should not be used to report the results of analyses, as they assume that all outcomes are produced at equivalent cost and spread additional costs from an intervention over all previous outcomes, which is not the case [56, 57]. Negative ICERs were reported by Walzer et al. [24], despite describing the intervention (Sachet S) as dominant. It is unconventional to report negative ICERs, given that, presented without context, they do not show whether an intervention offers a decreased cost for an increased benefit or an increased cost for a decreased benefit [50].
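The distinction between ACERs and ICERs, and the ambiguity of a negative ICER, can be shown with a short sketch. The function names `acer` and `compare`, and all costs and effects (expressed here as ulcer-free days), are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def acer(cost, effect):
    """Average cost-effectiveness ratio for a single strategy."""
    return cost / effect


def compare(cost_new, effect_new, cost_old, effect_old):
    """Classify a new strategy against a comparator.

    A ratio is returned only when it is interpretable; dominant and
    dominated strategies are labelled instead of given a negative ICER.
    """
    d_cost = cost_new - cost_old
    d_effect = effect_new - effect_old
    if d_cost == 0 and d_effect == 0:
        return ("equivalent", None)
    if d_cost <= 0 and d_effect >= 0:
        return ("dominant", None)    # no more costly, no less effective
    if d_cost >= 0 and d_effect <= 0:
        return ("dominated", None)   # no less costly, no more effective
    # Remaining cases: costlier and more effective, or cheaper and less
    # effective. The ratio is positive in both and needs its context.
    return ("icer", d_cost / d_effect)


# Hypothetical comparator: cost 2000, 40 ulcer-free days.
print(compare(1500, 50, 2000, 40))  # cheaper and more effective
print(compare(2500, 30, 2000, 40))  # costlier and less effective
print(compare(2400, 60, 2000, 40))  # costlier and more effective
print(acer(2400, 60), acer(2000, 40))
```

In the first two comparisons the raw ratio would be -50 in both cases (-500/10 and 500/-10), even though one strategy is dominant and the other dominated. In the third, the ACERs of 40 and 50 suggest the new strategy is cheaper per ulcer-free day, while the ICER shows that each additional ulcer-free day costs 20.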

The 2012 study by Guest et al. [45] stated that cost effectiveness would be determined by the clinical outcomes of QALY gain and percentage of healed patients. However, after the results from the study showed that there were no differences in costs or clinical outcomes for all interventions, the study then reported on a previously unmentioned outcome (wound size reduction) as the reasoning behind recommending one of the interventions as the preferred strategy. Using post hoc outcome measures has the potential for bias, given that they may be selected on the basis of clinical or statistical significance of the observed results. The validity of this model is therefore reduced, with the potential to mislead decision makers.

4 Discussion

This systematic review has shown several common themes in published decision-analytic models for VLU treatment. The most common model types were Markov models, with new dressing technologies the most common intervention investigated. The majority of studies were funded by the health technology companies whose product was being evaluated, and found that product to be the dominant treatment strategy.

The major themes when critically appraising the reporting and methodological quality of the decision-analytic models were the limited data sources and the limited length of time horizons over which interventions were evaluated. The most significant finding from this review was the large number of papers that reported little or no detail of the modelling techniques or structure used for the study [25, 26, 32, 33, 37, 40]. The lack of reporting detail may affect how usable the results of these studies are. It should be noted, however, that these criticisms are not unique to VLU-based decision-analytic models. A systematic review of all decision models in UK health technology assessments by Cooper et al. in 2005 found that only 10% of studies across all disease areas suitably reported the process involved in developing the model structure [55].

This review is believed to be the first to focus on decision-analytic models for VLU treatment but has found similar results to other systematic reviews in the domain of wounds. A 2018 review by Cheng et al. reviewed all types of economic evaluations for chronic wounds [58]. The 12 decision-analytic models appraised in the Cheng review had similar methodological quality issues to the models included in this review, with short time horizons and few studies conducting analyses from a societal perspective. The authors of that review also criticised the absence of model validation and the use of single trials as data sources [58]. There were no decision-analytic models relating to VLU treatment in that review, meaning that none of the models evaluated in this review had been previously appraised by Cheng et al. Similar methodological shortcomings were commented on in a systematic review by Langer and Rogowski in 2009 [59]. This review focused on human-cell-derived wound care products for both VLUs and diabetic foot ulcers, and found that some of the models included had unsuitably short time horizons, used deterministic rather than probabilistic sensitivity analysis and had methodological issues with ICER calculations and reporting [59].

There are several limitations of this review, which may have affected the results. The study selection only included published articles available in full text in English, meaning potentially relevant studies may have been omitted. The use of the Philips Checklist also has limitations. For example, the authors of the Philips Checklist acknowledge that the framework cannot guide appraisal on the suitability of individual model structures and structural assumptions themselves [20]. In addition, many of the questions in the checklist require subjective interpretation, and whilst the error resulting from this would have been minimised by the use of two reviewers, some subjectivity may remain in the results. Furthermore, the Philips Checklist was published in 2004, after a number of studies analysed in this review [26, 32, 33, 40, 42]. It may therefore be deemed inappropriate or unfair to judge such studies on criteria that were formed after their publication, given how modelling techniques and standards are likely to have changed over time. Word limitations in journals may also limit the detail authors can include in their studies regarding model structure. This should be less of an issue for more recent papers, owing to the increased availability of supplementary materials, but older papers may have been restricted when describing the methodology of their studies.

This review strengthens the calls of previous reviews for future decision-analytic models to improve their reporting quality [58, 59]. Despite systematic evidence by Cooper et al. showing that model-based evaluations are not being sufficiently described in studies regardless of disease area, reporting thoroughness does not appear to be improving with time [55], a finding echoed within this review. Future decision-analytic modelling studies should make use of the increasing ability to include supplementary online materials, as well as other transparency initiatives such as sharing of model code, to improve reporting clarity. There is also value in publishing the results of any systematic literature reviews conducted to identify the baseline data for modelling studies. This review highlights eight studies that performed their own systematic reviews to identify baseline data inputs [24, 26, 29, 30, 32, 33, 38, 41], with Nherera et al. the only paper to utilise a previously published review [34]. Since none of these studies published the methodology or results of their reviews separately, this may have resulted in a duplication of researcher effort, which could have been focused on other areas of the modelling process, such as model validation. There is also potential for future research to develop guidelines specifically for VLU-based models, similar to guidelines developed by the Mt Hood Diabetes Challenge Network [60]. A similar review focusing on trial-based economic evaluations for VLU treatments would also be of worth, as there is still a significant gap in the literature for this area. The review by Cheng et al. covered only three trial-based evaluations for VLUs [58], finding all three of these to be cost-saving, whilst a review by Weller et al. focused on only one treatment option (compression therapy), finding five relevant, within-trial studies [61].

5 Conclusion

Venous leg ulcers are an ever-increasing burden on the health of the ageing population and on the budget of healthcare providers. This review has shown that there is a sizeable number of decision-analytic model-based economic evaluations available to decision makers, but with wide variation in methodological quality across the studies. Common issues related to inadequate evidence for model structure, inadequate time horizons and limited data sources. These limitations in study methodology provide inferior quality evidence for decision makers to evaluate the cost effectiveness of potential interventions. By acknowledging and avoiding the shortcomings of the evaluations included within this review, future work should be of a higher reporting quality, thus providing better quality evidence to inform the cost effectiveness of different venous leg ulcer interventions.