The literature search returned a total of 2515 studies. After removal of duplicate papers and initial screening of titles and abstracts, 130 full texts were retrieved and assessed for eligibility. Twenty-three studies were included in the review. The study selection process is summarised in Fig. 1.
Table 1 summarises the included studies.
The most common intervention type across the studies was dressings for VLUs, the subject of 11 studies (48%) [24,25,26,27,28,29,30,31,32,33,34]. There were also four studies each focusing on compression bandaging [35,36,37,38] and extracellular matrices [39,40,41,42]. The other interventions covered by papers in the review were electric stimulation therapies [43, 44], barrier creams and pentoxifylline oral medication. Comparators varied greatly across studies depending on intervention type, with ten studies using more than one comparator [24, 25, 29, 32, 33, 36,37,38, 41, 45]. Eight studies used ‘standard care’ as a comparator, and all of these papers defined what ‘standard care’ involved at some point in the text [27, 34, 39, 41, 43,44,45,46].
The publication year of these studies ranged from 1999 to 2018. Ten of the studies (43%) were published in the Journal of Wound Care [24,25,26,27, 35,36,37, 39, 43, 45], with no other journal responsible for more than one paper. There were 11 cost effectiveness analyses (CEAs) [26, 28,29,30,31,32,33, 35, 40,41,42] and three cost utility analyses (CUAs) [25, 36, 38], with the remaining nine studies combining both CEA and CUA [24, 27, 34, 37, 39, 43,44,45,46]. For the CEAs, the most common clinical outcome was the percentage of patients healed (12 outcomes) [26, 27, 30, 32, 33, 37, 39, 41,42,43,44,45], with time taken for the ulcer to heal used as an outcome in ten studies [24, 25, 29, 31, 34,35,36, 41, 42, 46], and reduction in wound area used twice [28, 29]. It should be noted that whilst these outcome measures may be clinically meaningful for VLUs, they have no willingness-to-pay threshold attached to them [unlike the quality-adjusted life-year (QALY)], and they do not facilitate comparison with economic evaluations conducted in other disease areas.
For the CUAs, QALYs were the only measure of utility used. Nine decision-analytic models sourced the utilities for their QALY measurements from the 2007 study by Clegg and Guest [25, 27, 34, 36, 37, 39, 43,44,45]. This paper used standard gamble methodology on a sample of 200 participants, drawn from both the general population and VLU patients, to assign utilities to different VLU states. The remaining three QALY measurements obtained utility values from randomised controlled trials that used the EuroQol 5D and/or Short Form 6D questionnaires [24, 38, 46].
The payer perspective was taken in 18 papers [24,25,26,27,28, 30, 31, 34, 36,37,38,39, 41,42,43,44,45,46], with only two studies (9%) stating that they had taken a societal perspective [29, 40]. Three papers did not state the perspective used [32, 33, 35].
Markov models were the most common decision-analytic approach, being used in 12 (52%) studies [24, 28, 29, 34, 35, 38, 39, 41,42,43,44, 46]. The most common configuration for these models was a 1-week cycle length with a 1-year time horizon, found in five studies [24, 34, 35, 41, 46]. Other time horizons varied greatly, from 14 days to 12 years. Decision-tree models were used in five studies [27, 30, 31, 36, 45], and the remaining six studies did not explicitly state their model type [25, 26, 32, 33, 37, 40]. None of these six studies included a figure describing the model structure, making it difficult to determine the type of model used. Transition probabilities were described in 12 of the 23 studies (52%), with two studies stating that probabilities could be “obtained directly from the corresponding author” [39, 43]. Nine studies did not state any of the transition probabilities used in their models [25, 26, 31,32,33, 35,36,37, 45].
Of the 23 studies included in the review, 22 reported that the intervention was dominant. The one study without a dominant intervention, evaluating a skin substitute (Apligraf), reported an incremental cost effectiveness ratio (ICER) of $14 per ulcer-day avoided. Almost all studies (91%) in the review were funded by medical companies, the exceptions being the studies by Ashby et al. (funded by the HTA programme) and by Iglesias and Claxton (funded by the University of York).
The reporting and methodological quality of the studies was evaluated using the Philips Checklist. The responses given for each study are included in Electronic Supplementary Material 1 (Appendix 3). To allow for analysis of these results, the percentage of criteria fulfilled was calculated for each study: a yes (Y) response counted as one point, a partial (P) response as half a point, a no (N) response as no points, and criteria given an ‘N/A’ response were excluded from the calculation. Although this scoring system could be criticised for weighting all criteria equally, it provides an estimate of the number of criteria fulfilled. Scores ranged from 27% to 89% and are reported for each included study in Electronic Supplementary Material 1 (Appendix 3).
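The scoring rule described above amounts to a short calculation, sketched below; the function name and response encoding are illustrative and not part of the Philips Checklist itself:

```python
def philips_score(responses):
    """Percentage of Philips Checklist criteria fulfilled.

    responses: one string per criterion -- 'Y' (1 point), 'P' (0.5),
    'N' (0) or 'N/A' (excluded from the denominator entirely).
    """
    points = {"Y": 1.0, "P": 0.5, "N": 0.0}
    applicable = [r for r in responses if r != "N/A"]
    if not applicable:
        raise ValueError("no applicable criteria")
    return 100 * sum(points[r] for r in applicable) / len(applicable)

# Three yes, one partial, one no, one N/A: (3 + 0.5) / 5 criteria = 70%
print(philips_score(["Y", "Y", "Y", "P", "N", "N/A"]))  # 70.0
```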
The following discussion of study quality follows the three domains of the Philips Checklist: ‘model structure’, ‘data’ and ‘uncertainty and consistency’.
Selecting an appropriate structure is key for decision-analytic models. An appropriate model should best reflect the disease process, the relevant healthcare system and the evidence available for the decision problem.
In the majority of studies, relatively little explanation was given for the selection of the type and structure of the decision-analytic model. Only two studies (9%) gave full evidence of how the model structure was formed [38, 39], whilst alternative model structures were considered by only one study. Model structures should align with the biological and clinical theory of the condition studied, and should be driven by the study question rather than by data availability [20, 48]. Without a detailed rationale for the model structure used, it cannot be determined whether the model was the best fit for the study objective or whether it was chosen because of data limitations, which reduces the perceived quality of the study.
The Markov models used in the included studies often had a simplistic structure. Commonly, the states ‘healed’, ‘unhealed’ and ‘dead’ were used [38, 41, 46], whilst some studies included more detailed states such as ‘improved’, ‘worsened’ and ‘complications’ [24, 44]. The cycle lengths used in the Markov models ranged from 1 day to 1 month, with the length mostly determined by the frequency of outpatient appointments, despite the recommendation that cycle length be based on the natural history of the disease rather than on local treatment patterns.
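To illustrate, a three-state weekly-cycle cohort model of the kind described above can be sketched as follows; the transition probabilities are purely hypothetical and are not taken from any included study:

```python
# States: unhealed, healed, dead (absorbing). Weekly transition
# probabilities below are illustrative only, not from any study.
P = {
    "unhealed": {"unhealed": 0.93, "healed": 0.06, "dead": 0.01},
    "healed":   {"unhealed": 0.02, "healed": 0.97, "dead": 0.01},
    "dead":     {"unhealed": 0.00, "healed": 0.00, "dead": 1.00},
}

cohort = {"unhealed": 1.0, "healed": 0.0, "dead": 0.0}  # all start unhealed
for week in range(52):  # 1-week cycles over a 1-year time horizon
    cohort = {s: sum(cohort[f] * P[f][s] for f in cohort) for s in cohort}

# Share of the cohort occupying each state after 52 cycles
print({s: round(p, 3) for s, p in cohort.items()})
```

Costs and QALYs would then be accrued per cycle by weighting state occupancy, which is where cycle length directly affects the results.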
Thirteen studies (57%) were considered to have a time horizon insufficient to reflect important differences between the options [25,26,27, 29, 30, 32, 33, 36, 37, 40, 43,44,45]. Although most VLUs are cured within 3–4 months, treatment can take much longer. The time horizon was often determined by the available trial data [28, 29, 39, 42,43,44], with no attempt to extrapolate these data to a more suitable economic time horizon. Guidelines suggest that a lifetime horizon is optimal for all decision-analytic models to ensure that no associated costs or benefits are missed [13, 50]. No study in this systematic review used a lifetime horizon, and only four papers (17%) fully justified using a shorter time horizon [28, 29, 38, 46].
Only two papers (9%) adopted a societal perspective [29, 40]. A societal perspective is recommended, as narrower perspectives may lead to important costs and benefits being ignored [13, 50, 51]. Of these two studies, Scanlon et al. did not include any indirect costs despite reporting the use of a societal perspective, reasoning that “due to the relatively high average age of people with venous leg ulcers, no costs of lost workdays were applied”. Sibbald et al. included loss of work as the only indirect cost considered, excluding other potential indirect costs such as patient transport or time lost from daily activities. This review therefore found an absence of a true societal perspective in evaluations of the cost effectiveness of VLU treatments.
Half-cycle corrections were not used by any of the papers employing a Markov model in this review. Reasoning for their exclusion was only given by Nherera et al., who stated that “for the study model, the cycle length was considered to be small enough not to require a half-cycle correction”. Whilst half-cycle correction is a criterion on the Philips Checklist, the need for such a correction is contested [52, 53].
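For illustration, one common (trapezoidal) form of half-cycle correction gives half weight to the first and last cycles of a Markov trace, treating transitions as occurring mid-cycle rather than at cycle boundaries; the sketch below is a minimal example, and the exact method varies between guidelines:

```python
def half_cycle_corrected_total(per_cycle_values):
    """Sum a per-cycle Markov trace (e.g. costs or QALYs) with a
    trapezoidal half-cycle correction: first and last cycles
    contribute half weight, all intermediate cycles full weight."""
    v = per_cycle_values
    if len(v) < 2:
        return sum(v)
    return 0.5 * v[0] + sum(v[1:-1]) + 0.5 * v[-1]

# Four cycles of 10 each: corrected total 30.0 versus uncorrected 40
print(half_cycle_corrected_total([10, 10, 10, 10]))  # 30.0
```

With very short cycles the first- and last-cycle adjustment becomes negligible relative to the total, which is the reasoning Nherera et al. gave for omitting it.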
An important strength of model-based economic evaluations in comparison with trial-based evaluations is that they allow authors to source all relevant evidence to help answer the study objective [13, 50]. Despite this, only nine (39%) of the studies used systematic literature reviews to inform model parameters [24, 26, 29, 30, 32,33,34, 38, 41]. The details of these reviews and their results were embedded within the studies, with no literature reviews published separately. The remaining studies used single sources of data, with nine models using randomised controlled trials for data inputs [28, 31, 35, 39, 40, 42,43,44, 46] and five using The Health Improvement Network (THIN) database [25, 27, 36, 37, 45]. Relying on a single source of data risks reducing the generalisability of the results.
Expert panels were used in 12 (52%) of the studies [26, 29, 30, 32,33,34, 37, 39,40,41,42, 44]. These panels mostly consisted of healthcare professionals whose estimates were used to compensate for gaps in research data. Whilst the use of expert opinion is acceptable, Coyle and Lee judged it to sit at the bottom of the hierarchy of data sources, and it should only be used as a last resort when no other data can be obtained. Two studies did not detail the methods used to elicit expert opinion or who the experts were [34, 41].
Uncertainty and Consistency
Using decision-analytic models in economic evaluations allows authors to assess and analyse the uncertainty associated with decision making [13, 49, 50]. The Philips Checklist divides uncertainty into four dimensions: methodological, model structure, heterogeneity and parameter uncertainty.
Uncertainty regarding methodology, model structure and heterogeneity was poorly assessed by the studies included in the review. Methodological uncertainties were evaluated by only five studies (22%) [29, 35, 38, 42, 46]. These studies ran their sensitivity analyses with different weightings of the data used, different time horizons and different clinical scenarios. Structural uncertainties were not addressed by any studies in the review. Heterogeneity was assessed by only two studies (9%), which examined populations from a different country and with a different type of VLU.
Parameter uncertainty was the most commonly addressed dimension, being assessed by 16 studies (70%) [25, 27, 29,30,31, 34, 36,37,38,39, 41,42,43,44,45,46]. Sensitivity analyses were run on parameter estimates including resource use, unit costs and clinical outcomes. It is suggested that parameter uncertainty should be tested using probabilistic analysis, which was performed by six of the 16 papers (38%) [27, 30, 34, 38, 41, 46]. The majority of studies instead conducted univariate deterministic analysis [25, 27, 29, 30, 36, 37, 39, 42, 44, 45]. Univariate analysis allows the impact of individual parameters to be assessed but is not recommended for testing uncertainty; it should instead be used in the model development process. It is also recommended that, when testing uncertainty, the range of values for a parameter be well justified; however, many models used arbitrary percentage variation without reasoning (e.g. ± 20% of the base case) in their sensitivity analyses [25, 27, 29, 30, 36, 37, 39, 42,43,44,45].
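To illustrate the distinction, a probabilistic analysis draws every uncertain parameter from a distribution and propagates the joint uncertainty through the model, rather than varying one parameter at a time. In the sketch below, all distributions, costs, the willingness-to-pay threshold and the mapping to QALYs are hypothetical:

```python
import random

random.seed(1)

def incremental_net_benefit(wtp=20000):
    # All distributions and values below are hypothetical illustrations.
    p_heal_new = random.betavariate(60, 40)   # healing probability, new treatment
    p_heal_std = random.betavariate(50, 50)   # healing probability, standard care
    cost_new = random.gauss(1200, 150)        # per-patient cost, new treatment
    cost_std = random.gauss(900, 100)         # per-patient cost, standard care
    qaly_gain = 0.05 * (p_heal_new - p_heal_std)  # toy mapping to QALYs
    return wtp * qaly_gain - (cost_new - cost_std)

# The share of draws with positive incremental net benefit estimates
# the probability that the new treatment is cost effective.
draws = [incremental_net_benefit() for _ in range(10000)]
prob_cost_effective = sum(d > 0 for d in draws) / len(draws)
print(f"P(cost effective at WTP 20000/QALY) = {prob_cost_effective:.2f}")
```

A univariate deterministic analysis would instead fix all parameters at their base-case values and vary one at a time over a stated range, which shows influence but not joint decision uncertainty.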
There were some flaws in how the results of the modelling studies were presented. Kerstein et al. and Meaume and Gemmen reported average cost-effectiveness ratios (ACERs) rather than ICERs, which are considered the standard summary measures of economic evaluations [7, 50]. ACERs should not be used to report the results of analyses, as they assume that all outcomes are produced at equivalent cost and spread the additional costs of an intervention over all previous outcomes, which is not the case [56, 57]. Negative ICERs were reported by Walzer et al., despite the intervention (Sachet S) being described as dominant. It is unconventional to report negative ICERs because, presented without context, they do not show whether an intervention offers a decreased cost for an increased benefit or an increased cost for a decreased benefit.
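The difference between the two measures can be shown with a small worked example using hypothetical figures: a comparator costing 800 per patient and healing 50% of ulcers versus an intervention costing 1100 and healing 65%:

```python
def acer(cost, effect):
    """Average cost-effectiveness ratio: total cost per unit of effect."""
    return cost / effect

def icer(cost_new, effect_new, cost_old, effect_old):
    """Incremental cost-effectiveness ratio:
    extra cost per extra unit of effect versus the comparator."""
    return (cost_new - cost_old) / (effect_new - effect_old)

# Hypothetical figures: per-patient cost and probability of healing.
print(acer(1100, 0.65))             # ~1692 per patient healed
print(icer(1100, 0.65, 800, 0.50))  # ~2000 per additional patient healed
```

Here the ACER spreads the intervention's entire cost over all healed patients, whereas the ICER isolates the extra cost of the extra benefit relative to the comparator, which is why only the ICER supports a decision between the two options.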
The 2012 study by Guest et al. stated that cost effectiveness would be determined by the clinical outcomes of QALY gain and percentage of patients healed. However, after the results showed no differences in costs or clinical outcomes across the interventions, the study reported on a previously unmentioned outcome (wound size reduction) as the basis for recommending one of the interventions as the preferred strategy. Using post hoc outcome measures carries a potential for bias, given that they may be selected on the basis of the clinical or statistical significance of the observed results. The validity of this model is therefore reduced, with the potential to mislead decision makers.