In total, 644 studies were identified. After exclusion of studies that did not meet our inclusion criteria, data were extracted from 370 studies (Fig. 2).
Characteristics of the Included Publications
Pharmaceuticals or not. In total, 176 studies (47.6 %) dealt with pharmacological interventions. The dominance of pharmacoeconomic evaluations was expected, considering that pharmaceutical companies in many countries are obliged to submit a pharmacoeconomic evaluation as part of a reimbursement application, while devices and procedures in many jurisdictions still lack this kind of regulation.
Types of journals. The journals in which the included papers had been published were categorized into three main types: (i) clinical or medical specialty journals, (ii) non-specialty medical and health journals, and (iii) health economics type journals (i.e. those with ‘economics’ or ‘technology assessment’ in their journal name, as well as Social Science and Medicine and Value in Health). Only 24 % of the articles were published in the latter type of journal (Table 1).
Country of origin. Almost 70 % of the studies originated in four countries: the USA (29 %), the UK (23 %), Canada (8 %) and the Netherlands (8 %). The large proportion of studies from the UK, Canada and the Netherlands might be explained by a combination of strong involvement in developing MAU instruments and policy guidelines that recommend the use of QALYs in applications for reimbursement. Also, note that the only health technology assessment (HTA) reports identified with our search were from the UK, as these are indexed by MEDLINE and Embase. HTA agencies from other countries may have published some of their analyses through journal articles, but generally HTA reports are disseminated in publications that are not found in searches of regular databases.
Type of study. Health economic evaluations can be performed as part of an epidemiological study (most often a randomized controlled trial [RCT]), as a modelling exercise based on a synthesis of published data, or as a combination of the two. Most studies included in this review were models (80 %). The remainder were split equally between the other two groups: strictly based on an RCT with no modelling involved (10 %) or a combination of RCT and modelling (10 %). This implies that a maximum of 20 % of the evaluations could have had access to individual patient-level QALY data if these were gathered in the RCT, while the rest, in general, would be based on previously published data from one or more sources.
Main disease group. The disease groups targeted by the interventions analyzed were categorized into eight groups: cancer, cardiovascular diseases (CVDs), respiratory diseases, mental health, other chronic diseases, non-chronic diseases, lifestyle interventions and other prevention. Clearly, the vast majority of studies focused on various types of chronic diseases, with cancer and CVD being the most frequent (19 and 14 % of the studies, respectively).
Comparator. Only 5 % of the studies were placebo controlled and 24 % compared the intervention with a no-treatment situation; the vast majority used an active comparator. However, it is impossible to tell whether the comparators were chosen to reflect the most relevant alternative.
Measuring and Valuing Quality-Adjusted Life-Year (QALY) Gains
In 205 studies (55 %), there was no reference to which MAU instrument or direct valuation method formed the basis for measuring ‘the Q in the QALY’ (Table 2). However, most of these (147) papers had referred to other publications from which they had obtained QALY data. The remaining 58 were based on ‘mapping’ from a disease-specific instrument, or the valuation method was not specified at all.
Among the studies that explicitly referred to the use of MAU instruments, the EQ-5D-3L was the most frequently used, appearing in 87 of the 113 studies that were based on a single instrument. In 11 studies, more than one MAU instrument had been combined. The valuation method used for calculating HRQoL was reported in only 85 (23 %) publications. TTO was the most widely used method, reflecting the fact that most of these studies had applied the standard EQ-5D-3L tariff from the UK, which is TTO based.
The combined information on which MAU instrument and which valuation method had been applied was reported in only 66 studies (18 %) (the 16 studies that stated direct valuation and valuation method are included here). Either the MAU instrument or the valuation method was reported in 99 studies.
The combined information (which MAU instrument and which valuation method) was reported in 29 % of publications in health economics journals, but in only 14 % of medical journal publications (Table 3). Hence, reporting was clearly better in health economics journals (p = 0.0013).
The Size of QALY Gains
In 37 of the 370 studies included, the size of the incremental QALY gain was not reported. Rather, these 37 studies reported the total QALY gain in the study group, the probability that the intervention is cost effective or simply the ICER.
Table 4 shows that the median incremental QALY gain in the remaining 333 studies was 0.06, which translates to about 3 weeks of prolonged life in best imaginable health (the mean was 0.31 QALYs). The effect in the lowest quartile translates to 4 days of prolonged life, while that in the upper quartile was about 4 months or more. The generally low QALY gains might be due to the short time horizons over which the gains had been measured and estimated. Table 4 shows that gains increase with time horizon, but not by much: the median QALY gain in studies with a time horizon longer than 5 years was only 0.12.
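The translation of a QALY gain into time spent in best imaginable health is simple arithmetic: one QALY corresponds to one year in full health. The following sketch illustrates the conversion for the median, mean and long-horizon median reported above. The function names and the 365.25-day year are our own illustrative assumptions, not part of the review's methods.

```python
# Illustrative conversion of QALY gains into equivalent time in full health.
# Assumption: 1 QALY = 1 year in best imaginable health, 1 year = 365.25 days.
DAYS_PER_YEAR = 365.25
WEEKS_PER_YEAR = DAYS_PER_YEAR / 7
MONTHS_PER_YEAR = 12


def qaly_to_days(qalys: float) -> float:
    """Equivalent number of days in full health."""
    return qalys * DAYS_PER_YEAR


def qaly_to_weeks(qalys: float) -> float:
    """Equivalent number of weeks in full health."""
    return qalys * WEEKS_PER_YEAR


def qaly_to_months(qalys: float) -> float:
    """Equivalent number of months in full health."""
    return qalys * MONTHS_PER_YEAR


# Values taken from the text: overall median, overall mean,
# and the median for time horizons longer than 5 years.
for label, gain in [("median", 0.06), ("mean", 0.31), ("median, >5 years", 0.12)]:
    print(f"{label}: {gain} QALYs = {qaly_to_weeks(gain):.1f} weeks "
          f"= {qaly_to_months(gain):.1f} months")
```

Under these assumptions, the median gain of 0.06 QALYs corresponds to roughly 3 weeks, in line with the figure quoted in the text.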
When comparing QALY gains across diagnostic groups (Table 5), we note that interventions related to other chronic diseases yield the highest incremental gains while preventions yield the lowest.
Given the generally low QALY gains in this review, we looked closer into the 29 studies (8 %) that reported a gain larger than 1 QALY. These large gains were more common when the comparator was placebo or no treatment (14 vs. 7 %, p = 0.03). They were also more common among studies published in health economics journals than among those published elsewhere (15 vs. 6 %, p = 0.01), which may indicate greater methodological transparency. Furthermore, large gains were more common among studies based on data from the ‘rest of the world’ (14 vs. 7 %, p = 0.05), i.e. all countries except for those explicitly mentioned in Table 1.
Eight studies reported incremental gains of two QALYs or more. None of these involved a ‘large medical breakthrough’. Rather, they evaluated interventions targeted at relatively young patient groups who will benefit from an improved HRQoL over many years. Six of these eight studies compared the gains with a no-treatment alternative.
Discounting QALY Gains
Discounting QALY gains was common: 276 studies (75 %) reported results using a positive discount rate, while 58 (17 %) presented only the undiscounted result. In the remaining 36 studies, the discounting issue was not explicitly mentioned.
The most frequently applied discount rate was 3.0 %, which has come to be the current standard rate in the literature, perhaps due to recommendations by the Washington panel. Interestingly, studies that departed from this international norm appeared to do so in response to domestic guidelines. The Dutch guidelines suggest a rate of 1.5 %, which explains why 17 of 27 studies using this rate have a Dutch setting. Similarly, the UK guideline recommends 3.5 %, which explains why 52 of 62 studies using this rate are UK based.
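To give a sense of how much the choice of discount rate can matter, the sketch below applies the three rates mentioned above to a purely hypothetical stream of gains (0.1 QALYs per year for 20 years). The function name, the gain profile and the convention that the first year is undiscounted are illustrative assumptions on our part; none of them is taken from the included studies.

```python
# Illustrative discounting of a constant annual QALY gain.
# Assumptions (hypothetical, for illustration only):
#   - gains accrue at the start of each year, so year 0 is undiscounted;
#   - the intervention yields 0.1 QALYs per year for 20 years.


def discounted_qalys(annual_gain: float, years: int, rate: float) -> float:
    """Present value of a constant annual QALY gain over `years` years."""
    return sum(annual_gain / (1.0 + rate) ** t for t in range(years))


ANNUAL_GAIN = 0.1
YEARS = 20
RATES = {
    "undiscounted": 0.0,
    "1.5 % (Dutch guideline)": 0.015,
    "3.0 % (common standard)": 0.03,
    "3.5 % (UK guideline)": 0.035,
}

results = {
    label: discounted_qalys(ANNUAL_GAIN, YEARS, rate)
    for label, rate in RATES.items()
}

for label, value in results.items():
    print(f"{label:>24}: {value:.2f} QALYs")
```

The higher the rate, the smaller the discounted gain, and the difference grows with the time horizon. This is consistent with discounting mattering little in short-term studies.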
When comparing the practice of discounting with time horizon (Table 6), we note that most studies that did not discount the gains, or contained no mention of discounting (missing), were short-term studies (time horizon of 1 year or less).