Introduction

Economic evaluation provides information to health care professionals, decision-makers and consumers in general about the efficiency (the relation between costs and effects) of health technologies, which may aid the choice of the most favourable option [1]. Health technology is defined as any kind of drug, device and medical or surgical procedure used in health care management. The development of new health technologies, whose aim is to improve general health by reducing mortality and morbidity and increasing the quality of life, involves costs for society and for health care providers [2]. The scarcity of available resources and the increasing demand for health care require a more rational assignment of resources and a better definition of priorities: in this scenario, economic evaluation may provide valuable information [3].

Economic evaluation of health technologies has been increasing over recent decades in Spain, as evidenced by systematic reviews of Spanish economic health care evaluations [1, 4, 5]. However, these studies also point out some methodological aspects to be improved, including the definition of the perspective used in the study or the inclusion of sensitivity analyses, among others.

The objective of this study was to evaluate the methodological characteristics of cost-effectiveness analyses (CEA) carried out in Spain since 1990 that include life years gained (LYG) as the outcome used to calculate the incremental cost-effectiveness ratio (ICER). Secondary objectives were, first, to determine whether the cost per LYG results were influenced by a commonly accepted cost-effectiveness threshold and, second, to assess possible differences in study conclusions where quality-adjusted life years (QALY) gained were also reported as an outcome measure together with LYG.

Methods

A systematic review of studies published between 1990 and March 2009 on the economic evaluation of health technologies in Spain including LYG as an outcome measure was conducted in PubMed/Medline and the Centre for Reviews and Dissemination (CRD) database.

The following combinations of terms were applied to the PubMed/Medline database: (“Cost-Benefit Analysis”[Mesh] OR “Models, Economic”[Mesh] OR “Costs and Cost Analysis”[Mesh] OR “Economics”[Mesh]) AND (Spain OR Spanish) AND (qaly OR avac OR lyg OR avg OR “life saved” OR “life year saved” OR “life gained” OR “vida salvada” OR “año de vida ganado” OR “life year gained” OR “life-years gained” OR “year of extended life”). The systematic review was limited to evaluations involving humans and whose publication language was Spanish or English.

The search strategy for the CRD database combined the following terms: “Spain” AND “cost-effectiveness” AND (“life year gained” OR “life year saved”) NOT “review”.

We also searched other relevant local publications by hand, including “Revista Española de Economía de la Salud”, “Pharmacoeconomics Spanish Research Articles”, “Revista Española de Enfermedades Metabólicas Óseas”, “Angiología” and “Vacunas”.

Inclusion and exclusion criteria

Articles were selected according to the following inclusion criteria: (a) studies conducted in the Spanish context; (b) studies published in either Spanish or international journals; (c) studies related to economic evaluation of health technologies; (d) study results had to include a cost-effectiveness analysis expressed in cost per LYG; (e) studies referred to either adult or paediatric populations; (f) studies conducting an incremental cost-effectiveness analysis.

Studies were excluded if (a) they included QALYs and not LYG as an outcome measure; (b) study results did not include an incremental cost-effectiveness ratio; (c) the study was a systematic review.

Quality assessment

To assess the methodological quality of articles, a criteria checklist was developed as an adaptation of the checklist for economic evaluations recommended by the National Institute for Health and Clinical Excellence (NICE) [6], assigning a score of Code (−), Code (+) or Code (++) to rate the methodological quality of studies. The criteria suggested by the Oxford Centre for Evidence Based Medicine [7] for economic and decision analysis were also applied to rank the validity of the evidence.

ICER analysis

The ICER is the most frequently used method of comparing treatment alternatives or clinical pathways in economic evaluations of health care. Different health care authorities have adopted a maximum ICER threshold to help decide whether a health technology is cost-effective or not and whether it should be adopted by the health care system. In Spain, there is no official threshold recommended by health care authorities as a “rule-of-thumb” for the economic evaluation of health technologies [8]. However, a review by Sacristán et al. in 2002 found that most economic evaluations that recommended the adoption of a certain health intervention were based on ICER lower than 30,000 € per LYG [9]. This commonly used threshold has been extended to cost per QALY and strengthened by the opinion of expert Spanish health economists [10].
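As an illustration of the decision rule described above, the following minimal sketch computes an ICER and compares it with the unofficial 30,000€ reference value. The function names, the example figures and the handling of boundary cases (for instance, treating a ratio exactly equal to the threshold as cost-effective) are assumptions made for this example and are not taken from the studies reviewed.

```python
from typing import Optional

THRESHOLD_EUR = 30_000  # unofficial Spanish reference value per LYG cited in the text


def icer(cost_new: float, cost_old: float,
         effect_new: float, effect_old: float) -> Optional[float]:
    """Incremental cost per unit of effect (e.g. euros per LYG).

    Returns None when both options have the same effect, because the
    ratio is undefined in that case.
    """
    d_effect = effect_new - effect_old
    if d_effect == 0:
        return None
    return (cost_new - cost_old) / d_effect


def classify(cost_new: float, cost_old: float,
             effect_new: float, effect_old: float,
             threshold: float = THRESHOLD_EUR) -> str:
    """Place the comparison on the cost-effectiveness plane (illustrative labels)."""
    d_cost = cost_new - cost_old
    d_effect = effect_new - effect_old
    if d_cost == 0 and d_effect == 0:
        return "equivalent"
    if d_effect > 0 and d_cost <= 0:
        return "dominant"        # more effective and not more costly
    if d_effect <= 0 and d_cost >= 0:
        return "dominated"       # not more effective and not cheaper
    if d_effect < 0:
        return "trade-off"       # cheaper but less effective
    # More effective and more costly: compare the ratio with the threshold.
    ratio = d_cost / d_effect
    return "cost-effective" if ratio <= threshold else "not cost-effective"


# Hypothetical example: 5,000 euros more for 0.5 additional life years.
example_icer = icer(12_000, 7_000, 6.0, 5.5)
example_label = classify(12_000, 7_000, 6.0, 5.5)
```

In this hypothetical case the incremental cost of 5,000€ divided by the 0.5 LYG gained gives 10,000€ per LYG, which falls below the reference value.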

Our review also identified studies that calculated cost-effectiveness results in terms of both LYG and QALYs gained in order to determine whether, considering the threshold of 30,000€ per QALY/LYG, the two measures yielded the same conclusion or differed [11]. The analysis used a dispersion graph comparing cost per LYG and cost per QALY gained in relation to the cost-effectiveness threshold.

The cost-effectiveness results of studies reviewed were updated to 2009 Euros using the inflation rates stated by the National Statistics Institute [12] and the corresponding exchange rates when necessary. In order to normalize the results taking into account biased and asymmetric data, a Box–Cox transformation of cost-effectiveness data using the natural logarithm was carried out [13].
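The updating and normalization steps can be sketched as follows. The inflation figures in the usage note are hypothetical placeholders, not the National Statistics Institute rates used in the review; the convention that the rate stored for year y applies to the change from y to y + 1 is also an assumption. The peseta conversion uses the fixed rate of 166.386 pesetas per euro. A Box–Cox transformation with parameter λ = 0 reduces to the natural logarithm.

```python
import math

ESP_PER_EUR = 166.386  # fixed peseta/euro conversion rate


def to_2009_euros(value: float, year: int, inflation: dict,
                  currency: str = "EUR") -> float:
    """Convert a cost to euros if needed, then compound inflation up to 2009.

    `inflation[y]` is assumed to be the fractional price change from
    year y to year y + 1 (e.g. 0.03 for 3%).
    """
    if currency == "ESP":
        value = value / ESP_PER_EUR
    elif currency != "EUR":
        raise ValueError("only EUR and ESP are handled in this sketch")
    for y in range(year, 2009):
        value *= 1.0 + inflation[y]
    return value


def box_cox(x: float, lam: float = 0.0) -> float:
    """Box-Cox transform; lam = 0 gives the natural logarithm."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# On the log scale, the 30,000 euro threshold is approximately 10.31.
log_threshold = box_cox(30_000)
```

For example, with a hypothetical rate of 3% for both 2007 and 2008, `to_2009_euros(1000, 2007, {2007: 0.03, 2008: 0.03})` compounds the value twice.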

Results

Our search yielded a total of 201 references, 62 of which were finally included according to the inclusion and exclusion criteria (Fig. 1; Table 1).

Fig. 1 Selection process

Table 1 Description of the studies included

The oldest study selected was published in 1993 and the latest was published in March 2009, with 76% of studies being published from 2002 onwards (Fig. 2).

Fig. 2 Annual distribution of the studies included

A total of 58% of studies were published in Spanish journals and 42% in international journals. Four studies were in paediatric populations, four in paediatric and adult populations (vaccination studies) and the remaining 87% in adult patients. Sixty-five per cent compared therapeutic interventions, while the rest dealt with preventive strategies (four related to screening programmes). The studies were conducted for cardiovascular diseases (31%), oncology (23%), infectious diseases (11%), respiratory diseases (11%), smoking (8%), hepatitis (6%), diabetes mellitus (5%) and musculoskeletal disorders (5%).

The most-frequently used perspective was that of the Spanish National Health System (69%). The societal perspective was only used in five studies (in four together with the National Health System perspective). In two articles, the authors stated a societal perspective but did not consider indirect costs. The perspective of the evaluation was not stated in 11 articles (18%).

The currency and year for unit values were acknowledged in 81% of studies and only the currency in the remaining studies. The currencies most used were the Euro (n = 46) followed by the Spanish Peseta (n = 8), US Dollar (n = 7) and the Ecu (European Currency Unit before 1999) in one study.

Seventy-four per cent of studies discounted costs and effects, 10% discounted only costs, 6% only effects and the remaining studies applied no discount. Only 44% of studies justified why discounting was necessary and why a specific discount rate was applied. The most-common discount rate used was 3% (42%), followed by 5% (25%), 6% (13%), 3.5% (9%), 4% (one study) and 4.25% (one study). In 89% of studies, the discount rate was the same for costs and health benefits.

The robustness of the results was tested by sensitivity analysis in 97% of studies, with one-way sensitivity analysis being used in 58% of studies, other methods such as multivariate or probabilistic sensitivity analysis together with one-way sensitivity analysis in 19%, multivariate analysis alone in 15% and probabilistic sensitivity analysis alone in 5%.

The ICERs were clearly stated in 84% of studies by calculating the cost and effect differences between the comparators evaluated. In the remaining studies, the incremental cost-effectiveness was stated without showing the differences in costs and effects.

In 74% of the studies, the authors acknowledged the limitations of the study. The source of funding was stated in 47% of studies, of which 90% were privately funded. Only three studies were publicly funded, none since 2004.

The level of evidence of 76% of studies was considered as 3b (analysis based on limited alternatives or costs, poor-quality estimates of data, but including sensitivity analyses incorporating clinically sensitive variations) due to the diverse nature of the sources used to estimate costs and effects and because sensitivity testing relied only on one-way analyses. Ten studies (16%) were assigned a 2b level of evidence (analysis of the effectiveness based on limited review(s) of the clinical evidence or single studies; and including multi-way sensitivity analyses) and five studies (8%) were considered level 4 (no sensitivity analysis included).

The methodological quality was considered to be good (Code +) in 55% of studies, very good (Code ++) in 26% and not good (Code −) in 19%.

Compared with previous systematic Spanish reviews [1, 4], some methodological aspects seem to have improved. First, 82% of studies reviewed stated the perspective of the evaluation, compared with 28 and 43%, respectively, in previous reviews [1, 4]. Second, the incremental cost and LYG differences are shown together with incremental ratios in 84% of studies. Third, 97% of studies conducted some form of sensitivity analysis, an essential requirement for any good economic evaluation, compared with only 30–68% in past reviews [1, 4]. And fourth, although only 47% of studies stated the source of financing, this is greater than the 29% found in a past review [4].

ICER analysis

A total of 124 cost per LYG results were obtained from the 62 economic evaluations included in our study. The number of LYG results exceeds the number of studies because some studies reported several sub-analyses, for example for different time horizons, patient groups or comparators. Four (3%) LYG results showed a dominant situation for the intervention analysed (lower costs and greater effectiveness than the alternative compared), while the rest resulted in a mean cost per LYG of 49,529€ and a median of 11,490€. The great diversity of the evaluations with respect to pathologies, patients and methodologies resulted in wide dispersion of the results (standard deviation of 183,080€). Therefore, more robust statistical techniques were applied, such as the Huber estimator [14]. Where classical statistical techniques fail to cope well with deviations from the assumed distribution, robust statistical methods provide tools for problems in which the underlying assumptions are inexact. Huber’s M-estimator, a generalization of maximum likelihood estimators, allows data to be described with reduced weighting of outliers. The most widely used weighting (tuning) constant for Huber’s M-estimator is 1.339. The robust mean calculated using the Huber estimator was 12,515€ (Table 2).
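To make the idea of down-weighting outliers concrete, the following is a minimal sketch of a Huber-type location estimate computed by iteratively reweighted averaging, using the constant 1.339 mentioned above. It assumes a fixed scale estimated from the median absolute deviation (MAD) divided by 0.6745; the statistical package used in the review may differ in these details, and the example data are hypothetical rather than results from the studies included.

```python
import statistics


def huber_location(values, k: float = 1.339,
                   tol: float = 1e-6, max_iter: int = 100) -> float:
    """Huber M-estimate of location with a fixed MAD-based scale.

    Observations within k * scale of the current estimate get weight 1;
    more distant observations get weight k * scale / |residual|, so
    outliers pull the estimate far less than they pull the mean.
    """
    data = list(values)
    mu = statistics.median(data)
    mad = statistics.median([abs(x - mu) for x in data])
    scale = mad / 0.6745  # makes the MAD comparable to a standard deviation
    if scale == 0:
        return float(mu)
    for _ in range(max_iter):
        weights = []
        for x in data:
            residual = abs(x - mu)
            weights.append(1.0 if residual <= k * scale else k * scale / residual)
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) <= tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical cost per LYG results (euros) with one extreme value.
example = [8_000, 9_500, 11_000, 11_500, 12_000, 14_000, 15_000, 900_000]
robust_mean = huber_location(example)
plain_mean = statistics.mean(example)
```

In this hypothetical sample the single extreme value drives the arithmetic mean above 120,000€, whereas the robust estimate remains close to the median, which mirrors the gap between the mean and the robust mean reported above.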

Table 2 Description of the ICER results

The analysis of the cost per LYG of all studies reviewed according to the source of funding showed that robust mean results were 11,539€ for privately funded studies, 18,855€ for publicly funded studies and 13,069€ for studies without the source of funding stated. However, this comparison may be biased due to the small number (n = 3) of publicly funded studies.

As previously stated, a review published in 2002 found that most studies considered technologies with an incremental cost-effectiveness ratio below 30,000€ as efficient [9]. Our review showed that, since 2003, this unofficial threshold has been explicitly used as a reference by 66% of studies included, with robust mean results of 12,922€/LYG and 19,605€/QALY, while for those not explicitly using it, the robust mean results were 13,989€/LYG and 11,104€/QALY (Fig. 3).

Fig. 3 Year of study and cost per LYG results

Of the 62 studies with cost per LYG results, 24 also calculated the cost per QALY gained. A total of 58 paired results of cost per LYG and cost per QALY gained were represented in a dispersion graph to analyse whether the two results provided the same conclusions. In 84% of comparisons, the two results yielded the same conclusion: in 40 cases (69%), both results were below the 30,000€ threshold, showing the intervention to be cost-effective, and in 9 cases (16%), both results were above the threshold. However, in 4 cases (3 from the same study), the cost per LYG was below the 30,000€ threshold whilst the cost per QALY gained was above it. The other 5 cases (3 from the same study) showed the opposite result (Fig. 4).
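The agreement figures above follow directly from the four groups of paired results. The short sketch below classifies a pair against the threshold and reproduces the reported percentages from the counts given in the text; the function name, the labels and the treatment of a value exactly equal to the threshold as "below" are assumptions made for illustration.

```python
THRESHOLD_EUR = 30_000


def agreement_cell(cost_per_lyg: float, cost_per_qaly: float,
                   threshold: float = THRESHOLD_EUR) -> str:
    """Classify a paired result by which ratios fall at or below the threshold."""
    lyg_ok = cost_per_lyg <= threshold
    qaly_ok = cost_per_qaly <= threshold
    if lyg_ok and qaly_ok:
        return "both below"
    if not lyg_ok and not qaly_ok:
        return "both above"
    return "LYG below, QALY above" if lyg_ok else "QALY below, LYG above"


# Counts of paired results as reported in the text.
counts = {
    "both below": 40,
    "both above": 9,
    "LYG below, QALY above": 4,
    "QALY below, LYG above": 5,
}
total = sum(counts.values())
agreement_pct = 100 * (counts["both below"] + counts["both above"]) / total
discordant = counts["LYG below, QALY above"] + counts["QALY below, LYG above"]
```

With these counts, 49 of the 58 pairs agree, which rounds to 84%, and the 9 discordant pairs make up the remainder.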

Fig. 4 Cost per LYG versus cost per QALY gained. Cost-effectiveness threshold (ln) = 10.31 (the natural logarithm of 30,000€/QALY gained). The first and third quadrants show results where cost per LYG and cost per QALY lead to the same (positive or negative) conclusion. Quadrant two shows results where the intervention was cost-effective by cost per LYG but not by cost per QALY. Quadrant four shows results where the intervention was cost-effective by cost per QALY but not by cost per LYG

Spearman’s rho was used to estimate the correlation between the cost per LYG and the cost per QALY gained. This rank-correlation method is considered robust against outliers and non-normal data distributions. The Spearman rank correlation between the two cost-effectiveness results was 0.89 (p < 0.001). After log transformation, the Pearson correlation was used, with a result of 0.91 (p < 0.001).
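The two correlation measures can be sketched with the standard library alone: Spearman’s rho is the Pearson correlation of the ranks, and the second coefficient is the Pearson correlation of the log-transformed ratios. The paired values below are hypothetical and serve only to show the calculation; p-values are not computed in this sketch.

```python
import math


def pearson(xs, ys) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values) -> list:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical paired ratios (euros per LYG, euros per QALY gained).
cost_per_lyg = [5_000, 12_000, 20_000, 35_000, 90_000]
cost_per_qaly = [7_000, 15_000, 11_000, 31_000, 150_000]

rho = spearman(cost_per_lyg, cost_per_qaly)
r_log = pearson([math.log(v) for v in cost_per_lyg],
                [math.log(v) for v in cost_per_qaly])
```

Because the rank-based coefficient ignores the size of the gaps between values, a single very large ratio affects it much less than it affects a Pearson correlation on the untransformed data, which is the reason for taking logarithms before applying Pearson.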

Discussion

Interest in economic evaluations in health care and their contribution to decision-making has increased in Spain in recent years [1, 4, 9]. We conducted a review of economic evaluations of health technologies in Spain assessing the incremental cost per LYG as an outcome from 1990 until the beginning of 2009. The number of publications found reflects this increasing interest.

Of the studies reviewed, only one assessed the cost per LYG for a medical device, with the remaining articles assessing mainly drugs or health care programmes. As medical devices are used to provide symptomatic improvement, the number of QALYs gained is the preferred assessment outcome for them.

The methodology used by the majority of studies assessed satisfied most of the general methodological aspects considered to represent good practice in international recommendations [3, 68, 15]. Compared with previous reviews [1, 4], the number and quality of published Spanish health economic evaluations seem to have improved over time, and some deficiencies found in previous reviews seem to have been solved. Analysis of the methodological quality of the studies published since 2003 showed that 86% (30 out of 35) were rated as Code ++ or Code +, which suggests that the reviews published in 2002 [1, 9] may have led to greater methodological rigour. Moreover, the recent publication of Spanish recommendations on the economic evaluation of health technologies [8] can be expected to reinforce this trend in the future.

One limitation of this study is the narrow focus on methodologies using cost per LYG as a result. However, we believe that a detailed examination of this particular topic was desirable, since a similar review focused on the results of studies using cost per QALY has been published elsewhere [5]. That review covers a similar period of time to our study, and its main purpose was a methodological assessment of the studies reviewed. The authors found an increasing number of published economic evaluations and an improvement in their methodological quality, as found in our study. However, no statistical analysis of the cost per QALY results was carried out. The robust mean of the results of their reviewed studies was 18,309€/QALY, compared with 11,541€/QALY in our study. In addition, despite a comprehensive search, some of the earliest publications may have been overlooked, although their inclusion would have been unlikely to alter the reported findings.

Some problems arise in the increasing use of cost-effectiveness thresholds as an explicit decision-making rule. Cost-effectiveness thresholds may vary according to the country or geographical area; in fact, the World Health Organization recommends adjustment by the corresponding gross domestic product [16]. They may also vary according to the decision-maker (social or health provider perspective), the health care technologies compared (preventive or therapeutic), the effectiveness measure chosen for the evaluation (LYG, QALY gained, intermediate clinical outputs) or the disease under study. As an example, recent supplementary advice for appraising life-extending, end-of-life treatments made by NICE [17] recognized the need for further appraisal when the treatment involved is indicated for small populations with incurable illnesses, and the most plausible reference case point estimate for the ICER exceeds the upper threshold of the range normally considered.

Cost-effectiveness thresholds are not treated consistently across the different international guidelines for health economic evaluation, and the latest Spanish recommendations [8] do not state any explicit threshold, in contrast with NICE guidelines (25,000–35,000£/QALY gained) [6]. Different thresholds have been stated in Spanish publications (ranging from 30,000€ to 50,000€/QALY) [9, 18, 19], but since the review by Sacristán et al. in 2002 [9], most authors have considered interventions below 30,000€/QALY gained to be cost-effective.

Although the reported cost per LYG is below 30,000 Euros for every type of funding, the difference between the results of publicly and privately funded studies should be subject to more thorough analysis if Spanish decision-makers were to set the future threshold of acceptability at between 15,000 and 20,000 Euros per LYG. A more detailed analysis could be made in the future, when more publicly funded studies are likely to be available.

The adoption of a fixed threshold could result in economic studies seeking the maximum price for the technology assessed that still shows a cost-effectiveness ratio below the threshold [20]. We found no clear influence of the commonly used cost per LYG threshold of 30,000€ in studies published after the article by Sacristán et al. (2002), although 66% of them explicitly referenced it. However, an increase in the number of studies with cost per LYG results close to, but below, the 30,000€ threshold was found, which might indicate a certain temporary publication bias caused by the implicit acceptance of a threshold of efficiency, although more information would be needed to reach definitive conclusions.

Other decision sources, such as the potential financial consequences of a new health care technology, are not covered by cost-effectiveness thresholds and represent an essential part of a comprehensive economic assessment of a health care technology. Budget impact analysis is used to quantitatively estimate the foreseen changes in health care expenses for treatment of a specific pathology when an alternative intervention is introduced [21, 22], complementing the information provided by the cost-effectiveness results of the new intervention.

When two types of final outcome (LYG and QALY gained) are studied in the same health economic evaluation, the conclusions of the study do not depend on the outcome chosen in most cases, i.e., the cost per LYG and the cost per QALY gained lead to the same conclusion. The high correlation found in our study between the two ratios (0.89 Spearman and 0.91 Pearson correlation) is similar to that found by Chapman in 2004 (0.86 Spearman and 0.84 Pearson correlation) [11]. This is important because it is often difficult and costly to find utility data for QALY calculation. However, further assessment would be needed to accept this as a fact, and it should be noted that this correlation may vary between different types of diseases. In some cases, choosing LYG or QALY as the outcome of the study may change the cost-effectiveness results of an evaluation. An intervention could be cost-effective in terms of cost per LYG but not cost per QALY gained when it improves survival but performs less well on quality of life (for example, because of more side effects, disease complications or survival in a severe health state). This would be the case for certain cancers, where life years are gained in health states whose severity is associated with low levels of quality of life (for example, breast cancer in the studies reviewed [23]). The opposite could occur with an intervention that greatly improves quality of life but offers only a limited improvement in survival. This would be the case for chronic pathologies with good life expectancy in which quality of life is highly sensitive to a new treatment option that results in fewer disease complications or side effects, such as hepatitis C [24] or type 2 diabetes [25]. Therefore, larger studies whose primary objective is to analyse the relationship between ICER thresholds and types of diseases would be necessary. In the present study, only 9 out of 58 results, corresponding to 4 studies, showed this discrepancy, which is not sufficient to reach any conclusions.

Cost-effectiveness analysis is useful in helping decision-makers to optimize resource allocation. Although different approaches have been used to present results (LYG, QALYs, etc.), the best alternative may depend on the scope of the study, the disease evaluated and the financial impact of the technologies under evaluation, among other factors.

Our results suggest that some aspects should be improved in future studies using LYG as an effectiveness outcome: (a) a clear definition of the perspective of the economic evaluation; (b) a description of ICER in all economic evaluations performed; and (c) greater use of probabilistic sensitivity analysis to better evaluate uncertainty.