Key Points

This study summarizes the evidence on the monetary valuation of losses in healthcare goods and services as compared to equally sized gains. It shows that people, generally, value losses 1.58–1.86 times higher than equally sized gains.

The results of this study provide more evidence to explain the observed difficulty of disinvesting healthcare goods and services.

The results of this study may imply the possibility of using different cost-effectiveness thresholds for decisions on starting vs stopping the reimbursement of healthcare.

1 Introduction

The healthcare market is characterized by many imperfections, such as asymmetric information between patients and physicians, third-party payers, and uncertainty in demand and supply. Because of these market imperfections and government regulations, the price people pay for goods and services in the healthcare market does not necessarily reflect their value to them. Therefore, unlike the market for consumer goods, it is difficult to use revealed preferences to determine the value of healthcare goods and services [1]. To circumvent this problem, health economists have regularly resorted to using stated preferences methods, such as contingent valuation, to estimate the value of healthcare [2, 3].

An important application of stated preferences for healthcare is the cost-benefit analysis [4,5,6]. In this context, two measures have been used for valuing healthcare: willingness to pay (WTP) and willingness to accept (WTA). Willingness to pay measures the amount of money an individual is willing to pay for obtaining a certain healthcare good or service. Willingness to accept measures the amount of monetary compensation an individual wants to receive for giving up a certain healthcare good or service. The relevant measure to use, thus, depends on the decision context, with WTP being used when people stand to gain something and WTA being used when people stand to lose something [4, 7, 8].

Previous studies have reported substantial differences between the WTP and WTA for the same good or service, both in hypothetical studies as well as in studies involving real transactions [9,10,11]. An aggregate data meta-analysis by Tunçel and Hammitt summarized the studies comparing WTP and WTA across different economic sectors. They reported an overall WTA/WTP ratio of 3.28, indicating that people, on average, want to receive a 3.28 times larger amount to give up a good or service than they are willing to pay to obtain this good or service. The size of the WTA/WTP ratio differed for the type of good valued, with studies on environmental goods reporting the largest WTA/WTP ratio of 6.23 on average [10]. A recent estimate of the WTA/WTP ratio for healthcare goods and services is lacking, indicating a knowledge gap on the WTA-WTP disparity for the healthcare sector. The only review reporting a separate WTA/WTP ratio for healthcare [12] dates back to 2002 and included only two studies reporting ratios of 1.9 and 6.4 [13, 14]. The more recent meta-analysis of Tunçel and Hammitt [10] did not look at healthcare separately, but reported a mean ratio of 5.09 for health and safety goods together. Moreover, the search for this meta-analysis dates back to early 2012 and only covered one database (i.e., Econlit), indicating that the search could be updated and expanded to more databases to identify further relevant studies in the healthcare context.

In the literature, many different explanations for a disparity between WTA and WTP have been described. According to standard economic theory, WTA and WTP should be similar when the good valued is divisible and exchanged at zero transaction costs on an infinitely large market. If these conditions do not hold, WTA and WTP may be different. The size of this difference depends on income, the proportion of income that is spent on the good, and the income elasticity [5, 6]. Furthermore, the inability to substitute money for a (public) good, either because of perfect complementarity or because of asymptotic boundedness of the utility curve, may also be a reason for WTA to exceed WTP [15,16,17]. Moreover, according to several alternative economic theories, such as prospect theory, (1) people value a change from a reference point, instead of the final state after a change, and (2) the value function for losses is steeper than the value function for gains. For these reasons, WTA values are expected to be larger than WTP values [18,19,20].

It is important to obtain more insight into the WTA/WTP ratio for healthcare goods and services, as a disparity between WTA and WTP has important implications for healthcare decision making, for example for reimbursement decisions. If WTA is larger than WTP, a higher cost-effectiveness threshold may be used for decisions on stopping reimbursement of healthcare interventions than for decisions on starting reimbursement. In other words, the cost-effectiveness ratio should probably be significantly less favorable for disinvestment to be welfare improving. In line with this, insight into the WTA/WTP ratio for healthcare goods and services may help to better understand reimbursement decision making, as policy makers seem to find it more difficult to discontinue reimbursement than not to start reimbursement in the first place [21]. Such insight may also be important for researchers in the field of cost-benefit analysis of healthcare interventions and preference elicitation, as it provides guidance on choosing the appropriate measure of the value of healthcare interventions given the decision context at hand, i.e., investment vs disinvestment of healthcare goods and services. Furthermore, insight into the WTA/WTP ratio for healthcare goods and services may be helpful in understanding the general reluctance of patients to change treatment despite potential advantages [22, 23], indicating that for a new treatment to be welfare improving, it should offer substantially higher benefits to the patient than the current treatment.

The aim of this study is to review the available evidence on the disparity between WTA and WTP for healthcare goods and services to obtain an aggregated estimate of the WTA/WTP ratio for healthcare goods and services. To this end, we used a comprehensive tiered approach consisting of (1) a systematic review, (2) an aggregate data meta-analysis (AD-MA), and (3) an individual participant data meta-analysis (IPD-MA). First, the systematic review provides an overview of published studies that compared WTP and WTA for healthcare goods and services. Second, the AD-MA combines the estimates as reported in these studies. Finally, the IPD-MA enables us to calculate one overall estimate of the WTA-WTP disparity, to obtain more insight into the statistical and methodological uncertainty surrounding this estimate, and to correct the estimate for subject characteristics. The IPD-MA approach has not been applied before to estimate the WTA/WTP ratio. Hence, this study adds a new level of information to the previous literature.

2 Methods

2.1 Systematic Review

The databases MEDLINE, EMBASE, Scopus, Scisearch, and Econlit were searched from inception to the search date (i.e., 9 or 13 February, 2017) using WTP and WTA (and variations thereof) in the title, abstract, or as keywords. For the databases that do not solely focus on health (i.e., Scopus, Scisearch, and Econlit), the search strategy was extended with health-related search terms. The full search strategies are displayed in Electronic supplementary material: Appendix A.

After deduplication, titles and abstracts were screened for eligibility by two reviewers using the eligibility criteria in Table 1. If eligibility was not clear from the title and abstract, the article was included in full-text screening to ensure that no eligible papers would be missed. Differences between reviewers were resolved by discussion. If a consensus was not reached, a third reviewer was consulted. Full-text articles of all included abstracts were retrieved and screened for eligibility by one reviewer. If the reviewer was unsure about eligibility, the other reviewers were consulted.

Table 1 Eligibility criteria for the systematic review

For each included article, the estimate of the WTA/WTP ratio was extracted. If several estimates for different subgroups or experiments were provided, all these estimates were extracted. Next to the WTA/WTP ratio, the following (study) characteristics were extracted: first author, year, country, good/service valued, number of study subjects (N), subject sample type, within- vs between-subject design, elicitation method, administration method, payment vehicle, and payment frequency (see Electronic supplementary material: Appendix B).

2.2 Aggregate Data Meta-Analysis

From the WTA/WTP estimates extracted in the systematic review, an overall WTA/WTP ratio was calculated. This was calculated by taking the mean and median from the WTA/WTP estimates as reported by the studies. If studies only reported mean/median WTA and WTP at the study level (i.e., not a ratio), the WTA/WTP ratio at the study level was calculated by dividing WTA by WTP. Next to the mean and median, a weighted average WTA/WTP ratio was calculated to take account of large differences in the number of subjects and number of estimates retrieved from studies [10, 11]. The estimates from the studies were weighted using this formula:

$$\frac{\sqrt{N_{ik}}}{\sqrt{K_{i}}},$$

where N is the sample size of estimate k from study i and K is the number of estimates provided by study i. As the aggregate WTA/WTP estimates were reported in different formats (i.e., mean, median, or regression model estimate), overall WTA/WTP ratios were calculated for each format separately.
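The weighting scheme above can be illustrated with a minimal sketch. It assumes that the weighted average is the sum of weight times ratio divided by the sum of the weights (the normalization is our assumption, as only the weights themselves are given), and the function name and input values are hypothetical.

```python
from math import sqrt

def weighted_ratio(estimates):
    """Weighted average WTA/WTP ratio.

    estimates: list of (study_id, n_subjects, ratio) tuples,
    one tuple per extracted estimate (hypothetical input format).
    """
    # K_i: number of estimates contributed by each study
    counts = {}
    for study, _, _ in estimates:
        counts[study] = counts.get(study, 0) + 1
    # Weight of estimate k from study i: sqrt(N_ik) / sqrt(K_i)
    weights = [sqrt(n) / sqrt(counts[study]) for study, n, _ in estimates]
    # Assumption: normalize by the sum of the weights
    numerator = sum(w * r for w, (_, _, r) in zip(weights, estimates))
    return numerator / sum(weights)

# Hypothetical values, not taken from the included studies
example = [("A", 100, 2.0), ("A", 100, 1.0), ("B", 400, 1.5)]
result = weighted_ratio(example)
```

Dividing by the square root of the number of estimates per study keeps a study that reports many subgroups from dominating the pooled ratio.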

2.3 Individual Participant Data Meta-Analysis

Individual participant data (IPD) on WTP, WTA, age, sex, and income were requested by sending an e-mail to the corresponding authors of the papers included in the AD-MA. If it was not possible to contact the corresponding author, other authors were e-mailed. If necessary, the authors were reminded twice. The retrieved IPD were analyzed using three approaches increasing in complexity, which are described in the subsequent three paragraphs.

2.3.1 Descriptive Analyses

Received datasets were merged and harmonized into one dataset for analysis. To facilitate comparison and analysis, all WTP and WTA values were converted to the same base year and currency unit (i.e., 2017 Euros, Dutch price level) using the OECD purchasing power parities [24] and the consumer price index from Statistics Netherlands [25]. To test whether the studies included in the IPD-MA were different from the studies included in the AD-MA, an overall WTA/WTP ratio was calculated in a similar manner to the AD-MA. To this end, study-level WTA/WTP ratios were calculated by dividing mean/median WTA at the study level by mean/median WTP at the study level. From these study-level WTA/WTP ratios, overall estimates were calculated by taking the mean, median, and weighted average from these estimates.
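The conversion step can be sketched as follows. This is only an illustration under stated assumptions: the order of operations (purchasing power parity conversion in the study year first, then inflation with the Dutch consumer price index) is assumed rather than taken from the text, and the function name, parameter names, and numbers are hypothetical.

```python
def to_2017_euros_nl(amount, ppp_local_per_usd, ppp_nl_per_usd,
                     cpi_nl_study_year, cpi_nl_2017):
    """Convert a local-currency amount to 2017 Euros at the Dutch price level.

    PPPs are expressed in national currency units per US dollar.
    """
    # Step 1 (assumed order): local currency -> Dutch euros in the study year
    euros_study_year = amount * ppp_nl_per_usd / ppp_local_per_usd
    # Step 2: study-year euros -> 2017 euros using the Dutch CPI
    return euros_study_year * cpi_nl_2017 / cpi_nl_study_year

# Hypothetical rates and index values, for illustration only
converted = to_2017_euros_nl(100.0, ppp_local_per_usd=2.0, ppp_nl_per_usd=0.8,
                             cpi_nl_study_year=80.0, cpi_nl_2017=100.0)
```

Note that the WTA/WTP ratio within a study is unaffected by this conversion, because WTA and WTP are multiplied by the same factor; the conversion matters for comparing absolute values across studies.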

2.3.2 Mixed-Model Analysis

Of the 4213 subjects included in the IPD dataset, 302 subjects (7%) had a missing value on WTP, 218 subjects (5%) had a missing value on WTA, 1107 subjects (26%) had a missing value on both WTP and WTA, and 435 subjects (10%) had a missing value on income. As a complete case analysis, i.e., exclusion of respondents with missing values, may introduce bias, multiple imputation of WTA, WTP, and income was used. The imputation model used data on age, sex, income, country of study, and converted WTA and WTP. We used a fully conditional specification with predictive mean matching to impute WTP and WTA when one was available and one was missing. The 1107 subjects with both WTP and WTA missing were excluded because they missed both parameters of interest for this study. Data were imputed ten times. All analyses were performed on each dataset separately and, subsequently, the results were pooled according to Rubin’s rule [26]. As WTP, WTA, and the WTA/WTP ratio were not normally distributed, the data were then log-transformed. As a result, respondents with WTA or WTP of zero were excluded from the analysis. As income was measured on different scales in different studies, income was dichotomized, based on median income (category) at the study level as a cut-off point. Subsequently, the log of WTA/WTP ratio was estimated with a linear mixed model. A random intercept was included to reflect any heterogeneity over studies in this outcome. The analysis was performed once without correction for covariates and once with correction for age, sex, and income. All analyses were performed with SAS 9.4 software (SAS Institute Inc., Cary, NC, USA).
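The pooling step can be sketched with the standard form of Rubin's rules: the pooled estimate is the mean of the per-imputation estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below uses hypothetical log-ratio estimates and only three imputations for brevity; it does not reproduce the SAS analysis.

```python
from math import exp
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)           # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical ln(WTA/WTP) estimates and variances from three imputed datasets
log_ratios = [0.60, 0.62, 0.64]
variances = [0.01, 0.01, 0.01]
pooled_log, total_var = pool_rubin(log_ratios, variances)

# Exponentiating returns the estimate to the ratio scale
# (before any retransformation correction)
pooled_ratio = exp(pooled_log)
```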

2.3.3 Sensitivity Analyses: Zero Willingness to Pay and/or Willingness to Accept

In the AD-MA and the descriptive analysis of IPD, data of subjects with zero WTP and/or WTA were included in the analysis. In the mixed-model analysis of IPD, subjects with zero responses were excluded from the analysis because log-transformation of zero WTP and/or WTA is not possible. The best approach to dealing with zero responses in this context depends on the reasons behind zero responses (e.g., protest responses, not understanding the task, or an actual very low/zero valuation [27,28,29]). In this meta-analysis, we were not able to determine the reason behind zero responses. Therefore, to assess the potential impact of our main approach to zero responses on the WTA/WTP ratios, we conducted two sensitivity analyses. The first sensitivity analysis is the same as the descriptive analysis of IPD described in Sect. 2.3.1, except that subjects reporting zero WTP and/or WTA were excluded. The second sensitivity analysis is the same as the mixed-model analysis described in Sect. 2.3.2, except that subjects reporting zero WTP and/or WTA were included by replacing their zero value with one-half, one-third, or one-quarter of the smallest value reported in the study concerned. This approach especially makes sense if subjects reported zero values because their WTP or WTA was too small to be picked up by the elicitation procedure used.
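The zero-replacement rule of the second sensitivity analysis can be sketched as below. Two points are assumptions on our part: "smallest value" is read as the smallest non-zero value, and the rule is applied to one measure (WTP or WTA) within one study at a time. The function name and data are hypothetical.

```python
def replace_zeros(values, fraction):
    """Replace zero responses by a fraction of the smallest non-zero value.

    values: WTP or WTA responses from a single study.
    fraction: 1/2, 1/3, or 1/4 in the sensitivity analyses.
    """
    positives = [v for v in values if v > 0]
    if not positives:
        raise ValueError("no non-zero value available to derive a replacement")
    replacement = min(positives) * fraction
    return [replacement if v == 0 else v for v in values]

# Hypothetical responses from one study
responses = [0, 10, 4, 0]
half = replace_zeros(responses, 0.5)
quarter = replace_zeros(responses, 0.25)
```

After replacement, all values are strictly positive, so the log-transformation used in the mixed-model analysis is defined for every subject.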

3 Results

3.1 Systematic Review

Databases were searched on 9 February, 2017 (MEDLINE and EMBASE) and 13 February, 2017 (Scopus, Scisearch, and Econlit). In total, 396 records were identified of which, after removal of 231 duplicates, 165 remained for title and abstract screening. Of the 31 articles that were included in full-text screening, 13 were included in the review (see Electronic supplementary material: Appendix C). Figure 1 displays the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram.

Fig. 1 PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram. AD aggregate data, IPD individual participant data, MA meta-analysis, SR systematic review, WTA willingness to accept, WTP willingness to pay

Table 2 displays the descriptive characteristics and the extracted WTA/WTP ratios of the studies included in the systematic review. The 13 included studies provided estimates for 19 different experiments or subgroups.

Table 2 Descriptive characteristics and extracted willingness-to-accept/willingness-to-pay (WTA/WTP) estimates of included studies

3.2 Aggregate Data Meta-Analysis

The WTA/WTP ratios calculated from the extracted WTA/WTP estimates are displayed in Table 3. A mean WTA/WTP ratio of 1.73 for 15 mean estimates and a mean ratio of 1.58 for nine median estimates were found. The weighted average was 1.87 for mean estimates and 1.55 for median estimates. The small differences between the crude and weighted averages indicate that the estimates provided by studies with more subjects and/or more experiments/subgroups were not very different from other studies. One study [35] did not report mean or median, but reported a regression model estimate of the WTA/WTP ratio instead. This estimate of 3.20 was relatively high compared with the mean WTA/WTP ratio for mean and median estimates.

Table 3 Willingness-to-accept/willingness-to-pay estimates obtained from aggregate data

3.3 Individual Participant Data Meta-Analysis

From the 13 studies included in the AD-MA, six datasets were obtained for inclusion in the IPD-MA (see Fig. 1). For the remaining seven studies, the data could not be included because of non-response (n = 3) or because the authors were not able to send the data (n = 4). The six datasets received covered 71.2% of the subjects who were included in the AD-MA, implying that the samples we could not include were relatively small compared with the samples we were able to obtain.

3.3.1 Descriptive Analyses

Descriptive information of the six datasets received is displayed in Table 4. Of the 4213 subjects included in the six datasets, 1107 subjects were excluded from the analysis because they had both WTP and WTA missing. Of the remaining 3106 subjects, 299 subjects (10%) reported a WTP of zero, 69 subjects (2%) a WTA of zero, and 77 subjects (2%) both a WTP and a WTA of zero. This left 2661 subjects for the mixed-model analyses.
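The subject flow reported above can be checked with a few lines of arithmetic; all numbers are taken from the text, and only the variable names are ours.

```python
total_subjects = 4213
both_missing = 1107
remaining = total_subjects - both_missing      # subjects with WTP and/or WTA available

zero_wtp, zero_wta, zero_both = 299, 69, 77
zero_any = zero_wtp + zero_wta + zero_both     # subjects with at least one zero response

mixed_model_sample = remaining - zero_any      # subjects left after excluding zeroes
zero_share = round(100 * zero_any / remaining) # percentage of remaining subjects
```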

Table 4 Descriptive information of the studies included in the individual participant data meta-analysis

Table 5 displays the WTP and WTA per study after conversion to 2017 Euros (for raw data, see Electronic supplementary material: Appendix D) and the results of the descriptive analysis. The study-level estimates of the WTA/WTP ratios were similar to the estimates found in the AD-MA. This indicates that the subsample of studies included in the IPD-MA was not that different from all studies included in the AD-MA. Electronic supplementary material: Appendix E shows the WTA/WTP ratio for different levels of age, sex, and income. As expected, the ratio was higher in people with a lower income compared with people with a higher income. Furthermore, the two intermediate age groups reported lower WTA/WTP ratios compared with the youngest and the oldest age category.

Table 5 Willingness-to-pay (WTP), willingness-to-accept (WTA) and WTA/WTP estimates based on individual participant data (converted to 2017 Euros)

3.3.2 Mixed-Model Analysis

Table 6 displays the results of the mixed-model analysis. The unadjusted WTA/WTP ratio was 1.86 (95% confidence interval 1.52–2.28). Age and income category both had a statistically significant effect on the WTA/WTP ratio found. The Table in Electronic supplementary material: Appendix F displays the ln(WTA/WTP ratio) and the WTA/WTP ratio for different groups of subjects. The figures in Electronic supplementary material: Appendix F display the trend of the WTA/WTP ratio for different types of subjects, based on the ln(WTA/WTP) slope estimates. The largest difference in the WTA/WTP ratio of 0.45 was found between high-income 30-year-old individuals and low-income 65-year-old individuals. Furthermore, the difference between the low- and high-income groups increased with increasing WTA and WTP values.

Table 6 Willingness-to-accept/willingness-to-pay (WTA/WTP) ratios obtained from the mixed-model analysis of individual participant data

3.3.3 Sensitivity Analysis: Zero Willingness-to-Pay and/or Willingness-to-Accept Values

The merged dataset contained 445 subjects (14%) with a WTA, WTP, or both WTA and WTP of zero (Table 4). Table 7 displays the results of the first sensitivity analysis. These results have been obtained in the same manner as the results in Table 5, only with exclusion of the 445 subjects reporting zero WTA and/or WTP. This analysis shows that the exclusion of zero WTA and/or WTP generally resulted in lower WTA/WTP ratios, with this effect being most pronounced for the mean and median WTA/WTP ratios obtained from average WTA and WTP at the study level compared with those obtained from median WTA and WTP at the study level. Furthermore, unsurprisingly, the impact was largest in the studies with more subjects reporting zero WTP.

Table 7 Willingness-to-pay (WTP), willingness-to-accept (WTA) and WTA/WTP estimates based on individual participant data (converted to 2017 Euros), with exclusion of subjects reporting zero WTA and/or WTP

Table 8 displays the results of the second sensitivity analysis, the mixed-model analysis with replacement of zero values with either one-half, one-third, or one-quarter of the smallest value reported in the study from which the subjects reporting zero WTA and/or WTP originated. These results have been obtained in a similar manner to the results in Table 6. The estimated WTA/WTP ratios were much larger when zeroes were replaced by a small value compared with when zeroes were excluded from the analysis. This may partly be caused by the large smearing factors in the sensitivity analyses (3.7–5.7 in the sensitivity analyses vs 1.3 in the original analysis), which result from the artificial “spike” at the lower end of the distribution created by replacing zeroes with small values. The estimated WTA/WTP ratios were larger when the replacement values were smaller.
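To illustrate why a spike of very small values inflates the smearing factor, the sketch below assumes a Duan-type smearing estimator (the mean of the exponentiated residuals on the log scale) and an intercept-only model; the text does not specify which estimator was used, so this is an assumption, and the data are hypothetical.

```python
from math import exp, log
from statistics import mean

def smearing_factor(values):
    """Duan-type smearing factor for an intercept-only model on the log scale."""
    logs = [log(v) for v in values]
    fitted = mean(logs)                       # intercept-only fit: mean of the logs
    residuals = [y - fitted for y in logs]
    return mean(exp(r) for r in residuals)    # mean of exponentiated residuals

# Hypothetical positive responses, without and with a small replacement value
without_spike = smearing_factor([1.0, 2.0, 4.0])
with_spike = smearing_factor([0.01, 1.0, 2.0, 4.0])
```

In this simplified setting the factor equals the arithmetic mean divided by the geometric mean, so adding values close to zero pulls the geometric mean down sharply and raises the factor.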

Table 8 Willingness-to-accept/willingness-to-pay (WTA/WTP) ratios obtained from the mixed-model analysis with replacement of zero values

4 Discussion

The aim of this study was to review the available evidence on the disparity between WTA and WTP for healthcare goods and services using a comprehensive tiered approach consisting of (1) a systematic review, (2) an AD-MA, and (3) an individual participant data meta-analysis. In the AD-MA, we found an average WTA/WTP ratio of 1.73 (median 1.31) for mean estimates and of 1.58 (median 1.00) for median estimates. In the IPD-MA, we found an uncorrected WTA/WTP ratio of 1.86 (95% confidence interval 1.52–2.28) and a WTA/WTP ratio adjusted for age, sex, and income of 1.70 (95% confidence interval 1.42–2.02). The approach to deal with zero WTP and/or WTA values considerably impacted the WTA/WTP ratio found.

This study found a significant effect of income category and age on the WTA/WTP ratio. No effect of sex was found. As previous meta-analyses on WTA and WTP have not tested the effect of age, sex, and income on the WTA/WTP ratio, it is not possible to compare these findings with other studies. However, these findings seem to correspond with the well-known income effect, which says that because WTP is constrained by income while WTA is not, there may be a substantial disparity between WTA and WTP when (1) the change concerned is large, (2) the value of the good concerned is high, or (3) the income elasticity for the good concerned is high and increasing with income [5, 15]. The reason for this is that when the value of the good increases, the WTP will increase until the income constraint is reached, while WTA would become infinite. As people with lower incomes have a lower income constraint than people with higher incomes, the WTA-WTP disparity should be larger for people with lower incomes than for people with higher incomes, as was indeed found in this study.

To obtain an impression of the impact of our approach to zero WTP and/or WTA responses in our main analyses, we have conducted two sensitivity analyses. The results of these sensitivity analyses indicate that the approach to dealing with subjects reporting zero WTP and/or WTA may considerably affect the WTA/WTP ratio. To our knowledge, the issue on how to deal with zero WTP and WTA has not received much attention in the scientific literature so far. To determine the best approach to dealing with zero responses, it is important to know the rationale behind reporting zeroes in stated preference studies. Qualitative inquiry during or directly after the administration of the WTP and WTA task may provide more insight into the reasons behind zero responses and subsequently provide guidance on the most valid approach of dealing with zero responses (which may be another approach than was used in this meta-analysis). Some studies already included follow-up questions when eliciting WTP and found that zero responses may be protest responses as well as real zeroes [27,28,29]. However, more research on the rationale behind zero responses and the best approach to deal with these zero responses in the analysis is warranted. Furthermore, to prevent analysis and interpretation problems with regard to zero WTP and/or WTA such as encountered in our review, we recommend future research to decrease the number of zero responses by using other contingent valuation methods than open-ended questions, as previous reviews have shown that open-ended question formats are more prone to zero responses than other contingent valuation methods [2, 43]. Moreover, when using a closed-ended question format, researchers are recommended not to include the value zero in the option list, but, instead, to only provide the option ‘the good is not worth anything to me’. This will force subjects to think twice before reporting a zero, which will decrease the number of non-true zero responses. 
For the remaining zero responses, to determine how best to handle these individual zero responses in the analysis (e.g., exclusion or imputation), researchers are recommended to include a probing question that pops up if respondents report zero WTP and/or WTA. Answer options should at least cover the following possible reasons underlying zero responses: not understanding the question, protest response, value of the good is smaller than the answer option provided, and true zero (‘the good is not worth anything to me’). Including such a probing question will open the ‘black box’ of zero responses, facilitating the decision on how to deal with individual zero responses in the analysis, and will force subjects to think about their zero response, which may, in some cases, result in subjects changing their zero into their true value.

4.1 Comparison with Previous Studies

The WTA/WTP ratios found in our meta-analysis are considerably lower than those found in previous meta-analyses/reviews. A possible explanation of this may be that one of the studies included in the review by O’Brien et al. was not included in our meta-analysis as it was not identified in our search because the title and abstract did not contain WTA or variations thereof. This study reported a very high WTA/WTP ratio of 6.4 for a non-fatal injury, which may be explained by the fact that the change valued in the WTA scenario (i.e., no injury vs full injury) was larger than the change valued in the WTP scenario (i.e., small injury vs full injury) [14]. Hence, it may not be surprising that this ratio is much larger than the ratios found in our meta-analysis. The estimate for health and safety goods in the meta-analysis by Tunçel and Hammitt was obtained from 11 studies of which seven were not included in our meta-analysis. These seven studies reported generally larger WTA/WTP ratios than the studies included in our meta-analysis and predominantly valued traffic safety, job safety, and product safety, i.e., safety goods [10]. This indicates that the WTA/WTP disparity may be larger in safety studies than in health studies, which may explain why our meta-analysis found a smaller disparity and stresses the need for a separate WTA/WTP estimate for healthcare goods and services, as has been obtained in our meta-analysis.

Another possible explanation for the relatively small WTA-WTP disparity found in our review may be that the studies included in our review valued relatively small changes in healthcare goods and services, such as 1 hour of informal care or one general practitioner consultation. According to standard economic theory, owing to declining marginal utility, the WTA/WTP ratio is an increasing function of the size of the change valued [44]. As a consequence, the WTA/WTP ratio is anticipated to be larger when the changes in healthcare goods and services to be valued are truly substantive, such as a year of informal care or an orphan drug. To assess the degree to which the WTA/WTP ratio for healthcare goods and services is an increasing function of the size of the change valued, we recommend future research to estimate the WTA/WTP ratio for differently sized changes in the healthcare good or service concerned.

Furthermore, another possible explanation for the relatively low WTA/WTP ratio found in our review may be that subjects were quite familiar with the goods being valued. Three studies asked informal caregivers and/or informal care recipients to value informal care. Furthermore, two studies valued primary care (general practitioner or nurse), which is a type of care many people are familiar with. If people are more familiar with the goods they value, they are more certain about their preferences and therefore report WTA and WTP values that are closer together [45]. Furthermore, many studies in this meta-analysis elicited WTA and WTP in the same questionnaire. Therefore, subjects could have used one of the measures as a reference for the other.

4.2 Implications of our Findings

The results of this study imply that losses in healthcare goods or services are valued somewhat differently from similarly sized gains in healthcare goods and services. This may have implications for cost-benefit analyses of healthcare interventions. In cost-benefit analyses, the welfare effect of healthcare interventions is transformed into monetary units using the WTP for gains in healthcare and the WTA for losses in healthcare. However, as shown, losses in healthcare have a different weight than gains in healthcare. There has been considerable debate across different economic sectors on whether WTA or WTP should be used in the context of losses. Some authors, such as those from the National Oceanic and Atmospheric Administration Panel on Contingent Valuation, argue that WTP should always be used because WTA is biased and WTP constitutes a more conservative estimate of welfare change [46]. Others argue that WTA is valid and, hence, that the most accurate measure of welfare change depends on the direction of the change from the reference point [47, 48]. This debate is still ongoing and our study does not provide any conclusive answers to resolve this issue.

Furthermore, our findings may have implications for reimbursement decision making based on cost-effectiveness/cost-utility analyses. Although the effects of healthcare goods and services are expressed in health units in cost-effectiveness analyses and in quality-adjusted life-years in cost-utility analyses, WTA and WTP still need to be used to make reimbursement decisions based on these analyses. In many countries, implicit or explicit thresholds for the WTP for additional health outcomes have been used in reimbursement decision making. For instance, the National Institute for Health and Care Excellence in England and Wales uses a threshold of £20,000–£30,000 per quality-adjusted life-year gained [49, 50], and the National Health Care Institute in the Netherlands uses a threshold of €20,000–€80,000 per quality-adjusted life-year gained, depending on disease severity [51]. However, a threshold for the WTA for a loss in health does not exist. Therefore, the WTP threshold has often been used for such decisions [52]. However, as our study shows, the WTA for healthcare goods and services is somewhat higher than WTP. Therefore, to align policy with societal preferences, one might argue for using a somewhat higher threshold in the domain of losses than in the domain of gains.

To this end, Severens et al. suggested using a modified cost-effectiveness acceptability curve approach to provide insight into the impact of the WTA-WTP disparity on the probability of an intervention being cost effective. This information could then be incorporated in reimbursement decision making, facilitating a societal debate on this issue [53]. However, others have suggested that the same threshold should be used for decisions in the context of gains and losses, as using different thresholds may introduce substantial inefficiencies in the allocation of the healthcare budget [54,55,56]. Hence, whether the WTA-WTP disparity should be incorporated in healthcare policy making is a political trade-off between aligning policy with societal preferences on the one hand, and stimulating efficiency in the allocation of healthcare budgets on the other hand.

Furthermore, the results of this study can also be used to better understand problems with disinvestment, which is the full/partial withdrawal of the reimbursement of healthcare interventions [57]. Decisions on disinvestment have often been perceived to be much more difficult than decisions on (not) starting reimbursement of healthcare [58, 59], a phenomenon that has also been observed in the context of conditional reimbursement [21]. In this study, we found a small disparity between WTA and WTP, implying that, in the healthcare context, people attach more value to losses than to gains. This may also partly explain the perceived difficulty of disinvestment compared to investment as the former is in the domain of losses and the latter is in the domain of gains.

4.3 Strengths and Limitations

In this study, we used a systematic approach to estimate the WTA/WTP ratio for healthcare goods and services. The eligibility criteria were strictly applied to derive WTA and WTP estimates that were based on a similar change and elicited in the same manner. In this way, we ensured that the WTA/WTP ratios derived were not biased by incomparable WTA and WTP scenarios. Furthermore, by combining data from different studies in our meta-analysis, we were able to obtain a higher level of evidence and more insight into the uncertainty surrounding the disparity between WTA and WTP than previous studies did.

Our study, however, also has some limitations. First, the studies included in our meta-analysis were quite heterogeneous as different (changes in) healthcare goods and services were valued by different subject groups using different elicitation and administration methods. Furthermore, studies were conducted in different settings. Because of the small number of studies available, we were not able to test the effect of these different settings and methods on the WTA/WTP ratio for healthcare goods and services. Therefore, more studies on the WTA/WTP ratio for healthcare goods and services are needed to obtain more insight into this issue.

Second, as we did not assess the quality of the included studies, we were not able to weight the study estimates based on their quality. However, we are not aware of any quality assessment instrument applicable to WTA/WTP studies, which hampers the incorporation of study quality in the analyses.

Third, although we were able to include the largest studies from our review in the IPD-MA, the number of studies included in this meta-analysis is still quite small. Furthermore, most studies included in the IPD-MA valued informal care or primary care services. Therefore, our results cannot be generalized to all healthcare goods and services. More research is needed to obtain insight into the WTA/WTP ratio for a broader range of healthcare goods and services.

Fourth, in the mixed model, we calculated the WTA/WTP ratio using the mean of ratios approach. We are aware that using the ratio of means approach instead could have resulted in a different estimate of the WTA/WTP ratio [60, 61]. However, because of differences in the goods and services valued in the included studies, we were not able to use the ratio of means approach or to determine the effect of using one approach over the other on the WTA/WTP ratio.
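The difference between the two approaches can be shown with a small numerical sketch in Python. The responses below are hypothetical and the function names are ours; the sketch only shows that the two summaries can diverge for the same data, and it does not reproduce the mixed model used in this study.

```python
from statistics import mean


def mean_of_ratios(wta, wtp):
    """Average of each subject's own WTA/WTP ratio."""
    return mean(a / p for a, p in zip(wta, wtp))


def ratio_of_means(wta, wtp):
    """Mean WTA across subjects divided by mean WTP across subjects."""
    return mean(wta) / mean(wtp)


# Hypothetical paired responses from two subjects (same currency unit).
wta = [10.0, 100.0]
wtp = [5.0, 100.0]

print(mean_of_ratios(wta, wtp))  # individual ratios are 2.0 and 1.0
print(ratio_of_means(wta, wtp))  # 55.0 / 52.5
```

Here the mean of ratios is 1.5, whereas the ratio of means is approximately 1.05, because the ratio of means gives more weight to subjects reporting large amounts. Note that an individual ratio is undefined for a subject reporting zero WTP, which is one reason why the handling of zero responses matters.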

Fifth, in our analysis, we assumed the association between age and ln(WTA/WTP) to be linear. However, some studies showed small deviations from this assumption. Nonetheless, as correcting for non-linearity would not result in significantly improved model fits, we decided not to correct for this, applying the credo: “as simple as possible, as complex as necessary”.

Finally, we used the median as a cut-off point to transform the income data into two categories. Although there was no better option to combine the income data, this approach may have hampered the interpretation of the effect of income. The reason for this is that the study population may not reflect the general population in terms of income. For instance, in the study on the valuation of methadone maintenance, it is conceivable that the respondents had a relatively low income. The implication of this would be that our income categories based on the median income actually represent a very low vs a quite low income.
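The following Python sketch illustrates this concern with hypothetical monthly incomes. The function name, the income values, and the choice to assign incomes equal to the median to the lower category are assumptions made for illustration only.

```python
from statistics import median


def median_split(incomes):
    """Code each income as 1 (above the sample median) or 0 (otherwise).

    Assumption: incomes equal to the median fall in the lower category.
    """
    cutoff = median(incomes)
    return [1 if income > cutoff else 0 for income in incomes]


# Hypothetical samples: one with low incomes, one closer to a general population.
low_income_sample = [400, 600, 800, 1000, 1200]
general_sample = [1500, 2500, 3500, 4500, 6000]

print(median_split(low_income_sample))
print(median_split(general_sample))
```

In the low-income sample, a respondent earning 1000 is coded as having a higher income, although this amount lies below every income in the second sample. The two categories therefore do not carry the same meaning across studies.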

5 Conclusions and Recommendations

This study found aggregated WTA/WTP ratios between 1.58 and 1.86 for healthcare goods and services, indicating that losses are weighted somewhat more heavily than equally sized gains. The ratio found depends on the method used to calculate the WTA/WTP ratio and on the approach used to deal with subjects reporting zero WTP and/or WTA. Irrespective of the method used, the ratios found in our meta-analysis were smaller than the ratios found in previous meta-analyses. For this reason, the WTA-WTP disparity in the healthcare sector may be less of a problem than previously thought. However, we cannot exclude the possibility that the relatively small disparity found is related to the fact that the studies in our review valued relatively small gains and losses in healthcare goods and services, with which subjects were quite familiar. Future empirical work may explicitly test the effect of the size of the change valued on the WTA/WTP ratio through a within-person assessment of differently sized changes in healthcare goods and services. Furthermore, we recommend that future research pay attention to the reasons behind zero WTA and WTP responses and the best methodological means of dealing with these responses in the analysis.