Key Points for Decision Makers

The estimated willingness to pay for an additional quality-adjusted life-year (WTP/Q) lies in the range of the threshold proposed by the WHO for developing countries (1.75 times GDP per capita versus 1–3 times GDP per capita).

The WTP/Q was found to be determined by age, income, pain/discomfort, and anxiety/depression dimensions of EQ-5D-5L, QALY size and certainty of health outcomes.

Different thresholds for quality-of-life improvement and end-of-life therapies should be considered.

1 Introduction

In China, a series of health policy reforms in recent years have promoted economic evaluation. Since 2019, evidence of economic evaluation has been required by the National Health Insurance Administration for price negotiations. Guidelines for conducting economic evaluations in China were first published in 2011 and are updated as methods evolve [1]. The general decision rule in economic evaluation is that a treatment is considered cost effective if the incremental cost-effectiveness ratio (ICER) from a cost-effectiveness analysis (CEA) falls below a certain cost-effectiveness threshold. Currently, the threshold in China is 1–3 times the gross domestic product (GDP) per capita per disability-adjusted life year (DALY) averted [1], which was once recommended by the World Health Organization (WHO) but no longer applies [2]. A related question is how much health departments in China should pay to improve health, which can be measured in quality-adjusted life-years (QALYs) [3].

The methods for determining this threshold are controversial. In general, two ways of estimating the threshold have been proposed, namely the demand-side perspective and the supply-side perspective [4]. The supply-side threshold reflects the health opportunity cost, meaning that whether an intervention is worth its cost depends on whether the health outcomes it produces exceed the health outcomes that would have been achieved if a different intervention had been funded [5]. By contrast, demand-side strategies are consistent with the approach taken in other public sectors in the context of conventional welfare economics, where the monetary value of an additional QALY is measured as the willingness to pay (WTP) per QALY (WTP/Q) of members of society. It has been argued that WTP/Q can help in resource allocation within the health sector as well as between sectors [6].

There is an ongoing debate about appropriate ways of estimating WTP/Q. A major objection to these methods is that a linear translation of QALYs into WTP is theoretically unachievable [4]. Two conditions must be met: scope sensitivity and independence of severity. The former refers to the sensitivity of the WTP to the scope of the good; the latter concerns the extent to which the severity of the initial health condition affects the valuation [7]. A whole series of empirical studies have shown that WTP/Q varies according to severity and scope [8,9,10,11]. For example, Sund and Svensson pointed out that the standard assumption that the more QALYs the better can only partially apply to the largest improvements [11]. It was also found that larger QALY changes lead to lower WTP/Q estimates. Recently, Kouakou and Poder [12] and Ye et al. [13] published similar reviews with meta-regression analysis on this topic, which suggest that age, income and mode of administration (online or face-to-face interview) have significant effects on WTP/Q. All these published results raise the question of how to determine a WTP/Q that can represent the monetary value of health from the population's point of view. We believe that a variety of scenarios and a representative sample are essential, although probably not sufficient, for determining WTP/Q.

Some empirical research on cost-effectiveness thresholds has been conducted in China. Ochalek and colleagues [14] calculated the cost per DALY averted as 27,923–52,247 (2017 Renminbi [RMB]) (central estimate RMB 37,446) from the supply-side perspective. Nevertheless, the less common measure of health-related outcome, DALY, was used in this study. Therefore, this threshold could only be used to a limited extent for economic evaluations. Using the published literature, Cai et al. [15] first estimated the value of statistical life and then calculated the cost-effectiveness threshold to be 1.5 times the GDP per capita. Furthermore, Zhao et al. [16] demonstrated the feasibility of quantifying the monetary value per QALY with a WTP approach. However, due to the small sample size, they did not succeed in determining a reliable threshold.

The aim of this study was to investigate the monetary value for an additional QALY by eliciting the WTP of citizens for a given health scenario in China. First, we conducted a population‑based survey with quota sampling and snowball sampling. In addition, two-part regression models were used to test validity and how different factors affect WTP/Q.

2 Materials and Methods

In general, there are two approaches to determining the WTP/Q, namely the direct and the indirect approach. In the direct approach, a sample of the population is surveyed to indicate their WTP preferences for health gain. A set of hypothetical scenarios is used to assign a monetary value to the improvement in quality of life or life extension. The indirect approach is also known as the Value of Statistical Life (VSL) method, where WTP/Q is derived by estimating the WTP for mortality risk reduction.

Since the indirect method tends to yield a very high WTP/Q because the income constraint is not taken into account, we focus on the direct approach in this study. The design of this study is based on previous publications [8,9,10,11, 17], including questionnaires for different types of QALY (quality-of-life improvement and life extension), different QALY sizes and different certainty (50%, 75% and 100%) of health outcome.

2.1 Study Design and Sample

The monetary value of a QALY was estimated using a contingent valuation survey conducted from 1 October 2020 to 25 January 2021, by telephone (via the mobile app WeChat) and face to face (Footnote 1). As reported in previous research, the response rate of probability sampling was low [18]. Therefore, quota sampling and snowball sampling with quotas based on gender, age, income, place of residence, and education were used in the final survey. The quotas were calculated according to the China Statistical Yearbook 2020 [19]. We first recruited respondents (Footnote 2) who matched our quotas. Then we asked each respondent to recommend two people for this survey, and those who matched our quotas were further interviewed.

The interviewers in this survey were undergraduate or master’s students. All interviewers were required to attend two training sessions and pass the final test. Respondents were personally recruited by team interviewers. We then interviewed those who met the quotas. All subjects were asked for their full consent to participate in the study and no financial incentives were offered.

2.2 Questionnaire Design

The structure of the questionnaire can be found in Fig. 1. Each questionnaire contains five sections: the introduction, the EQ-5D-5L health status measurement, the hypothetical scenarios, the demographic section, and the ending. An example of the questionnaire translated from Chinese can be found in Online Resource 1 in the electronic supplementary material (see ESM_1). First, the introduction contained a brief explanation of this survey. Then, the EQ-5D-5L (Footnote 3) was used to measure the respondents’ health status. The third part consisted of two hypothetical health conditions, for each of which participants were asked to state the maximum amount they would be willing to pay for treatment. The demographic section included five demographic questions on age, gender, education, place of residence, and annual family income per person.

Fig. 1 Questionnaire structure

An overview of all 18 hypothetical scenarios can be found in Table 1 (Footnote 4). In total, there are 9 versions of the questionnaire, each containing 2 scenarios. This survey combined two ways of demonstrating health improvement, namely life extension (terminal illness) and quality-of-life improvement (mild, moderate and severe disease states).

Table 1 Description of different types of scenarios

For the mild, moderate and severe WTP questions, a hypothetical scenario with an EQ-5D-5L description (the health conditions listed in Table 1) was explained to participants. Without any treatment, they would live XX months in the described health state. After XX months, they would recover completely/with 75% probability/with 50% probability. The terminal illness scenario reflects the assumption that participants have a terminal illness and a life expectancy of 3 months (EQ-5D-5L description: 44332). A newly developed treatment could have a 100%/75%/50% chance of extending life expectancy in this severe health state by XX months. For each hypothetical health state, WTP was measured by respondents’ willingness to purchase the treatment.

The concept of probability was explained directly, followed by a simple analogy. For example: “...you have a 75% chance of getting perfectly well immediately if you receive the new treatment, which means that in a similar situation, out of four people taking the medicine, three will get perfectly well immediately...”

The WTP payment was described as the amount of out-of-pocket costs for the purchase of an adopted therapy. Participants were asked if they would pay for the therapy. Those who said “no” were then asked to give their reasons. If the answer was “yes”, the participant was asked to indicate the maximum amount he or she was willing to pay out of pocket using a payment card. In this study, respondents were offered a payment card with total amounts ordered from low to high. The payment card had the following categories: RMB 3200 (5% of Chinese GDP per capita, USD 478), RMB 6450 (10% of Chinese GDP per capita, USD 963), RMB 12,900 (20% of Chinese GDP per capita, USD 1925), RMB 25,800 (40% of Chinese GDP per capita, USD 3851), RMB 51,600 (80% of Chinese GDP per capita, USD 7701), RMB 77,400 (120% of Chinese GDP per capita, USD 11,552), RMB 103,200 (160% of Chinese GDP per capita, USD 15,403), RMB 206,400 (320% of Chinese GDP per capita, USD 30,806) and RMB 309,600 (480% of Chinese GDP per capita, USD 46,209). Those who could not state their WTP using the payment card were asked an open-ended question. To increase the chance of getting reliable estimates, a follow-up question was asked to clarify whether they were really sure about their WTP, with the options: very sure; sure; neutral; not sure; not sure at all. We also specified the following conditions to each respondent to clarify the assumed situation: therapy is not covered by statutory health insurance; loss of income due to the disease does not have to be taken into account.

2.3 Data Analysis

2.3.1 Zero Willingness to Pay (WTP), Protest Values and Irrational Values

A zero value means that the respondent is not willing to pay anything to purchase a hypothetical intervention. If the respondent refuses to state his or her maximum WTP, the result of the WTP survey (often the zero value) is called a protest value. Irrational values are those where respondents were willing to pay a higher amount for smaller health outcomes than for larger health outcomes. Since protest values and irrational values cannot reflect the true WTP of respondents, they are not included in the main results but in the sensitivity analysis. Following the literature [20], a zero value was treated as a protest zero in this study in any of three cases: first, when respondents refused to state any reason for zero WTP; second, when the respondents refused to pay anything for both WTP questions; third, when the respondents clearly indicated that the intervention should be reimbursed by health insurance.
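The screening rules above can be sketched in code. The snippet below is a minimal illustration and not the authors' procedure: the `Answer` record, the reason code `"insurance_should_pay"`, and the rule that compares a respondent's two answers by QALY gain are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Answer:
    wtp: float                         # stated maximum WTP in RMB (0 if unwilling to pay)
    qaly_gain: float                   # QALY gain offered by the scenario
    zero_reason: Optional[str] = None  # hypothetical reason code given for a zero


def is_protest(answer: Answer, other: Answer) -> bool:
    """Apply the three protest-zero criteria to one of a respondent's two answers."""
    if answer.wtp > 0:
        return False
    no_reason_given = answer.zero_reason is None
    zero_for_both = other.wtp == 0
    insurance_should_pay = answer.zero_reason == "insurance_should_pay"
    return no_reason_given or zero_for_both or insurance_should_pay


def is_irrational(answer: Answer, other: Answer) -> bool:
    """True when the respondent pays strictly more for the smaller QALY gain."""
    small, large = sorted((answer, other), key=lambda a: a.qaly_gain)
    return small.qaly_gain < large.qaly_gain and small.wtp > large.wtp


# Hypothetical respondent: a zero with an affordability reason, and one positive bid.
first = Answer(wtp=0, qaly_gain=0.1, zero_reason="cannot_afford")
second = Answer(wtp=3200, qaly_gain=0.2)
print(is_protest(first, second), is_irrational(first, second))
```

A zero that comes with a non-protest reason and a positive second answer stays in the main analysis as a true zero, which is why zeros and protest zeros are counted separately.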

2.3.2 Calculation of WTP per Quality-Adjusted Life-Year (QALY)

In this study, three different probabilities (100%/75%/50%) for the health outcome were considered. According to prospect theory, people tend to overweight small probabilities and underweight large ones. If this weighting of probabilities is not corrected, it can lead to incorrect estimates of WTP/Q [10]. Therefore, it is necessary to adjust the probability. In this study, Kahneman and Tversky’s [21] probability weighting function was used to adjust the probability when the probability of the health outcome occurring is < 100%. The weighting function is as follows, where γ = 0.674 [22]:

$$\omega(p) = \frac{p^{\gamma}}{\left[ p^{\gamma} + (1 - p)^{\gamma} \right]^{1/\gamma}}.$$
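As a quick numerical illustration, the weighting function can be evaluated for the three probabilities used in the scenarios. This is a minimal sketch: the function name `weight` is hypothetical, and how the weighted probability then enters the QALY calculation is not spelled out here, so the snippet stops at the weights themselves.

```python
GAMMA = 0.674  # parameter value reported in the text


def weight(p: float, gamma: float = GAMMA) -> float:
    """Probability weighting: w(p) = p^g / [p^g + (1 - p)^g]^(1/g)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    numerator = p ** gamma
    denominator = (numerator + (1.0 - p) ** gamma) ** (1.0 / gamma)
    return numerator / denominator


for p in (0.50, 0.75, 1.00):
    print(f"p = {p:.2f} -> w(p) = {weight(p):.3f}")
```

With γ = 0.674 both 50% and 75% are weighted below their stated values, while certainty is left unchanged at 1.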

The disaggregated method was used in this study because it takes into account the heterogeneity of preferences and the marginal rate of substitution of individual WTP/Q between health and money [23, 24]. This means that the individual WTP/Q was calculated first and the mean value of WTP/Q was estimated afterwards. The descriptive statistics (mean, SD, median, interquartile range, minimum, maximum) of the WTP/Q values were calculated for the whole sample and subgroups. We also conducted a sensitivity analysis as follows: first, we included all protest values and irrational values; second, to reduce the effect on means of extreme values and zero values, we excluded all 0 values and the top 1% of maximum WTP/Q values [25]; third, we removed the WTP/Q responses if the answer to the follow-up certainty question was “not sure” or “not sure at all.”
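The difference between the disaggregated approach (mean of individual ratios) and the alternative ratio-of-means approach can be seen in a small sketch. The four responses below are invented for illustration and are not survey data; the function names are likewise hypothetical.

```python
from statistics import mean

# Hypothetical responses: (WTP in RMB, QALY gain). Illustrative values only.
responses = [(3200, 0.05), (12900, 0.10), (25800, 0.40), (0, 0.20)]


def disaggregated(rs):
    """Mean of individual ratios: average of WTP_i / QALY_i over respondents."""
    return mean(wtp / qaly for wtp, qaly in rs)


def aggregated(rs):
    """Ratio of means: mean WTP divided by mean QALY gain."""
    return mean(wtp for wtp, _ in rs) / mean(qaly for _, qaly in rs)


print(f"disaggregated: RMB {disaggregated(responses):,.0f} per QALY")
print(f"aggregated:    RMB {aggregated(responses):,.0f} per QALY")
```

The two figures differ because the disaggregated mean gives each respondent's own ratio equal weight, so high ratios on small QALY gains pull it upward.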

2.3.3 Determinants of WTP per QALY Gain

In this study, an attempt was made to build a multiple determinant model of thresholds, which included three parts: socio-demographic characteristics, health status of respondents, and aspects related to hypothetical scenarios, as shown in Fig. 2. Considering the zero WTP values, a two-part regression model was used. In the first part, a probit model was used to estimate the determinants of the probability of a yes response to the WTP questions; in the second part, a generalized linear model (GLM) with gamma distribution and log link function was adopted to estimate the individual’s WTP/Q (Footnote 5). Because each respondent contributed two responses, we allowed for clustering at the individual level. We first performed univariate regression for all variables in Online Resource 2 (see ESM_2). Statistically significant variables in univariate regressions and factors mentioned in previous research were included in the multivariate regression [12, 13]. We also conducted subgroup regression as well as sensitivity analyses for the whole sample and the improving quality of life and extending life length subgroups.
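For orientation, a two-part model with a probit first part and a log-link second part is conventionally combined as sketched below. The notation is introduced here for illustration and is not taken from the study: $x_i$ is the covariate vector of response $i$, $\alpha$ and $\beta$ are the coefficient vectors of the two parts, and $\Phi$ is the standard normal cumulative distribution function.

$$E\left[ \mathrm{WTP/Q}_i \mid x_i \right] = \Pr\left( \mathrm{WTP}_i > 0 \mid x_i \right) \times E\left[ \mathrm{WTP/Q}_i \mid \mathrm{WTP}_i > 0,\, x_i \right] = \Phi\left( x_i^{\prime} \alpha \right) \exp\left( x_i^{\prime} \beta \right).$$

Under this form, a covariate can affect the predicted WTP/Q through the probability of paying anything, through the amount paid among those who pay, or through both, which is why the two sets of coefficients are reported separately.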

Fig. 2 Determinants of threshold

3 Results

3.1 Demographic Characteristics of Respondents

Table 2 shows the demographic characteristics of the respondents. A total of 2008 individuals were included in this survey, of which 87.5% were interviewed by telephone, and 12.5% were interviewed face to face. In this survey, 52% of the respondents are female, which is consistent with the general Chinese population from the China Statistical Yearbook 2020 [19]. The respondents are between 18 and 65 years old. In the sample, 18- to 34-year-olds are slightly overrepresented (40.6% compared with 35.3%), and 35- to 54-year-olds are slightly underrepresented (42.4% compared with 47.05%). In terms of educational background, 445 (22%) of the respondents reported that they had attended primary school or less, which is in line with the general Chinese population. However, those with only a junior school degree are underrepresented while those with a diploma of high school or higher are overrepresented. Sixty-five percent of respondents live in urban areas, which is similar to the general population (60%). Middle-income groups are slightly overrepresented, whereas annual income groups of less than RMB 12,000 (USD 1791) and more than RMB 60,000 (USD 8955) are underrepresented.

Table 2 Demographic characteristics of survey respondents

3.2 Descriptive Statistics of WTP and WTP per QALY (WTP/Q)

Of the 4016 WTP responses in this survey, there are 662 (16.5%) zero responses and 270 (6.7%) responses are considered protest zeros according to the aforementioned definition. In addition, 481 (12%) WTP responses are classified as irrational responses. A total of 3265 WTP responses were used for further analysis, excluding protest responses and irrational responses. Specifically, 2135 (65%) WTP responses are related to the quality-of-life improvement scenarios (mild 495 [15%], moderate 737 [23%] and severe 903 [28%] disease states). The rest of the responses are regarding life extension scenarios (1130, 35%).

Figure 3 shows the distribution of payment card selections. As can be seen from Fig. 3, more respondents chose the lower values while fewer chose higher WTP values. Considering the relatively even distribution of WTP responses to RMB 3200 (USD 478), RMB 6450 (USD 963), and RMB 12,900 (USD 1925), it is reasonable to assume that starting-point bias and range bias had little impact on the WTP survey.

Fig. 3 Distribution of payment card selections

Table 5 presents the descriptive statistics of WTP/Q for the whole sample and subgroups. The average WTP/Q for the whole sample is RMB 113,120 (USD 16,884) (range from RMB 108,802 to RMB 129,788 in the sensitivity analysis), which equals 1.75 times GDP per capita, while the median is RMB 36,236 (USD 5408), which equals 0.56 times GDP per capita. For quality-of-life improvement scenarios, the mean WTP/Q is RMB 78,907 (USD 11,777, 1.22 times GDP per capita). However, the mean WTP/Q of life extension scenarios is significantly higher than that of quality-of-life improvement scenarios, at RMB 177,761 (USD 26,531, 2.76 times GDP per capita).
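The GDP multiples quoted for the mean values can be reproduced with a short calculation. This is a sketch under one assumption: the GDP per capita of RMB 64,500 is not stated directly but is inferred from the payment card, where RMB 6450 is described as 10% of Chinese GDP per capita.

```python
GDP_PER_CAPITA_RMB = 64_500  # assumption: inferred from "RMB 6450 = 10% of GDP per capita"


def gdp_multiple(wtp_per_qaly_rmb: float) -> float:
    """Express a WTP/Q value as a multiple of GDP per capita, to two decimals."""
    return round(wtp_per_qaly_rmb / GDP_PER_CAPITA_RMB, 2)


mean_wtp_per_qaly = {
    "whole sample": 113_120,
    "quality-of-life improvement": 78_907,
    "life extension": 177_761,
}

for label, value in mean_wtp_per_qaly.items():
    print(f"{label}: {gdp_multiple(value):.2f} x GDP per capita")
```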

Table 3 Results of two-part regression models for the whole sample

3.3 Determinants of WTP/Q

Based on the results of the univariate regressions and the literature, the following variables were included in the final two-part regression models: gender, age, education, place of residence, income, EQ-5D-5L utility, the two dimensions of EQ-5D-5L (pain/discomfort and anxiety/depression), the size of the QALY, aspects related to hypothetical scenarios (the quality-of-life improvement or life extension scenarios) and certainty of health outcomes. However, insignificant variables were omitted from Table 3 (whole sample) and Table 4 (subgroups). The results of the sensitivity analysis can be found in Online Resource 3 (see ESM_3).

Table 4 Results of two-part regression models for subgroups

3.3.1 Two-Part Regression for the Whole Sample

Table 3 displays results from the two-part regressions for the whole sample. The first column reports the coefficients from the probit model (the first part of the model), which estimates the probability of reporting any positive WTP amount. The third column reports the coefficients from the GLM (the second part of the model), which estimates WTP/Q among those who reported a positive WTP amount. The model predicted a mean WTP/Q of RMB 113,209 (range from RMB 108,819 to RMB 128,915 in the sensitivity analysis), which was similar to the descriptive mean displayed in Table 5.

Table 5 Descriptive statistics of willingness to pay for one additional quality‑adjusted life‑year

In the first part, the results show that higher education and higher income are associated with a higher probability of positive WTP. Compared with individuals without depression, respondents with moderate depression have a lower probability of positive WTP. A larger QALY value increases the probability of a positive WTP. WTP questions related to life extension were less likely to have a positive WTP. In addition, the probability of a positive WTP was higher when the health outcome had a probability of 50% than when it had a probability of 75%.

The second part of the regression shows the factors influencing the magnitude of WTP/Q. Age was associated with a lower WTP/Q, whereas WTP/Q was higher when respondents had higher education and household income. In addition, respondents with slight or severe pain/discomfort had a lower WTP/Q than respondents who did not have pain/discomfort. QALY size was negatively related to WTP/Q. Compared with the scenarios of improving quality of life, the life extension scenarios had a much higher WTP/Q, which is consistent with the descriptive statistics. Furthermore, WTP/Q was higher when the health outcome had a 50% probability of occurring than when the health outcome was 100% certain.

3.3.2 Two-Part Regression for Subgroups

Subgroup modelling demonstrated the different determinants of WTP/Q in terms of quality-of-life improvement and life extension, which is displayed in Table 4. The mean estimates from the two-part regression models were RMB 75,116 (72,853–82,371) for quality-of-life improvement and RMB 189,374 (175,621–227,388) for life extension.

In the first part, the results show that income is associated with a higher probability of positive WTP for quality-of-life improvement, but not for life extension. In contrast to the results for the whole sample, the probability of positive WTP for life extension was lower when the health outcome was 100% certain than when it had a 50% probability. The second part of the regression showed that age was negatively associated with WTP/Q for quality-of-life improvement but not for life extension. Furthermore, certainty of health outcome is negatively related to WTP/Q only for the life extension scenarios.

4 Discussion

In this paper, we report the results of a survey to estimate WTP for a QALY gain among the Chinese population. A total of 2008 individuals were included in this survey. The mean WTP/Q for the entire sample was RMB 113,120 (USD 16,884, 1.75 times GDP per capita). For the quality-of-life improvement scenarios, the mean WTP/Q was RMB 78,907 (USD 11,777, 1.22 times GDP per capita); for the life extension scenarios, it was RMB 177,761 (USD 26,531, 2.76 times GDP per capita). Two-part regression models indicated that the determinants of WTP/Q were age, income, pain/discomfort, and anxiety/depression dimensions of EQ-5D-5L, QALY size and certainty of health outcomes.

It is crucial to ensure the validity and practical usefulness of our empirical results. Testing theoretical validity is the most popular validity test for WTP surveys because it is comparatively easy to perform. Theoretically, a positive effect of income on WTP/Q would argue for the theoretical validity of WTP research [8, 9, 13]. Moreover, it was assumed that a worse health state scenario [26,27,28] and a smaller QALY gain [4, 26, 30, 31] should lead to a higher WTP/Q. Our regressions confirmed previous results and supported the theoretical validity of our estimates: WTP/Q was higher when respondents had higher incomes, both in the whole sample and in the subgroups; the mean WTP/Q value for life extension was about 2.25 times that for quality-of-life improvement; and a negative effect of QALY gain on WTP/Q was observed. Furthermore, a comparison of our results with those of comparable studies provides further evidence of the validity of this research. Cai et al. [15] estimated the monetary value of a QALY based on the value of statistical life to be 1.5 times the GDP per capita in China, equivalent to almost RMB 100,000 (USD 14,952). Song and Lee [29] found that the average WTP/Q in South Korea was about RMB 145,000 (USD 21,642). Both estimates are broadly in line with ours.

It is important for policy makers in health economics to know the public's WTP value for a QALY. Our study provides empirical evidence on WTP/Q that questions the appropriateness of the WHO’s recommendation. According to the systematic review published by Butt and colleagues [30], the most commonly used threshold was three times GDP per capita. As we have shown, estimates of WTP/Q can vary considerably depending on the specific subgroups and hypothetical scenarios. The large discrepancy between the quality-of-life improvement and life extension scenarios leads to the question of whether we should adopt a higher threshold for end-of-life treatment. For example, the National Institute for Health and Care Excellence (NICE) standard threshold is £20,000 to £30,000, but for end-of-life interventions the threshold is £50,000 [31]. Whether and how this heterogeneity should be reflected in health care decision making in low- and middle-income countries is a question beyond the scope of this paper, but clearly a subject for future research.

Respondents who refused to pay were assigned a WTP of zero. Zero responses are quite common in WTP surveys. For instance, in the European Value of a QALY (EuroVaQ) study [25], about 30% of the total responses were zero WTP. Similarly, in the study by Shafie and colleagues [26], almost half of the responses were zero. In this research, 16.5% of responses were zero, of which about 40% (270 of 662) were defined as protest values. Twelve percent of WTP responses were classified as irrational responses because a lower WTP value was given for a higher benefit. In general, protesters and irrational respondents are not considered to have accepted or understood the scenario, and they should therefore be excluded from the main analysis. However, we included all protesters and irrational respondents in the sensitivity analysis, which showed limited impact on the final estimates.

There are a number of limitations of this study that need to be mentioned. First, the final sample was broadly representative of the 18–65 years age group, but with an overrepresentation of middle-income groups and an underrepresentation of low-education groups. Both income and education have been shown to be positively related to WTP/Q, which is consistent with previous findings. Therefore, our study may overestimate the true WTP/Q of the population. Second, a snowballing method was used to recruit the samples, which was a very effective method given the very limited funds. Nevertheless, there is a high probability that unobserved variables influence the representativeness of the sample, leading to problems with external validity. Third, the mode of this survey might influence our results. According to recent publications [12, 13], surveys conducted over the internet or telephone are more likely to have a significantly higher value for WTP/Q than face-to-face surveys. This is known as mode bias in surveys. Almost 90% of our survey was conducted over the phone; the rest were face-to-face interviews. However, it is still not clear which mode better represents the true value. Fourth, this is a cross-sectional study limited to one survey at one point in time, conducted under the influence of the coronavirus pandemic; longitudinal time-series data would be required for robust WTP/Q estimation. Finally, an important limitation is the hypothetical nature of all WTP surveys [32]. It can be difficult for individuals to imagine an assumed scenario that differs from conditions they have experienced.

5 Conclusion

Our results suggest that the mean WTP/Q for the entire sample was RMB 113,120 (USD 16,884, 1.75 times GDP per capita). For the quality-of-life improvement scenarios, the mean WTP/Q was RMB 78,907 (USD 11,777, 1.22 times GDP per capita); for the life extension scenarios, it was RMB 177,761 (USD 26,531, 2.76 times GDP per capita). WTP/Q was found to be determined by age, income, pain/discomfort, and anxiety/depression dimensions of EQ-5D-5L, QALY size and certainty of health outcomes. We recommend that future researchers seek to explore the WTP/Q of major diseases from multiple perspectives. Furthermore, different thresholds for quality-of-life improvement and end-of-life therapies should be considered.