Key Points for Decision Makers

The first series of studies estimating value sets for the five-level EuroQol-five dimensions (EQ-5D-5L) questionnaire in selected countries identified areas for improvement. A feedback module is now included after the composite time trade-off (C-TTO) tasks in a new wave of country-specific EQ-5D-5L value set studies.

This study shows that using a feedback module to allow participants to identify and exclude incorrect C-TTO responses reduces the number of inconsistent responses and improves both the quality of the data and the estimation of an EQ-5D-5L value set.

This study also reports an EQ-5D-5L value set for Hong Kong that can be used to measure the impact of interventions on health-related quality of life in economic evaluations for resource allocation in this jurisdiction.

1 Introduction

The EuroQol-five dimensions (EQ-5D) questionnaire is a generic preference-based patient-reported outcome measure (PROM) used to capture health-related quality of life [1]. It is the most widely used PROM for measuring health benefits in economic evaluations of health technologies worldwide [2]. The original EQ-5D questionnaire (also known as the EQ-5D-3L) included three levels in each dimension (no problems, some problems, and extreme problems) and defined 243 (3⁵) health states. The EQ-5D-3L has been used for almost two decades and has been shown to be a reliable and valid PROM in areas such as diabetes and cancer [3, 4]. However, evidence about its measurement properties in other conditions, such as mental health, is mixed [5, 6]. Researchers have also indicated that the three-level version may not capture small differences between health states in patients with milder conditions [7]. In response to these concerns, a five-level version of the EQ-5D was developed [7]. This version (the EQ-5D-5L) retains the original five dimensions but increases the number of levels to five (no problems, slight problems, moderate problems, severe problems, and extreme problems/unable to), defining 3125 (5⁵) health states. New valuation exercises were therefore needed to create value sets for the five-level version. Building on experience from previous EQ-5D-3L valuation exercises, the EuroQol Group developed an international valuation protocol. The objectives of the protocol were to improve the comparability of value sets between countries and to facilitate the identification of cross-cultural differences in health state preferences between jurisdictions [8]. The original version of the protocol (version 1) [8] was developed on the basis of a large multi-country pilot project assessing the performance of different elicitation methods and modes of administration for health state valuation [9]. The main novelties of this protocol compared with previous EQ-5D-3L valuation exercises were the use of the composite time trade-off (C-TTO) [10, 11] and a discrete-choice experiment (DCE) [12, 13] as elicitation techniques. Valuation exercises using version 1 identified data quality issues in the C-TTO responses [14]; to address these, version 2 of the valuation protocol incorporated a feedback module (FM) [15].

In this FM, participants were presented with the rank ordering of the health states based on their own C-TTO valuations and asked to exclude any valuation that they considered to be incorrectly placed in the ranking. The FM was included in version 2 on the hypothesis that it would reduce the number of C-TTO inconsistencies, but its inclusion was not linked to any theoretical model of economic or human behavior [15]. C-TTO exercises are complex, and the sequential nature of the task poses challenges that affect comparability [16,17,18]. The majority of participants completing TTO tasks have been shown to need help from the interviewer [19] and, as most individuals have never previously completed a C-TTO exercise, the level of understanding is expected to vary between individuals. Interviewers play an important role in explaining the task, but the cognitive ability of the participant is also expected to contribute to their understanding. A recent qualitative study reported that individuals with inadequate health literacy were more likely to provide inconsistent C-TTO valuations [20]. This finding can be linked to theoretical models that connect cognitive abilities and economic behavior [21]. A line of research in experimental economics investigates whether individual heterogeneity in decision making can be explained by the cognitive ability of the subjects [22,23,24], and a recent special issue of the Journal of Behavioral and Experimental Economics collected the latest developments in this area [21]. Research to date has found that individuals with higher cognitive abilities are more likely to take optimal strategic actions in games where behavior is not affected by social preferences (e.g., trust) [25,26,27,28]. As part of the C-TTO task, the FM is a tool that helps participants select their final responses.
Subjects with higher cognitive skills may be more likely than those with lower cognitive skills to understand the task and accept their final responses without engaging with the FM. Alternatively, subjects with higher cognitive skills may perceive the FM as a tool to refine their final C-TTO responses, using it to identify inconsistent responses and engaging with it more than individuals with lower cognitive skills.

Version 2 of the protocol was recently employed in a valuation exercise in Hong Kong. We report the results of that valuation exercise and evaluate the impact of using an FM when modelling an EQ-5D-5L value set for Hong Kong. We hypothesize that whether a participant engages with the FM depends on their cognitive ability and background characteristics.

2 Methods

Ethics approval to conduct this study was obtained from the Joint Chinese University of Hong Kong—New Territories East Cluster Clinical Research Ethics Committee (CUHK-NTEC; Approval no: CRE-2013.464).

2.1 The EQ-5D-5L Hong Kong Valuation Exercise

The EuroQol valuation protocol version 2 was followed in this valuation study and comprised five sections: (1) a general welcome to the participant; (2) an introduction to the context of the research, self-reporting of the participant's own health using the EQ-5D-5L descriptive system and visual analog scale (VAS), and background questions; (3) ten C-TTO tasks and an FM; (4) seven DCE tasks; and (5) a general thank you and goodbye. The interview protocol was implemented using the EuroQol Valuation Technology version 2 (EQ-VT 2.0) software [8].

2.1.1 Sample and Data Collection

The EuroQol valuation protocol recommended collecting 10,000 C-TTO responses and, given that each participant valued ten health states, suggested a minimum of 1000 respondents [29]. Sample size calculations were based on obtaining mean C-TTO values with a given level of precision; details of these power calculations have been reported elsewhere [8]. Data were collected between June 2014 and October 2015, and the 2016 Hong Kong census was used to evaluate the representativeness of our sample. A sample representative in terms of sex, age, and highest educational attainment was recruited from 18 geographical districts throughout Hong Kong. The survey included Cantonese-speaking Hong Kong residents aged ≥ 18 years. Face-to-face interviews were conducted in community centers in each of the geographical regions included in the study. The study was advertised in the community centers using leaflets and posters describing the research and how to participate in the survey. Participants received supermarket coupons worth HK$50 (US$6.5, £5) as an incentive. Interviews were conducted by a team of six interviewers well trained in the use of the EQ-VT software. All interviewer training materials, originally prepared in English by the EuroQol Group for previous valuation exercises, were professionally translated into Hong Kong Chinese.

2.1.2 Methods to Elicit Preferences

We used a C-TTO and a DCE to obtain preferences in this study. These techniques have already been applied in several EQ-5D-5L valuation exercises across different countries [29,30,31,32,33,34].

The C-TTO used the conventional TTO for health states better than dead and the lead-time TTO for states worse than dead [35, 36]. The TTO tasks included 86 EQ-5D-5L health states identified through simulation and divided into ten blocks with similar severity levels. Every block included one very mild state (one dimension at level 2, the remaining dimensions at level 1), the state 55555, and a set of intermediate health states. Participants were randomized to one of the ten blocks, and the order in which the states were presented was also randomized.
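The conversion from a participant's indifference point to a C-TTO value can be sketched as below. This is a minimal illustration, assuming a 10-year frame for the health state and 10 years of lead time for states worse than dead; the helper `ctto_value` and its arguments are hypothetical and are not part of EQ-VT.

```python
def ctto_value(full_health_years: float, worse_than_dead: bool) -> float:
    """Convert a C-TTO indifference point into a utility value (hypothetical helper).

    Assumes a 10-year frame for the health state and 10 years of lead time.
    Conventional TTO (state better than dead): x years in full health are as
    good as 10 years in the state, so U = x / 10, which lies in [0, 1].
    Lead-time TTO (state worse than dead): x years in full health are as good
    as 10 years in full health followed by 10 years in the state, so
    10 + 10 * U = x, i.e. U = (x - 10) / 10, which lies in [-1, 0].
    """
    if not 0 <= full_health_years <= 10:
        raise ValueError("indifference point must lie between 0 and 10 years")
    if worse_than_dead:
        return (full_health_years - 10) / 10
    return full_health_years / 10


# Hypothetical indifference points, not study data.
print(ctto_value(7.5, worse_than_dead=False))  # state valued at 0.75
print(ctto_value(0.0, worse_than_dead=True))   # lead time exhausted: -1.0
```

Because x cannot fall below 0, values below −1 cannot be elicited under this design, which is why −1 is treated as a censoring point when modelling the C-TTO data.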

The use of DCEs is a complementary approach to eliciting individual preferences [12, 13, 37]. It has been suggested that a DCE can measure health preferences using a task that is cognitively easier to understand [38]. A recent multinational study showed the feasibility of using DCEs in the context of EQ-5D-5L valuations [39]. However, the values obtained from discrete-choice (DC) models are not on the same 0 (death) to 1 (full health) scale as their TTO counterparts and need to be rescaled for any subsequent quality-adjusted life-year (QALY) calculations. In a DCE, respondents are presented with multiple (usually two) health states and asked to indicate which they prefer. Our DCE included 196 pairs of EQ-5D-5L states divided into 28 blocks with similar severity representation using an efficient Bayesian design [40], with seven DC pairs per block. Participants were randomly allocated to one of the blocks, and the order of the pairs and the position of the health states (i.e., left or right) were also randomized.

2.1.3 Quality Control

EQ-VT 2.0 included a data quality control process to monitor data collection using the EQ-VT QC tool [14]. The quality control process included different measures to monitor how C-TTO and DCE information was collected. These measures were presented in a weekly quality control report generated to identify any suspicious or questionable interviewer performance. Additional information about this quality control process is available upon request.

2.1.4 Feedback Module (FM)

Version 2 of the protocol also included an FM for C-TTO tasks. This FM presented each respondent with a ranking of the ten health states ordered by their own C-TTO responses and gave them the opportunity to indicate and exclude any state that, in their opinion, did not have the correct ranking position (Fig. 1). Participants could not re-value the selected states. We argue here that the use of an FM as part of the C-TTO task can be supported by theoretical models that link cognitive abilities and economic behavior [21]. In particular, we hypothesize that the engagement of the subject with the FM depends on their cognitive skills and background characteristics. Testing this hypothesis required an explicit measure of cognitive skills and a number of participant characteristics. The protocol used in EQ-5D-5L valuation exercises did not include an explicit measure of cognitive skill but did include the following background characteristics: age, sex, whether the participant had experience with serious illness, educational level, marital status, and whether the participant had been diagnosed with a chronic condition. Educational level was therefore used in this study as a proxy for cognitive skills, as research has shown that cognitive skill is a good predictor of academic achievement [41, 42]. In other words, we assumed that participants with higher educational attainment had greater cognitive skills.

Fig. 1

Screenshot of feedback module of composite time trade-off (C-TTO) responses (reproduced with permission from EuroQol Group)

We first explored whether the background characteristics of participants who selected and excluded states during the FM differed from those of participants who did not. We then counted the number of inconsistent health states before and after using the tool and the number of states excluded during the FM, which indicated whether the FM helped reduce inconsistent C-TTO responses. We defined logical dominance between two health states as follows: state A dominates state B when A is better than B on at least one dimension and no worse than B on any other dimension. Elicited values were considered logically inconsistent if a state was assigned a higher value than a state that dominates it.
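The dominance rule, the FM ranking, and the before/after comparison of inconsistencies can be sketched as follows. This is an illustrative Python sketch, not the EQ-VT implementation; the helper names and example responses are hypothetical, and we assume (following the definition above) that the inconsistent state is the dominated one that received the higher value.

```python
from itertools import permutations


def dominates(a: str, b: str) -> bool:
    """True if EQ-5D-5L state a (e.g. "21111") logically dominates state b.

    Lower levels are better: a dominates b when it is no worse on every
    dimension and strictly better on at least one (i.e. the states differ).
    """
    levels_a = [int(c) for c in a]
    levels_b = [int(c) for c in b]
    return a != b and all(x <= y for x, y in zip(levels_a, levels_b))


def inconsistent_states(values: dict) -> set:
    """States valued strictly higher than some state that dominates them."""
    return {
        worse
        for better, worse in permutations(values, 2)
        if dominates(better, worse) and values[worse] > values[better]
    }


def feedback_ranking(values: dict) -> list:
    """States ordered from highest to lowest C-TTO value, as shown in the FM."""
    return sorted(values, key=values.get, reverse=True)


# Hypothetical C-TTO values from one participant (not study data).
example = {"11112": 0.90, "21111": 0.95, "32211": 0.50, "33321": 0.60, "55555": -0.50}
print(feedback_ranking(example))
print(inconsistent_states(example))  # 33321 is valued above 32211, which dominates it

# Suppose the participant excludes 33321 in the FM.
kept = {s: v for s, v in example.items() if s != "33321"}
print(inconsistent_states(kept))     # no inconsistency remains
```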

The impact of the FM on interview duration (in minutes) for the different elements of the C-TTO tasks was also assessed; we report timings for the whole study sample and separately for participants who did and did not exclude states during the FM.

2.2 Statistical Analysis

Background characteristics for the study samples were described using means and standard deviations for continuous variables and frequencies and percentages for categorical covariates. Differences in the distribution of socioeconomic characteristics between participants who did and did not exclude states during the FM were evaluated using Pearson's chi-squared (χ²) test at the 5% significance level. The distribution of observed C-TTO utility values was explored across respondents overall and by severity index, defined as the sum of the levels in a particular state (e.g., state 22222 has a severity index of 10).
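Both descriptive quantities can be computed directly. The sketch below, with hypothetical function names and invented counts, computes the severity index and the Pearson χ² statistic from an observed contingency table. The p value would then come from the χ² distribution with (r − 1)(c − 1) degrees of freedom (for example via `scipy.stats.chi2.sf`), which is left out here to keep the example dependency-free.

```python
def severity_index(state: str) -> int:
    """Sum of the five EQ-5D-5L levels, e.g. "22222" -> 10, "55555" -> 25."""
    return sum(int(level) for level in state)


def pearson_chi2(table: list) -> float:
    """Pearson chi-squared statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Invented counts: rows = excluded / did not exclude states in the FM,
# columns = two education groups.
counts = [[10, 20], [30, 40]]
print(severity_index("22222"))
print(pearson_chi2(counts))
```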

A hybrid model that combined both C-TTO and DCE data was used to estimate the value set in Hong Kong, thereby making maximal use of the data collected. This modelling strategy has been used successfully in previous EQ-5D-5L valuation exercises [29, 32]. Briefly, this model estimates a single set of coefficients from one likelihood function, obtained by multiplying the likelihood function for the C-TTO data by the likelihood function for the DC data [29, 43, 44]. A conditional logit was assumed for the DC data, and a normal distribution with values censored at −1 (Tobit model) was assumed for the C-TTO data. We censored C-TTO observations at −1 (the lowest possible value in the C-TTO task) because, theoretically, valuations of states worse than dead lie in the range (−∞, 0), but the lead-time TTO approach used in this study did not allow the elicitation of values below −1. Censoring values at −1 when modelling EQ-5D-5L C-TTO data has been implemented in recent valuation studies in the UK and the Netherlands [32, 34] and is considered good practice [45]. The dependent variable of the hybrid model includes both C-TTO and DC responses. C-TTO responses were entered as 1 minus the observed C-TTO value for a given health state, so that they represent disutilities and the coefficients are expressed as utility decrements. DC responses were entered as binary (0/1) outcomes indicating the respondent's choice for each pair of EQ-5D-5L states. The hybrid model included a rescaling parameter theta (θ), reflecting the assumption that the C-TTO model coefficients were proportional to the DC model coefficients. We clustered the estimation by participant to account for the fact that ten C-TTO and seven DC responses were available for each participant.
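The structure of the hybrid likelihood can be sketched in plain Python. This is a simplified illustration, not the hyreg implementation: it assumes a single (homoscedastic) σ, whereas the study modelled heteroscedasticity; it writes the proportionality assumption as DC utility = −θ × predicted disutility, which may differ from hyreg's exact parameterization; and all function names are hypothetical. Multiplying the two likelihoods corresponds to adding their logarithms.

```python
import math


def norm_logpdf(z: float) -> float:
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def norm_sf(z: float) -> float:
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def tobit_loglik(disutility: float, mean: float, sigma: float, upper: float = 2.0) -> float:
    """Log-likelihood of one C-TTO response modelled as a disutility (1 - value).

    A C-TTO value censored at -1 is a disutility censored from above at 2,
    so it contributes P(latent disutility >= 2) instead of a density.
    """
    if disutility >= upper:
        return math.log(norm_sf((upper - mean) / sigma))
    return norm_logpdf((disutility - mean) / sigma) - math.log(sigma)


def clogit_loglik(chose_a: bool, disutil_a: float, disutil_b: float, theta: float) -> float:
    """Log-likelihood of one DC choice under a conditional logit."""
    u_a, u_b = -theta * disutil_a, -theta * disutil_b
    m = max(u_a, u_b)  # log-sum-exp trick for numerical stability
    log_denominator = m + math.log(math.exp(u_a - m) + math.exp(u_b - m))
    return (u_a if chose_a else u_b) - log_denominator


def predicted_disutility(x: list, beta: list) -> float:
    """Dummy vector times utility decrements."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def hybrid_loglik(beta, sigma, theta, tto_data, dc_data) -> float:
    """Joint log-likelihood: Tobit part for C-TTO plus conditional logit part for DC.

    tto_data: iterable of (dummies, observed C-TTO value)
    dc_data:  iterable of (dummies_a, dummies_b, chose_a)
    """
    total = 0.0
    for x, value in tto_data:
        total += tobit_loglik(1.0 - value, predicted_disutility(x, beta), sigma)
    for x_a, x_b, chose_a in dc_data:
        total += clogit_loglik(chose_a, predicted_disutility(x_a, beta),
                               predicted_disutility(x_b, beta), theta)
    return total
```

In practice the parameters would be obtained by maximizing this objective numerically, with standard errors clustered by participant; the sketch only evaluates the objective for given parameter values.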

The model specification was a 20-parameter hybrid main effects model consisting of four dummies for each EQ-5D-5L dimension, with level 1 as the reference. Dummies were constructed to represent the utility decrement of moving from the reference (level 1) to each of the remaining levels (levels 2, 3, 4, and 5). For example, for mobility (MO) we created four dummies (MO2–MO5), indicating the utility decrement of moving from level 1 to level 2 (MO2), from level 1 to level 3 (MO3), and so on. The same set of dummy variables was defined for each of the remaining dimensions: self-care (SC), usual activities (UA), pain/discomfort (PD), and anxiety/depression (AD).
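The dummy coding can be illustrated with a short sketch. It assumes the non-cumulative coding described above (a state at level 3 on mobility switches on MO3 only) and a model without a constant; the function names are hypothetical, and the decrement values are invented placeholders, not the Hong Kong estimates reported in Table 3.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def main_effects_dummies(state: str) -> dict:
    """The 20 dummies MO2..MO5, SC2..SC5, ..., AD2..AD5, with level 1 as reference."""
    dummies = {}
    for dim, level in zip(DIMENSIONS, state):
        for lvl in range(2, 6):
            dummies[f"{dim}{lvl}"] = int(int(level) == lvl)
    return dummies


def predict_value(state: str, decrements: dict) -> float:
    """Value = 1 minus the sum of the decrements switched on by the state."""
    x = main_effects_dummies(state)
    return 1.0 - sum(decrements[name] * on for name, on in x.items())


# Invented decrements for illustration only: 0.05 per level above 1.
example_decrements = {f"{d}{lvl}": 0.05 * (lvl - 1) for d in DIMENSIONS for lvl in range(2, 6)}
print(main_effects_dummies("12345"))
print(predict_value("22222", example_decrements))
```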

Previous EQ-5D-5L valuation studies have reported different distributions of C-TTO values between severe and less severe states [29, 46]. In general, people tend to agree more about valuations of less severe states than about more severe ones, resulting in a heteroscedastic error term when modelling C-TTO data. We tested for homoscedasticity of the error term using a separate Tobit model for the C-TTO data as described by Cameron and Trivedi [47]. We corrected for heteroscedasticity and modelled the variance of the error term as a function of EQ-5D-5L levels for each domain including a constant. We also evaluated the impact of including a constant term in the main effects models as this has received attention in recent work [48]. The decision over whether or not to include a constant in the final value set was based on whether that term was statistically significant.
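One way to write the heteroscedastic Tobit part of the model is shown below. The log-linear variance function is our assumption (a common choice for multiplicative heteroscedasticity); the text states only that the variance depends on the EQ-5D-5L levels in each dimension and a constant. Here v_i is the observed C-TTO value (so v_i ≥ −1 implies y_i ≤ 2), x_i the vector of 20 level dummies, β the utility decrements, z_i the level covariates entering the variance function, and γ_0 its constant.

```latex
\begin{aligned}
y_i^{*} &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i,
  \qquad \varepsilon_i \sim N\!\left(0,\ \sigma_i^{2}\right),\\
y_i &= 1 - v_i = \min\!\left(y_i^{*},\ 2\right),\\
\ln \sigma_i &= \gamma_0 + \mathbf{z}_i^{\top}\boldsymbol{\gamma}.
\end{aligned}
```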

Performance of the final model for the Hong Kong value set was assessed in terms of the logical consistency of its parameters and its goodness of fit. Coefficients were considered logically consistent if logically worse health states received lower values than logically better health states. Because our model outputs are utility decrements, this requires all coefficients to be positive and to increase with severity level within each dimension (e.g., MO2 ≤ MO3 ≤ MO4 ≤ MO5).
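The consistency check amounts to a positivity and monotonicity test on the estimated decrements within each dimension. The sketch below is illustrative; the function name is hypothetical, and whether ties between adjacent levels count as consistent is exposed through a `strict` flag rather than taken from the text.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def is_logically_consistent(decrements: dict, strict: bool = False) -> bool:
    """Check that decrements are positive and increase with level in every dimension."""
    for dim in DIMENSIONS:
        coefs = [decrements[f"{dim}{lvl}"] for lvl in range(2, 6)]
        if any(c <= 0 for c in coefs):
            return False
        for lower, higher in zip(coefs, coefs[1:]):
            if higher < lower or (strict and higher == lower):
                return False
    return True


# Invented decrements for illustration only.
ordered = {f"{d}{lvl}": 0.05 * (lvl - 1) for d in DIMENSIONS for lvl in range(2, 6)}
reversed_ua = dict(ordered, UA4=0.01)  # UA4 now smaller than UA3
print(is_logically_consistent(ordered))      # True
print(is_logically_consistent(reversed_ua))  # False
```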

We included all DC data in the modelling exercise but excluded C-TTO observations from respondents who met either of two criteria: criterion 1, a positive slope in a regression of the respondent's C-TTO values on the severity of the states (indicating that, on average, the participant gave higher values to poorer health states); and criterion 2, the same value given to all states, except for non-traders (i.e., subjects who valued all states as 1).
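The two exclusion criteria can be sketched per respondent as below. We assume that "severity" in criterion 1 is measured by the severity index defined earlier (the sum of the five levels) and that the slope comes from an ordinary least-squares fit with an intercept; the function names and example data are hypothetical.

```python
def ols_slope(x: list, y: list) -> float:
    """Slope of an ordinary least-squares regression of y on x, with intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx if sxx > 0 else 0.0


def exclude_ctto(states: list, values: list) -> bool:
    """True if a respondent's C-TTO responses meet either exclusion criterion."""
    # Criterion 2: every state received the same value, unless that value is 1
    # (non-traders, who valued every state as 1, are retained).
    if len(set(values)) == 1:
        return values[0] != 1
    # Criterion 1: values rise, on average, as states become more severe.
    severity = [sum(int(level) for level in s) for s in states]
    return ols_slope(severity, values) > 0


# Hypothetical respondent data (not study data).
states = ["21111", "55555", "33221", "14251"]
print(exclude_ctto(states, [0.90, -0.50, 0.60, 0.40]))  # False: values fall with severity
print(exclude_ctto(states, [-0.50, 0.90, 0.20, 0.30]))  # True: positive slope
print(exclude_ctto(states, [1, 1, 1, 1]))              # False: non-trader retained
```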

All analyses were carried out in Stata MP [49]. Hybrid models were estimated in Stata using the hyreg command [45].

2.3 Assessing the Impact of the FM on Modelling Results

The final value set for Hong Kong was estimated excluding the states that participants selected during the FM. This was a normative choice: there was no reason to retain C-TTO responses that participants themselves had excluded. However, we compared this final value set with one that did not exclude the states identified during the FM. Although this comparison was not relevant to model selection for the final value set, it provided a useful "what-if" scenario showing what would have happened had the FM not been available. Hence, we estimated models including all states (without FM information) and excluding the selected states (with FM information) and compared their goodness of fit using the Akaike information criterion divided by the sample size (AIC/n) [50]. Predicted mean values for the 86 health states included in the C-TTO tasks were also compared with and without FM information.
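For reference, AIC/n is the usual Akaike information criterion, 2k − 2 ln L for k estimated parameters and maximized log-likelihood ln L, divided by the number of observations n; lower values indicate better fit. Dividing by n is relevant here because the model with FM information is fitted to fewer C-TTO observations than the model without it. The sketch below uses a hypothetical function name and invented log-likelihoods, parameter counts, and sample sizes purely to show the calculation; which observations count towards n in a hybrid model is left as an assumption.

```python
def aic_per_observation(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Akaike information criterion divided by sample size: (2k - 2 lnL) / n."""
    return (2 * n_params - 2 * log_likelihood) / n_obs


# Invented numbers for illustration only (not the values reported in Table 3).
with_fm = aic_per_observation(log_likelihood=-7000.0, n_params=42, n_obs=9500)
without_fm = aic_per_observation(log_likelihood=-7500.0, n_params=42, n_obs=10000)
print(with_fm, without_fm)  # the smaller value indicates the better-fitting model
```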

2.4 Comparison with Other Asian Country-Specific EQ-5D-5L Value Sets

We calculated predictions for all 3125 health states using the final Hong Kong EQ-5D-5L value set and compared them with predictions from the currently published EQ-5D-5L value sets for China, Indonesia, South Korea, and Japan [33, 51,52,53], using kernel density estimates of the predicted values for the 3125 states.
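Generating predictions for every state and comparing their distributions can be sketched as below. We assume a Gaussian kernel with a normal-reference rule-of-thumb bandwidth; the text does not state which kernel or bandwidth was used. The function names are hypothetical, and the decrements are invented placeholders rather than any country's published value set.

```python
import math
from itertools import product

DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def all_states() -> list:
    """All 5**5 = 3125 EQ-5D-5L states as strings, from "11111" to "55555"."""
    return ["".join(map(str, levels)) for levels in product(range(1, 6), repeat=5)]


def predict(state: str, decrements: dict) -> float:
    """Main effects prediction without a constant: 1 minus the active decrements."""
    return 1.0 - sum(decrements[f"{d}{lvl}"] for d, lvl in zip(DIMENSIONS, state) if lvl != "1")


def normal_reference_bandwidth(samples: list) -> float:
    """Rule-of-thumb bandwidth 1.06 * sd * n**(-1/5) for a Gaussian kernel."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return 1.06 * sd * n ** (-1 / 5)


def gaussian_kde(samples: list, bandwidth: float):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    scale = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        return scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Invented decrements for illustration only: 0.08 per level above 1.
decrements = {f"{d}{lvl}": 0.08 * (lvl - 1) for d in DIMENSIONS for lvl in range(2, 6)}
values = [predict(s, decrements) for s in all_states()]
share_worse_than_dead = sum(v < 0 for v in values) / len(values)
density = gaussian_kde(values, normal_reference_bandwidth(values))
print(f"{share_worse_than_dead:.1%} of states predicted worse than dead")
print([round(density(x), 3) for x in (-0.5, 0.0, 0.5)])
```

Running the same steps with each country's published coefficients, and overlaying the resulting densities, would give a comparison of the kind described above.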

3 Results

3.1 Characteristics of Study Sample

Figure 1 in the Electronic Supplementary Material (ESM) presents a flow diagram of participants who completed the survey. A total of 1033 Hong Kong residents aged ≥ 18 years participated in the survey; 19 dropped out at the beginning of the C-TTO task. Withdrawal reasons included fatigue, the time consumed, and the complexity of the task. As the responses from these 19 respondents were limited, they were excluded from any further analysis, leaving 1014 respondents with complete information included in the study.

Table 1 presents the background characteristics of the study sample in comparison with the Hong Kong general population. The mean age of the sample was 46 years, and 600 (59%) participants were female. In total, 813 (77%) individuals had attended at least secondary school, 583 (58%) were married, and 435 (43%) were in paid employment. Around 70% of participants had experience of serious illness, either in themselves or in a relative, or through caring for others. Compared with the general population, our sample had fewer respondents aged 45–54 years, slightly more aged 55–64 years, and fewer individuals in paid employment. Overall, our study sample was reasonably representative of the Hong Kong population in terms of sex, educational attainment, and marital status. The distribution of self-reported EQ-5D-5L descriptive system responses and VAS scores for the overall sample and by age group is presented in Table 1 in the ESM. Most participants in the sample (95%) reported either no or slight problems in all domains. The mean ± standard deviation (SD) VAS score was 82.72 ± 11.77 for the overall sample. The mean ± SD total interview time was 40 ± 14 minutes, and the median was 37 minutes.

Table 1 Background characteristics of study samples compared with the Hong Kong general populationᵃ

3.2 Responses from the Composite Time Trade-Off and Discrete-Choice Experiment Data

Figure 2 in the ESM depicts the distribution of observed C-TTO utility values across all respondents. The histogram showed a concentration of values at 1 (health states equal to full health) and −1 (the lowest possible value in the C-TTO task). The spike at −1 indicated that, in 16% of the C-TTO tasks, the lead time was exhausted and the participant indicated that dying now was equivalent to living 10 years in full health followed by 10 years in the impaired state. This result supported our choice to censor C-TTO values at −1 when modelling the C-TTO data as part of the hybrid model. The distribution of observed C-TTO utility values by severity index is presented in Fig. 3 in the ESM. The series of histograms shows that less severe states had less variability across participants than more severe health states, on which respondents disagreed more. This translated into heteroscedasticity of the error term, which was corroborated by the test of homoscedasticity rejecting the null hypothesis of equal variances (test statistic = 4561; p < 0.001). No DC data were excluded from any modelling analyses, but C-TTO responses from 15 interviews were excluded from all subsequent analyses after the exclusion criteria were applied (Fig. 1 in the ESM).

3.3 Quality Control

There was no need to interrupt the study or drop data as all interviewers met the minimum quality criteria. Additional details about this quality control are available upon request.

3.4 FM

Table 1 reports the background characteristics of participants who excluded [n = 340 (34%)] or did not exclude [n = 674 (66%)] states during the FM. There were statistically significant differences between the two groups in terms of age (χ² = 19.3; p < 0.01), highest educational attainment (χ² = 11.5; p < 0.01), personal experience with serious illness (χ² = 12.1; p < 0.01), and whether the participant lived with a diagnosed chronic condition (χ² = 10.9; p < 0.05). Participants who selected health states during the FM were younger, had post-secondary or degree qualifications, did not have personal experience with serious illness, and did not live with a chronic condition.

A total of 9990 C-TTO valuations across all participants remained after the exclusion criteria were applied, and 515 (5%) of these were selected and excluded during the FM. Data from the remaining 9475 valuations were therefore included in the estimation of the value set. Of the 340 participants who excluded at least one state during the FM and had at least one inconsistent state before using the tool, 285 (84%) removed the inconsistency or inconsistencies (Table 2). By contrast, 140 (21%) of the 674 participants who did not exclude any state during the FM had at least one inconsistency that was retained in the estimation sample. Table 2 also shows that, in most cases, one or two states were excluded during the FM.

Table 2 Number of participants using the feedback module and impact on the number of inconsistent health states

Although participants who excluded states during the FM spent an additional 2 minutes in that section at the end of the C-TTO task, the overall length of the C-TTO section was similar between those who did and did not exclude states during the FM (Table 2 in the ESM).

3.5 The EQ-5D-5L Value Set in Hong Kong

The last two columns of Table 3 show that the utility decrements for each EQ-5D-5L level in both candidate value sets had the correct sign and magnitude (i.e., were logically consistent). The constant was not statistically significant when introduced and was therefore excluded from the final value set. The final model from which to derive utility decrements in Hong Kong is reported in the last column of Table 3, and Table 3 in the ESM explains how to use this value set in practice. The lowest predicted value in that model, for health state 55555, was −0.8637.

Table 3 Hybrid main effects model results without and with feedback module

3.6 Assessing the Impact of the FM on Modelling Results

The coefficients of the hybrid model estimated without FM information are also reported in Table 3. The hybrid model without a constant and with FM information had a slightly better goodness of fit (AIC/n = 1.504) than the hybrid model without a constant and without FM information (AIC/n = 1.543); lower values indicate better fit. Hence, the model with FM information offered a marginal benefit in goodness of fit, but predictions from the two models were similar (Fig. 4 in the ESM).

3.7 Comparison with Other Asian Country-Specific EQ-5D-5L Value Sets

Figure 2 shows kernel density estimates of the predicted values for the 3125 health states in Hong Kong and in the other Asian countries with published value sets. The Indonesian value set provided predictions that were remarkably similar to those of the Hong Kong value set. A larger proportion of health states were valued as worse than dead (negative values) in the Hong Kong and Indonesian value sets (36%) than in the value sets for China (10%) and for Japan and South Korea (0.1%).

Fig. 2

Kernel density function of predicted values for all 3125 health states for EQ-5D-5L value sets in Asian countries

4 Discussion

In this study, we estimated an EQ-5D-5L value set for Hong Kong from a sample of the general Chinese population, excluding C-TTO valuations that participants flagged in an FM at the end of the C-TTO tasks. The sample was representative of the general Hong Kong population in terms of sex, educational attainment, marital status, and most age groups, but not employment status. Overall, the proportion of C-TTO states excluded during the FM was low (5%). Limited qualitative information collected as free text at the end of the FM from 71 participants suggested that they found the tool useful for identifying logically inconsistent states, but they also reported difficulty understanding the task because some states seemed unrealistic. Excluding states identified during the FM did not have a major impact on the model coefficients and predictions compared with not excluding them, but it was associated with a slightly better goodness of fit. In addition, the FM reduced the number of inconsistent C-TTO responses used in the final estimation sample.

The use of a feedback process to give participants the opportunity to reflect on their preferences in valuation exercises is not novel in the health preference research literature. Existing studies have demonstrated that preferences can change after a reflection or deliberation exercise [54, 55]. The FM used in this study can therefore be seen as a process that allows individuals to reflect on their own C-TTO responses. In our study, we also provided a theoretical rationale for the use of the FM from a behavioral economics viewpoint and hypothesized that cognitive skills and participant characteristics explain how individuals interact with the tool. Participants who excluded states during the FM differed from those who did not: they were younger, had post-secondary or degree qualifications, did not have personal experience with serious illness, and did not live with a chronic condition. Our results therefore suggest that those with higher cognitive skills were more likely to use the FM. This systematic difference is important, and future EQ-5D-5L valuation exercises should bear it in mind; in particular, it should be incorporated into the training package for interviewers using the EQ-VT. A note of caution: our measure of cognitive skills was based on a proxy variable, highest educational attainment, rather than an explicit measure, and this should be seen as a limitation of our analysis.

A hybrid main effects model using both C-TTO and DC data without a constant was selected as the final value set for Hong Kong. Adding further terms to the model, such as first-order interactions or other related terms, has been common practice in EQ-5D-3L valuation exercises [56]. The inclusion of interactions in EQ-5D valuation studies has typically been informed by previous evidence of significant interactions in other valuation studies and/or imposed by the researchers [57,58,59,60]. However, a recent simulation study warned that, to allow identification and estimation of interactions between dimensions and levels, the study design should include both the main effects and the expected interactions [61]. The authors concluded that most EQ-5D-3L valuation exercises conducted to date lack sufficient coverage of the EQ-5D space to allow estimation of interactions. The experimental design implemented in the current EQ-5D-5L valuation exercise was based on main effects only for both the C-TTO and the DC tasks [8]. Consequently, modelling of health states for Hong Kong was based on a main effects specification without interactions or additional terms. One aspect of including interaction effects in any econometric model is largely ignored in applied work: to minimize the role of chance in the findings, the interaction effects to be tested should be identified before the data are collected, not after. The pros and cons of using analysis plans in econometric analyses have recently been debated [62, 63].

This study was conducted while other EQ-5D-5L valuation exercises were being, or had recently been, conducted in other Asian countries. We compared predictions from the Hong Kong value set with those from the published EQ-5D-5L value sets for China, Japan, South Korea, and Indonesia. Health preferences of the Cantonese-speaking population in Hong Kong could be similar to those of the general population in these countries. However, Hong Kong has a population epidemiology and healthcare system that differ from those in these other settings [64]. Moreover, although Hong Kong is a special administrative region of China, its cultural diversity suggests that views on health are likely to differ between the two jurisdictions. We found that predictions from the Hong Kong and Indonesian value sets were very similar, whereas those from the remaining countries differed. Hong Kong and Indonesia employed the same valuation protocol (which included an FM) and the same modelling technique to derive their value sets (Table 4 in the ESM). Future research should investigate whether the differences observed with the other Asian countries were due to the use of a different protocol version, different modelling methods, or cultural factors.

5 Conclusion

An FM for C-TTO responses was a useful tool that improved the quality of the data used when estimating an EQ-5D-5L value set in Hong Kong. The final value set can now be used to measure the impact of interventions on health-related quality of life in economic evaluations for resource allocation in this jurisdiction.