Key Points for Decision Makers

This study estimated an EQ-5D-Y-3L value set for China using a representative sample from 14 different regions.

Two types of preference data (composite time trade-off and discrete choice experiment) were collected, and different modelling approaches were explored.

A hybrid model with 10 main-effects parameters and an additional parameter accounting for the effect of having severe problems on all dimensions was chosen as the value set.

1 Introduction

As the most widely used health-related quality of life (HRQoL) instrument worldwide, including in China, the EQ-5D has been employed for various purposes, including monitoring population health [1, 2], quantifying disease burden [3, 4] and, most prominently, estimating health utility values for the economic evaluation of health interventions and technologies [5]. For estimating health utility values, EQ-5D value sets have been established in China [6, 7]. In 2009, the EuroQol Group developed a child-friendly version of the EQ-5D to meet the growing need to measure the HRQoL of children and adolescents [8]. This instrument has been psychometrically validated in a range of health conditions in China [9,10,11,12]. To date, however, no value set is available for the EQ-5D-Y-3L in China, which limits the use of this instrument for calculating health utility values.

Although the EQ-5D-Y-3L retained the five core health dimensions and three response levels of its adult counterpart, the EQ-5D-3L, the wording of both the health dimensions and the response levels in its descriptive system was altered [8]. Thus, the EQ-5D-3L value set cannot be applied to calculate EQ-5D-Y-3L health state values [13, 14]. This prompted a programme of methodological research to identify the most appropriate method for establishing EQ-5D-Y-3L value sets [13, 15, 16]. In 2020, the EuroQol Group published the international EQ-5D-Y-3L valuation protocol [17].

In the protocol, discrete choice experiment (DCE) data are used as the primary health preference data to model the relative importance of the five health dimensions, and composite time trade-off (cTTO) data are used to anchor the DCE modelling results onto the quality-adjusted life year (QALY) scale [17]. DCE data are more feasible to collect, and the DCE task avoids a comparison with being dead [17]. In contrast, a concern with using the cTTO method to value children’s health is that the task can be upsetting or abhorrent to some respondents, posing potential ethical issues. In addition, cTTO values have been found to be high for EQ-5D-Y-3L states, especially mild and moderate states, which can lead to insensitivity and hamper the modelling process [17]. Adopting a taxpayer’s perspective [17] and considering the possible challenges of involving adolescents in valuation studies [18], the protocol elicits only adult respondents’ preferences, using both the DCE and cTTO methods. In EQ-5D-Y-3L valuation studies, therefore, participants value health for someone else rather than for themselves. Specifically, participants are asked to provide valuations for a hypothetical 10-year-old child, without the relationship between this child and the participant being specified. This change of perspective and a stronger preference for length of life over quality of life when valuing children’s rather than adults’ health have been identified as the two major reasons for high cTTO values [19, 20].

To date, six countries have published EQ-5D-Y-3L value sets following the international EQ-5D-Y-3L valuation protocol [21,22,23,24,25,26]. All six studies used DCE methods to determine relative preferences and cTTO values for anchoring, although they differed in their choice of models and anchoring methods. An important finding from these studies was that the worst possible state (33333) had a much lower value than the other states, creating a gap at the lower end of the value scale. This calls into question the appropriateness of using 33333, an outlier, to anchor DCE values. Alternative options are the mapping method (i.e. rescaling the DCE coefficients by the coefficient from a regression linking the DCE latent values to the observed cTTO values) and hybrid modelling (i.e. modelling cTTO and DCE data simultaneously), which have been widely used in published EQ-5D-5L value sets [27, 28].

Given the increasing need for a local EQ-5D-Y-3L value set for health technology assessment (HTA) in China, this study aims to estimate an EQ-5D-Y-3L value set based on the health preferences of the Chinese general public. To identify the best approach for establishing the value set, we compared mapping (the most commonly used approach) with hybrid modelling. We expanded the cTTO design because both approaches benefit from cTTO values for more health states [29]. A larger cTTO design also allowed us to examine the gap effect observed in previous studies, which collected cTTO values for only 10 health states.

2 Methods

Our study followed the published international EQ-5D-Y-3L valuation protocol but expanded the original cTTO design of 10 health states by adding 18 health states from an orthogonal design, giving 28 health states in total. The study was approved by the Jinan University ethics committee (approval letter number: JNUKY-2021-002). This section describes how the two types of valuation data were collected and how the models were evaluated.

2.1 Experimental Design

In the EQ-5D-Y-3L valuation protocol, the two valuation methods serve different purposes: the DCE is the primary method for estimating the relative importance of dimensions and levels, and the cTTO provides anchors for the latent DCE values. Hence, the DCE sample is larger than the cTTO sample [17]. Accordingly, this study used two independent samples to collect cTTO data (n = 400) and DCE data (n = 1000). The cTTO sample was larger than the size recommended in the protocol because we included more health states in the cTTO design.

Overall, 150 choice sets and 28 EQ-5D-Y-3L health states were valued using the DCE and cTTO methods, respectively. The DCE design was a Bayesian efficient design featuring overlap in two dimensions, good level spread (i.e. each level of each dimension appearing an approximately equal number of times in the design), and utility balance [15]. Dimension overlap has been shown to be an efficient way to reduce attribute non-attendance [30]. The design was constructed in two steps: data were first collected from 127 participants using an initial design, and these data were then used as priors to derive the final design. The 150 choice sets were divided into 10 blocks of 15 choice sets each. The design included no dominant or test–retest tasks for evaluating data quality.
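The two design properties above (dimension overlap and level spread) can be checked mechanically on any candidate design. The Python sketch below is illustrative only: it assumes states are coded as five-character strings such as "21312" and that a design is a list of (state A, state B) pairs. The function names and the three-pair toy design are hypothetical, and the code does not reproduce the Bayesian efficient design procedure itself.

```python
from collections import Counter

def overlap_count(state_a: str, state_b: str) -> int:
    """Number of dimensions on which the two states share the same level."""
    return sum(a == b for a, b in zip(state_a, state_b))

def level_spread(design: list[tuple[str, str]]) -> dict[int, Counter]:
    """For each of the five dimensions, count how often each level appears in the design."""
    spread = {dim: Counter() for dim in range(5)}
    for pair in design:
        for state in pair:
            for dim, level in enumerate(state):
                spread[dim][level] += 1
    return spread

# Hypothetical toy design: every pair overlaps in exactly two dimensions.
toy_design = [("12311", "32113"), ("21132", "23331"), ("33221", "13213")]
print([overlap_count(a, b) for a, b in toy_design])  # [2, 2, 2]
print(level_spread(toy_design)[0])  # level counts for the first dimension
```

A real design check would run the same counts over all 150 choice sets and compare the level frequencies within each dimension.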

The cTTO design included the 10 health states from the original protocol and 18 further states from an orthogonal design that has shown good predictive performance in estimating an EQ-5D-3L value set [29]. The 28 states were divided into three blocks of 10 states, with state 33333 included in every block. The number of health states and the number of observations per health state were sufficient to estimate a value set from the TTO values alone.
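The blocking rule above (the 27 states other than 33333 spread over three blocks, with 33333 added to every block) can be written as a short function. This is a sketch under stated assumptions: the source does not say how the 27 states were allocated, so the random shuffle, the seed and the function name are hypothetical.

```python
import random

WORST = "33333"

def make_ctto_blocks(other_states: list[str], n_blocks: int = 3, seed: int = 2021) -> list[list[str]]:
    """Shuffle the non-33333 states, split them evenly, and append 33333 to every block.

    The shuffle and seed are illustrative assumptions, not the study's allocation rule.
    """
    if WORST in other_states:
        raise ValueError("33333 is added to every block automatically")
    if len(other_states) % n_blocks:
        raise ValueError("states cannot be split evenly into blocks")
    states = other_states[:]  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(states)
    size = len(states) // n_blocks
    return [states[i * size:(i + 1) * size] + [WORST] for i in range(n_blocks)]
```

With 27 input states this returns three blocks of nine states plus 33333, i.e. the three blocks of 10 states described above.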

2.2 Sampling Strategy and Participant Recruitment

In addition to the experimental design, another key consideration was to recruit a representative sample for China. We used quota sampling to ensure that the sample represented the Chinese general population in terms of gender, age, educational attainment and registered residence (rural/urban, known as hukou). Quotas for these four criteria were set based on the 6th National Population Census of China [31]. In total, 14 provinces/cities were selected to cover five geographical parts of China: North China (Beijing, Tianjin, Heilongjiang, Shandong), East China (Shanghai, Jiangsu), South China (Guangdong, Fujian), Central China (Hubei), and West China (Guizhou, Chongqing, Yunnan, Xinjiang, Shanxi).
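Quota targets for each criterion are obtained by converting population shares into whole-number counts for a given sample. The sketch below uses largest-remainder rounding so that the counts sum exactly to the sample size. The rounding rule, the function name and the example shares are illustrative assumptions, not the census figures or the procedure used in the study.

```python
def quota_targets(sample_size: int, shares: dict[str, float]) -> dict[str, int]:
    """Turn population shares for one quota criterion into whole-number targets.

    Uses largest-remainder rounding so the targets sum exactly to sample_size.
    """
    total = sum(shares.values())
    raw = {cat: sample_size * share / total for cat, share in shares.items()}
    targets = {cat: int(value) for cat, value in raw.items()}  # round down first
    shortfall = sample_size - sum(targets.values())
    # Hand the remaining places to the categories with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda c: raw[c] - targets[c], reverse=True)
    for cat in by_remainder[:shortfall]:
        targets[cat] += 1
    return targets

# Illustrative shares only (not census figures), for 100 participants in one province/city.
print(quota_targets(100, {"male": 51.2, "female": 48.8}))  # {'male': 51, 'female': 49}
```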

Given the target sample size of 1400 participants (n = 1000 for DCE and n = 400 for cTTO), 100 participants were recruited from each of the 14 provinces/cities. In Tianjin, Shanghai, Jiangsu, Guangdong, Hubei, Guizhou, Chongqing and Shanxi, 50 participants completed the cTTO interviews and another 50 completed the DCE interviews; in the remaining six provinces/cities, all 100 participants completed the DCE interviews. Participants were recruited using non-probability methods, namely snowball and purposive sampling: interviewers first recruited respondents from among their acquaintances and then extended recruitment to others by word of mouth. In each province/city, a local researcher monitored quota fulfilment and helped interviewers recruit hard-to-reach participants. Individuals were eligible if they (i) were Chinese citizens who had lived in China for the past 5 years, (ii) gave informed consent, (iii) fitted the four quota criteria, and (iv) had not participated in any earlier EQ-5D valuation study.
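The site allocation above can be checked arithmetically: 8 sites × 50 cTTO participants gives 400, and 8 × 50 + 6 × 100 DCE participants gives 1000. A minimal Python sketch of that tally follows; the data structure and function name are hypothetical.

```python
# The eight provinces/cities that ran both cTTO and DCE interviews, as listed above.
CTTO_SITES = ["Tianjin", "Shanghai", "Jiangsu", "Guangdong", "Hubei", "Guizhou", "Chongqing", "Shanxi"]
# The remaining six provinces/cities, which ran DCE interviews only.
DCE_ONLY_SITES = ["Beijing", "Heilongjiang", "Shandong", "Fujian", "Yunnan", "Xinjiang"]

def allocate(per_site: int = 100, ctto_per_site: int = 50) -> dict[str, dict[str, int]]:
    """Planned number of cTTO and DCE participants per province/city."""
    plan = {site: {"cTTO": ctto_per_site, "DCE": per_site - ctto_per_site} for site in CTTO_SITES}
    plan.update({site: {"cTTO": 0, "DCE": per_site} for site in DCE_ONLY_SITES})
    return plan

plan = allocate()
totals = {task: sum(site[task] for site in plan.values()) for task in ("cTTO", "DCE")}
print(totals)  # {'cTTO': 400, 'DCE': 1000}
```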

2.3 Data Collection Procedures

Both cTTO and DCE data were collected using the EQ-VT platform. Participants were invited to a face-to-face, one-on-one computer-assisted personal interview with four sections: (1) the interviewer explained the study and obtained informed consent; (2) participants provided their demographic information and reported their own health using the EQ-5D-Y-3L; (3) participants completed either a block of cTTO tasks or a block of DCE tasks, and in the cTTO interview each participant completed five practice states (two wheelchair examples and three practice EQ-5D-Y-3L states) before valuing the formal EQ-5D-Y-3L states; and (4) participants answered debriefing questions, rating the difficulty of the valuation tasks and providing information about their children, if any.

Two training workshops were held before data collection. All interviewers who conducted the cTTO data collection (n = 8, one per region) attended a 2-day training and practice workshop that covered the study background and design and gave them opportunities to practise interviews with one another. After training, each interviewer conducted two rounds of five practice interviews. Following the EuroQol QC protocol [32], quality control (QC) reports were prepared and feedback was given to the interviewers after each round (5–10 interviews per interviewer). The QC process assessed interviewers’ protocol compliance using four criteria (sufficient time spent explaining the example task, completing the form task, use of the lead-time TTO, and occurrence of inconsistent responses) and further assessed interviewer effects using the time taken, the number of moves, and the distribution of cTTO values. Interviewers who conducted the DCE data collection attended a 2-hour online training workshop. No QC was implemented for the DCE data collection.
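To make the four compliance criteria concrete, the sketch below flags a single cTTO interview log. Everything in it is a hypothetical illustration: the field names, the 180-second threshold for explaining the example task, and the rule that any state valued below 33333 counts as an inconsistency are assumptions, not the EuroQol QC protocol’s definitions.

```python
from dataclasses import dataclass

# Hypothetical threshold for illustration only.
MIN_EXAMPLE_SECONDS = 180

@dataclass
class InterviewLog:
    example_seconds: float     # time spent explaining the wheelchair example task
    completed_form_task: bool  # whether the form task was completed
    used_lead_time: bool       # whether the lead-time (worse-than-dead) part was shown
    values: dict[str, float]   # cTTO value per valued state, e.g. {"33333": -0.2}

def qc_flags(log: InterviewLog) -> list[str]:
    """Return the (hypothetical) QC criteria that this interview fails."""
    flags = []
    if log.example_seconds < MIN_EXAMPLE_SECONDS:
        flags.append("short example explanation")
    if not log.completed_form_task:
        flags.append("form task not completed")
    if not log.used_lead_time:
        flags.append("lead-time TTO not demonstrated")
    worst = log.values.get("33333")
    # Assumed inconsistency rule: any other state valued strictly below 33333.
    if worst is not None and any(v < worst for s, v in log.values.items() if s != "33333"):
        flags.append("inconsistency: a state valued below 33333")
    return flags

print(qc_flags(InterviewLog(240, True, True, {"11112": 0.95, "33333": -0.1})))  # []
```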

2.4 Data Analysis and Model Evaluation

There is no established consensus on how to estimate an EQ-5D-Y-3L value set [18]. Our study design allowed two options: (1) estimating the value set from the DCE modelling results followed by an anchoring procedure, or (2) modelling the DCE and cTTO data jointly with a hybrid model. All published studies that followed the standard protocol [17] used the former approach, because the protocol’s cTTO design contains too few health states to estimate a main-effects model [22, 27]. Both approaches have limitations: the former uses the cTTO values only for anchoring, and the latter supports only a conditional logit model for the DCE data.
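To illustrate what modelling DCE and cTTO data jointly means, the sketch below writes out a joint log-likelihood in which both data types share the same dimension coefficients. It assumes homoscedastic normal errors for the cTTO values (the study’s hyreg specification also accounted for heteroscedasticity, which is not reproduced here) and a conditional logit for binary DCE choices, linked by a scale parameter theta. The function names are hypothetical and this is not hyreg’s interface; estimates would be obtained by maximising this function over beta, the intercept, sigma and theta with a numerical optimiser.

```python
import math

Vector = list[float]

def disutility(x: Vector, beta: Vector) -> float:
    """Linear disutility: dummy-variable codes of a state times the shared coefficients."""
    return sum(xi * bi for xi, bi in zip(x, beta))

def log_sigmoid(z: float) -> float:
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def hybrid_loglik(beta: Vector, intercept: float, sigma: float, theta: float,
                  ctto: list[tuple[Vector, float]],
                  dce: list[tuple[Vector, Vector, bool]]) -> float:
    """Joint log-likelihood of cTTO and DCE data that share the coefficients beta.

    ctto: (x, observed value) with value = 1 - intercept - x.beta + normal noise (sd sigma).
    dce:  (x_a, x_b, chose_a) binary choices from a conditional logit with scale theta;
          the intercept cancels in a utility difference, so it does not enter this part.
    """
    ll = 0.0
    for x, y in ctto:
        resid = y - (1.0 - intercept - disutility(x, beta))
        ll += -0.5 * math.log(2 * math.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)
    for x_a, x_b, chose_a in dce:
        # Latent utility of A minus B (utility = -disutility), rescaled by theta.
        diff = theta * (disutility(x_b, beta) - disutility(x_a, beta))
        ll += log_sigmoid(diff) if chose_a else log_sigmoid(-diff)
    return ll
```

Because the DCE part is a conditional logit on utility differences, this structure also shows why the hybrid approach is tied to that DCE model rather than to a mixed logit.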

For the first approach, we used a mixed logit model (with random correlated coefficients) that accounts for preference and scale heterogeneity [15]. For anchoring, we opted for the mapping method, given the discontinuity between the cTTO value of the worst state 33333 and those of the other states reported in published studies. Thus, we first fitted the DCE data with a mixed logit model and calculated the latent values for all health states. Next, we estimated an ordinary least squares (OLS) regression linking the observed cTTO values to the DCE latent values and obtained the rescaling coefficient. Finally, the mixed logit coefficients were rescaled using this coefficient [22]. For the second approach, hybrid modelling, the ‘hyreg’ command in Stata [33] was used to model the DCE and cTTO data jointly, accounting for heteroscedasticity [27]. Details on installing and using ‘hyreg’ are given by Ramos-Goñi et al. [27].
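The three mapping steps can be sketched as follows. The sketch assumes that the OLS regression is run through the origin, with the observed disutility (1 minus the mean cTTO value) of each state as the dependent variable and the DCE latent disutility as the regressor, so that full health stays at 1; the source does not state whether an intercept was included. Dimension labels other than sc2/sc3 and all function names are hypothetical.

```python
DIMS = ("mo", "sc", "ua", "pd", "ad")  # hypothetical labels for the five dimensions

def latent_disutility(state: str, coefs: dict[str, float]) -> float:
    """Sum the coefficients of the non-level-1 dimensions, e.g. '21312' -> mo2 + ua3 + ad2."""
    return sum(coefs[f"{dim}{level}"] for dim, level in zip(DIMS, state) if level != "1")

def mapping_rescale(dce_coefs: dict[str, float],
                    observed_ctto: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Rescale DCE coefficients onto the cTTO scale via OLS through the origin.

    observed_ctto maps each cTTO-valued state to its mean observed value.
    """
    x = [latent_disutility(s, dce_coefs) for s in observed_ctto]  # DCE latent disutility
    y = [1.0 - v for v in observed_ctto.values()]                 # observed cTTO disutility
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    rescaled = {name: slope * coef for name, coef in dce_coefs.items()}
    return slope, rescaled
```

Under these assumptions the value of any state is then 1 − latent_disutility(state, rescaled).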

For both approaches, a main-effects-only model was fitted first (see Appendix 1 in the electronic supplementary material for the model equations). In an EQ-5D-Y-3L main-effects model, health state values are explained by 11 parameters: an intercept and, for each of the five dimensions, two dummy variables representing the move from level 1 to level 2 and to level 3. Because previous studies observed a discontinuity between the cTTO value of the worst state 33333 and those of the other states [20, 21, 23], we explored adding an ‘All 3’ (A3) term to capture this gap. The A3 term was tested only in the hybrid model, since 33333 was not included in the DCE design and its effect could therefore not be estimated from the DCE data. We compared the three models (the mixed logit model anchored by mapping, and the hybrid model without and with the A3 term) using two criteria: (1) coefficient significance and monotonicity, and (2) accuracy in predicting the observed cTTO values, measured as the mean absolute error (MAE) between the predicted values and the observed mean values for the 28 health states.
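The model specification and the MAE criterion can be expressed as a small prediction function. In this sketch the A3 term enters only for state 33333, and the intercept applies to every state other than full health (11111 is fixed at 1, a common EQ-5D convention assumed here rather than taken from the source). The dimension labels and any coefficients passed in are hypothetical placeholders, not the Table 2 estimates.

```python
DIMS = ("mo", "sc", "ua", "pd", "ad")  # hypothetical labels; the source names sc2/sc3 for self-care

def predict_value(state: str, coefs: dict[str, float],
                  intercept: float = 0.0, a3: float = 0.0) -> float:
    """Value = 1 - intercept - main effects - A3, with A3 applied only to 33333."""
    if state == "11111":
        return 1.0  # full health anchored at 1 (assumed convention)
    main = sum(coefs[f"{dim}{level}"] for dim, level in zip(DIMS, state) if level != "1")
    a3_term = a3 if state == "33333" else 0.0
    return 1.0 - intercept - main - a3_term

def mean_absolute_error(predicted: dict[str, float], observed_means: dict[str, float]) -> float:
    """Average absolute gap between predicted values and observed mean cTTO values."""
    return sum(abs(predicted[s] - m) for s, m in observed_means.items()) / len(observed_means)
```

Setting a3 to 0 gives the main-effects-only specification, so the same function can score both hybrid models against the 28 observed means.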

3 Results

Data were collected between January 2020 and October 2021. In total, 1476 individuals participated in the study: 1058 in the DCE interviews and 418 in the cTTO interviews. Table 1 reports the demographic characteristics of the sample. Compared with the 6th National Population Census [31], our sample was representative.

Table 1 Sample demographic characteristics

3.1 Data Descriptions

In total, 418 respondents participated in the cTTO interviews. On average, respondents took 35.70 minutes (SD 12.42) and 13.21 moves (SD 9.00) to complete an interview. Of all observations, 21.89% (n = 915) were negative. Figure 1 shows the distribution of the cTTO values. Overall, cTTO values were concentrated at the upper end of the scale (≥ 0.5), with a long tail towards negative values and a ‘spike’ at −1.0 (1.65%). The highest mean cTTO value was 0.924 (SE 0.011) for state 11112, and the lowest was −0.088 (SE 0.025) for state 33333, the only state with a negative observed mean value. In total, 1058 participants completed the DCE interviews, taking 9.61 minutes (SD 7.12) on average.
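The descriptive statistics above follow directly from the observation-level data. As a consistency check, 418 respondents each valuing a block of 10 states gives 4180 observations, and 915/4180 ≈ 21.89%. The sketch below computes the same kinds of summaries (share of negative values, share at −1, and per-state mean and standard error) from a list of (state, value) pairs; the function name and any input data are hypothetical.

```python
import math
from statistics import mean, stdev

def describe_ctto(observations: list[tuple[str, float]]) -> dict:
    """Summarise (state, value) cTTO observations."""
    values = [v for _, v in observations]
    by_state: dict[str, list[float]] = {}
    for state, v in observations:
        by_state.setdefault(state, []).append(v)
    return {
        "share_negative": sum(v < 0 for v in values) / len(values),
        "share_minus_one": sum(v == -1.0 for v in values) / len(values),
        # Standard error = sample SD / sqrt(n); undefined for a single observation.
        "state_mean_se": {
            s: (mean(vs), stdev(vs) / math.sqrt(len(vs)) if len(vs) > 1 else float("nan"))
            for s, vs in by_state.items()
        },
    }
```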

Fig. 1 Composite time trade-off (cTTO) value distribution

3.2 Value Set Modelling

Table 2 presents the coefficients and characteristics of the three models, with an additional OLS model as a reference. In both the mixed logit model with random correlated coefficients and the main-effects-only hybrid model, the sc2 term (the effect of level 2 problems in self-care) was not significant at the 0.05 level. All other coefficients were consistent and significant in all three models. In the hybrid model with the A3 term, the A3 term was statistically significant, and its magnitude was larger than that of the sc3 term (the effect of level 3 problems in self-care). In addition, including the A3 term gave the hybrid model the lowest MAE and yielded a negative predicted value for state 33333. On both evaluation criteria, the hybrid model with the A3 term outperformed the mixed logit model with correlated coefficients and the main-effects hybrid model, and it was therefore chosen as the value set.

Table 2 Modelling coefficients and performances

4 Discussion

Using two preference elicitation methods, our study collected health preference data from two samples of the Chinese general population covering 14 geographical regions. A strength of our study is its expanded cTTO design and its use of the hybrid model, which has not been explored in any published EQ-5D-Y-3L valuation study. The expanded cTTO design allowed us to investigate the gap effect in EQ-5D-Y-3L cTTO values. After fitting several models to the preference data, we recommend the hybrid model with an additional A3 term to generate the EQ-5D-Y-3L value set for China. This model made the fullest use of the collected data, had the lowest prediction error, and had no inconsistent or insignificant coefficients. In addition, it predicted a negative value for state 33333, as observed in the cTTO data, which the other two models did not. The availability of this value set can facilitate economic evaluations of paediatric interventions in China.

Using the level 3 coefficients as an indicator, the mixed logit model and the hybrid model produced different rank orders of the five dimensions. In both models, pain/discomfort and mobility were the most and second most important dimensions, respectively; however, the worried/sad/unhappy dimension was the least important in the mixed logit model but ranked third in the hybrid model with the A3 term. This may suggest that respondents focused on different aspects of the health states in the DCE and cTTO tasks, and that the time component may also have influenced preferences [34]. The DCE method does not require participants to trade off life years against quality of life, whereas the cTTO method elicits values by asking participants to give up life expectancy in exchange for better health. Respondents may therefore have focused on overall severity in the cTTO tasks but on differences between dimensions in the DCE tasks. It should also be noted that the current cTTO method may be suboptimal for valuing EQ-5D-Y-3L states considered worse than dead (WTD). Specifically, when a 10-year lead time is used to value health states for a 10-year-old child, 10 years of full health in the lead time followed by 10 years in the valued state mean that the state is experienced at ages 20–30 years. A lag-time TTO method [16, 35, 36], in which life B in the WTD task starts with 10 years in the impaired health state followed by 10 years in full health, is theoretically more appropriate. Thus far, only Shah et al. have investigated the use of lag-time TTO for valuing EQ-5D-Y-3L state 33333 [16]. Future studies should investigate lag-time TTO for valuing a broader set of EQ-5D-Y-3L health states.
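The timing argument can be made explicit with the standard cTTO scoring rules (10-year horizon and 10-year lead time) and a helper that returns the age window in which the valued state is experienced in life B of the worse-than-dead task. The function names are hypothetical, and the lag-time branch simply reverses the order of full health and impaired health as described above.

```python
def ctto_value(years_full_health_a: float, worse_than_dead: bool) -> float:
    """Standard cTTO scoring with a 10-year horizon.

    Better than dead: life A = x years in full health vs life B = 10 years in the state,
    so value = x / 10. Worse than dead (lead time): life A = x years in full health (0-10)
    vs life B = 10 years in full health then 10 years in the state, so value = (x - 10) / 10.
    """
    x = years_full_health_a
    return (x - 10) / 10 if worse_than_dead else x / 10

def ages_in_state(child_age: int = 10, years: int = 10, lead_time: bool = True) -> tuple[int, int]:
    """Age window in which the valued state is lived in life B of the WTD task."""
    # Lead time: full health comes first; lag time: the impaired state comes first.
    start = child_age + years if lead_time else child_age
    return start, start + years

print(ages_in_state())                 # (20, 30)
print(ages_in_state(lead_time=False))  # (10, 20)
```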

Consistent with the consensus against relying solely on the value of 33333 for anchoring [18], we observed a value gap between state 33333 (−0.088) and the second worst health state, 33323 (0.219), as in other published EQ-5D-Y-3L valuation studies [21,22,23,24,25,26]. For example, in the Japanese EQ-5D-Y-3L value set, although all observed values were positive, there was a gap of 0.28 between 33333 and the second worst state, 33232 [22]. This gap implies that a simple additive main-effects model may not perform well when state 33333 is included in an EQ-5D-Y-3L valuation study. Hence, anchoring the latent DCE values on the cTTO value of 33333 could ensure an accurate value range, but at the cost of reduced predictive accuracy for the other states. In this study, adding the A3 term to the hybrid model reduced the overall prediction error and yielded a negative predicted value for state 33333. However, the current DCE design does not allow this gap effect to be explored, because it does not include 33333. This is a potential disadvantage of any model that uses only the DCE data (e.g. the mapping approach tested in this study). To address this, the DCE design could be expanded with a pair that includes state 33333. Such a design would allow the gap effect to be explored in DCE data and would also support data quality assessment: because 33333 is the worst state in the EQ-5D-Y-3L descriptive system, any pair including it is a dominant pair [37]. Future research is needed to understand the reasons for this value gap.
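The dominance property that makes a pair including 33333 usable as a quality check is easy to verify: state a dominates state b if a is no worse on every dimension and better on at least one. The sketch below (function names hypothetical) confirms that 33333 is dominated by each of the other 242 EQ-5D-Y-3L states.

```python
from itertools import product

def dominates(a: str, b: str) -> bool:
    """True if a is no worse than b on every dimension and differs from b (so better on one or more)."""
    return a != b and all(x <= y for x, y in zip(a, b))

def is_dominant_pair(a: str, b: str) -> bool:
    """A DCE pair is dominant if either state dominates the other."""
    return dominates(a, b) or dominates(b, a)

ALL_STATES = ["".join(levels) for levels in product("123", repeat=5)]  # all 243 states
print(sum(is_dominant_pair(s, "33333") for s in ALL_STATES))  # 242
```

A respondent who chooses 33333 in such a pair has made a dominated choice, which is what makes the pair useful as a data quality check.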

Unlike the cTTO value distributions observed in most EQ-5D studies [6, 7], our data showed little sign of censoring at −1. This may reflect respondents’ unwillingness to trade off all of a child’s life years for better health [19, 20], which in turn leads to high cTTO values. High values raise concerns about the sensitivity of the resulting EQ-5D-Y-3L values [17]. However, higher EQ-5D-Y-3L values also mean that life-saving interventions for children will gain more QALYs than life-saving interventions for adults. In fact, no value set can favour both quality-improving and life-saving interventions relative to another value set: higher health state values increase the QALYs gained from extending life but reduce those gained from improving health. Hence, children and adolescents will not always be disadvantaged if EQ-5D-Y-3L values are used alongside EQ-5D-3L or EQ-5D-5L values to inform resource allocation decisions. Until the implications of systematically different EQ-5D values for adult and child health states are fully understood, efforts to lower EQ-5D-Y-3L values may not be tenable.
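The trade-off can be illustrated with simple arithmetic. Assume, purely for illustration, that a life-saving intervention keeps a patient alive for T years in a state with value v, and that a quality-improving intervention moves a patient from that state to full health for T years. The gains are v × T and (1 − v) × T, which always sum to T, so any value set that raises v increases the first gain by exactly as much as it reduces the second. For example, raising v from 0.4 to 0.7 over 10 years increases the life-saving gain from 4 to 7 QALYs but reduces the quality-improving gain from 6 to 3 QALYs. The values 0.4 and 0.7 and the function names are hypothetical.

```python
def qalys_life_saving(state_value: float, years: float) -> float:
    """QALYs from keeping a patient alive for `years` in a state with value state_value."""
    return state_value * years

def qalys_quality_improving(state_value: float, years: float) -> float:
    """QALYs from moving a patient from that state to full health (value 1) for `years`."""
    return (1.0 - state_value) * years

for v in (0.4, 0.7):  # hypothetical values of the same state under two value sets
    gain_life = qalys_life_saving(v, 10)
    gain_quality = qalys_quality_improving(v, 10)
    print(v, round(gain_life, 6), round(gain_quality, 6), round(gain_life + gain_quality, 6))
```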

It appears that the values for EQ-5D-Y-3L state 33333 in Asian countries (China: −0.088; Japan: 0.200 [22]; Indonesia: −0.086) have been considerably higher than those elicited in European countries (Slovenia: −0.691 [23]; Germany: −0.260 [21]; Spain: −0.389 [24]; Hungary: −0.517 [26]; the Netherlands: −0.372 [25]), possibly reflecting cultural differences in the valuation of child health states. However, this pattern was not evident in EQ-5D-5L valuation studies: the predicted value of 55555 was −0.341 for China [7], −0.019 for Japan [38], −0.719 for Indonesia [28], −0.166 for Spain [27], −0.462 for Germany [39], −0.642 for Hungary [40], and −0.446 for the Netherlands [41]. These results suggest that, compared with European populations, Asian populations may be less willing to trade off life years for a child. Future studies should examine the differences in values between instruments and the implications for applications that involve switching between instruments.

Our study also found that, in China, social preferences for EQ-5D-Y-3L health states differed from those for EQ-5D-3L/EQ-5D-5L health states [42]. In both the EQ-5D-3L and EQ-5D-5L Chinese value sets, mobility was the most important dimension, followed by pain/discomfort, while usual activities was the least important [6, 7]. In contrast, in this study pain/discomfort was the most important dimension, and mobility and self-care were the least important dimensions. Similar differences between EQ-5D-3L/EQ-5D-5L and EQ-5D-Y-3L were observed in other countries [20,21,22,23], suggesting that EQ-5D-3L value sets should not be applied to EQ-5D-Y-3L data. The differences could reflect how the disutility of problems on each dimension is perceived for children compared with adults. For example, mobility and self-care are important for adults, who are expected to be independent, but are less central for children, who usually depend on their parents or caregivers. Nevertheless, the differences may also be partly due to the change in valuation perspective (self for EQ-5D vs others for EQ-5D-Y-3L) and in the descriptive system (EQ-5D-3L vs EQ-5D-Y-3L) [13, 14].

This study is not without limitations. First, we collected cTTO data from only 8 regions. Because the DCE data covered more regions and preferences may vary across regions, this may explain the discrepancy between the mixed logit model and the hybrid model. Second, following the EQ-5D-Y-3L protocol, the relationship between the imagined child and the respondent was not specified, which could have increased data variability [43]. Anecdotally, interviewers reported that respondents imagined various relationships, for example, the child being their own child, a friend’s or relative’s child, a random child, or themselves at 10 years old. In future EQ-5D-Y-3L valuation studies, it may be more appropriate to standardise the relationship (e.g. a 10-year-old child you do not know) across all valuation tasks.

The published EQ-5D-Y-3L valuation protocol represented best practice when it was developed, and knowledge about valuing children’s health has since deepened and expanded. For example, adolescents have been reported to be able to complete DCE tasks and to exhibit health preferences different from those of adults [44]. In addition, following the current EQ-5D-Y-3L protocol, we fixed the age of the imagined child at 10 years in the valuation tasks. The age of the hypothetical child appears to have a minimal effect on the latent utility values derived from DCE tasks [45]. Hence, the resulting value set may be applied to health states across the full age range of children and adolescents, although future studies are needed on the effect of age on cTTO values. As knowledge about valuing children’s health grows, an updated EQ-5D-Y-3L valuation protocol and new EQ-5D-Y-3L value sets can be expected.

5 Conclusion

Using an expanded cTTO design, this study collected both cTTO and DCE preference data for EQ-5D-Y-3L health states from a large representative sample in China. Comparing two analytical approaches, we found that the hybrid model outperformed the DCE mixed logit model (anchored through mapping) in terms of coefficient significance and predictive performance. Furthermore, by adding an A3 term, the hybrid model addressed the value gap between 33333 and the other states that has been reported in other published EQ-5D-Y-3L studies. We recommend the hybrid model with the A3 term as the EQ-5D-Y-3L value set for China.