Do health preferences differ among Asian populations? A comparison of EQ-5D-5L discrete choice experiments data from 11 Asian studies

Introduction Many countries have established their own EQ-5D value sets on the basis that health preferences differ among countries/populations. So far, published studies have focused on comparing value sets using TTO data. This study aims to compare the health preferences of 11 Asian populations using the DCE data collected in their EQ-5D-5L valuation studies. Methods In the EQ-VT protocol, 196 pairs of EQ-5D-5L health states were valued by a general population sample using the DCE method in all studies. DCE data were obtained from the study PIs. To understand how health preferences differ from or resemble each other across studies, the following were analyzed: (1) the statistical difference between the coefficients; (2) the relative importance of the five EQ-5D dimensions; (3) the relative importance of the response levels. Results The number of statistically different coefficients between two studies ranged from 2 to 16 (mean: 9.3) out of 20 main-effects coefficients. For the relative importance, no universal preference pattern fits all studies, but there are some common characteristics, e.g. mobility is considered the most important dimension; on average across studies, the relative importance of levels is approximately 20% for level 2, 30% for level 3, and 70% for level 4. Discussion Although a standardized study protocol was followed, considerable differences remain in the modeling and relative importance results of the EQ-5D-5L DCE data among the 11 Asian studies. These findings advocate the use of local value sets for calculating health state utility. Supplementary Information The online version contains supplementary material available at 10.1007/s11136-021-03075-x.


Introduction
EQ-5D is a generic preference-based health-related quality of life (HRQoL) questionnaire that is widely used around the world [1,2]. When value sets are available, EQ-5D data can be converted to health utility [2]. Many countries have established their own EQ-5D value sets on the basis that health preferences differ among countries/populations [3,4]. Indeed, studies have found differences between value sets [3,5,6]. In developing the value sets of the three-level version of EQ-5D (EQ-5D-3L), published studies differed in terms of design, data collection protocol and the choice of model. By comparing these value sets, Norman et al. concluded that these variations in methods could obscure true differences in values [6]. For the latest five-level version of EQ-5D (EQ-5D-5L), the EuroQol Group developed a standardized protocol for data collection in valuation studies, which is named the EuroQol valuation technology protocol (EQ-VT) [7][8][9].
With application of the EQ-VT, EQ-5D-5L valuation data can be exploited to study whether important differences in health preferences across populations exist, as the method variations observed in the 3L studies are minimized. The EQ-VT data collection protocol uses both time trade-off (TTO) and discrete choice experiment (DCE) as preference elicitation methods [7]. Currently, all comparison studies of EQ-5D value sets only used the TTO data from the valuation studies [5,10,11]. This is partially because the TTO data is considered as the primary preference source in the EQ-VT protocol and some studies estimated their value set using TTO data only, for example, China and South Korea [12,13]. So far, the DCE data collected using the EQ-VT protocol has not been utilized for the purpose of identifying preference differences across studies. While the TTO valuation data could be subject to interviewer effects as the task relies on the good performance of the interviewers [14], there is minimal interviewer effect for the DCE data.
As a preference elicitation method, DCE has been increasingly used in health preference studies [15]. Based on random utility theory, DCE is designed to ask respondents to choose a preferred multi-dimensional health state from two or more alternatives. The ordinal preference data can be modeled to predict health utility on a latent scale [16]. This means that the coefficients of DCE are not directly comparable across studies and most studies assessed their difference by calculating and comparing the relative importance of five health dimensions [17,18].
Further, differences in health preferences among Asian populations are not well understood. By comparing the multiplicative model coefficients of the EQ-5D-5L TTO valuation data from seven Asian studies, Wang et al. noticed that there was no consensus about the rank ordering of the five dimensions [10]. Additionally, statistical tests suggested that most coefficients differed among the Asian studies. In the study of Roudijk et al., the authors found that cultural variables (i.e. traditional/rational-secular, survival/self-expression) did not explain the variation in value differences (defined as utility differences between the mild and severe states) among EQ-5D valuation studies, including 10 Asian studies [11]. As stated before, these studies only explored the TTO valuation data.
Following the EQ-VT protocol, 11 studies (China, Indonesia, Japan, South Korea, Malaysia, Singapore, Thailand, Philippines, Vietnam, Hong Kong, Taiwan) have been completed in Asia. Of those studies, China and South Korea did not use the DCE data to model the value sets, and Singapore and the Philippines have not yet published their value sets. The rest of the studies modeled the DCE data and TTO data jointly. Notably, no study has compared DCE-derived preference data among Asian populations. Given that all studies used the standardized EQ-5D-5L instrument, DCE experimental design, and data collection protocol, it is possible to explore the variations of health preferences in Asia. We hypothesized that health preferences differ among Asian populations. If this is true, the results of this study could further support the establishment of national/regional value sets for better guidance of health care decision-making and resource allocation, rather than the use of a unified value set designed merely for the continent. In this study, we aim to understand the similarities and differences in Asians' preferences for EQ-5D-5L health states in 11 Asian DCE datasets collected as part of EQ-5D-5L valuation studies.

DCE design and tasks
All Asian studies included in this study used the standard EQ-VT protocol for data collection [9,27]. In general, the DCE design of the EQ-VT protocol consisted of a total of 196 pairs of health states, including 186 pairs generated from a Bayesian efficient design algorithm and 10 pairs of mild states [27]. The priors for the Bayesian efficient design algorithm were extracted from a main effects model of an EQ-5D-3L DCE study [28]. The detailed experimental design development process and considerations were described in Oppe et al. [27]. The 196 pairs of EQ-5D-5L health states were distributed over 28 blocks, each consisting of 7 pairs of health states with similar severity. No dominant pairs were included [27]. In each study, each respondent was assigned one block of DCE tasks to complete. The 7 pairs were presented in random order, and the right-left presentation of the two health states was also randomized [8]. Figure 1 shows a screenshot of one DCE task in the EQ-VT software.
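The block structure and randomization described above can be sketched as follows. This is a minimal illustration and not the EQ-VT software: the pair labels are hypothetical placeholders, pairs are assigned to blocks in sequence rather than by severity, the random choice of a block per respondent is an assumption, and the function name is invented for the example.

```python
import random

N_PAIRS = 196                          # 186 Bayesian-efficient pairs + 10 mild pairs
N_BLOCKS = 28
PAIRS_PER_BLOCK = N_PAIRS // N_BLOCKS  # 7 pairs per block

# Hypothetical placeholder pairs; the real design uses EQ-5D-5L health states.
pairs = [(f"A{i}", f"B{i}") for i in range(N_PAIRS)]

# Assumption: sequential assignment to blocks. The actual design groups
# pairs of similar severity, which is not reproduced here.
blocks = [pairs[b * PAIRS_PER_BLOCK:(b + 1) * PAIRS_PER_BLOCK]
          for b in range(N_BLOCKS)]


def tasks_for_respondent(rng):
    """Pick one block, shuffle the task order, and randomize left/right sides."""
    block = list(rng.choice(blocks))   # assumption: block chosen at random
    rng.shuffle(block)                 # random presentation order of the 7 pairs
    # Swap the two states of a pair with probability 0.5.
    return [(a, b) if rng.random() < 0.5 else (b, a) for a, b in block]


tasks = tasks_for_respondent(random.Random(2021))
```

Passing a seeded `random.Random` instance keeps the sketch reproducible without touching the global random state.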

Data collection
Following the EQ-VT protocol, all respondents were interviewed face-to-face by a trained interviewer using the EQ-VT software. The data collection included four sections: The first section was for respondents to report their own health using the EQ-5D-5L descriptive system and the EQ-VAS. In the second section, respondents valued 10 different EQ-5D-5L health states using the composite time trade-off (cTTO) [8]. In the third section, respondents completed 7 pairs of EQ-5D-5L discrete choice tasks [27]. Finally, respondents reported their socio-economic and other background characteristics. We used the DCE data obtained from the third section for the analysis.

Analysis
To understand how health preferences differ from or resemble each other across studies, the following were analyzed: (1) the statistical differences between the coefficients; (2) the relative importance of the five EQ-5D dimensions; (3) the utility decrements between each of the response levels.
For modeling, a 20-parameter main-effects mixed logit model was fitted for each study. In this model (Formula 1), utility was explained by 20 dummy variables and was on a latent scale (referred to as latent utility). For each dimension (MO for mobility, SC for self-care, UA for usual activities, PD for pain/discomfort, AD for anxiety/depression), 4 dummy variables were used to represent the departure from level 1 to each of the other 4 levels, e.g. $\mathrm{MO}_3$ was 1 if the health state being valued had "moderate problems with mobility" and 0 for any other level of mobility [29]. In addition, a heteroscedastic conditional logit model was fitted for each study [30]. The major difference between the two models is that the heteroscedastic conditional logit model accounts for heterogeneity in the error variance, whereas the mixed logit model accounts for preference heterogeneity among respondents.
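The dummy coding described above can be illustrated with a short sketch. It assumes the usual five-digit EQ-5D-5L state notation, with digits in the order MO, SC, UA, PD, AD (as in 11111 and 55555); the function name `dummy_code` is hypothetical and is not taken from the study's code.

```python
DIMENSIONS = ["MO", "SC", "UA", "PD", "AD"]
LEVELS = [2, 3, 4, 5]  # level 1 is the reference level and gets no dummy


def dummy_code(state):
    """Return the 20 main-effects dummies for a five-digit EQ-5D-5L state."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    # One dummy per dimension-level combination, e.g. "MO3" is 1 only when
    # the mobility digit equals 3.
    return {
        f"{dim}{lvl}": int(int(digit) == lvl)
        for dim, digit in zip(DIMENSIONS, state)
        for lvl in LEVELS
    }


x = dummy_code("31245")  # moderate mobility problems, no self-care problems, ...
```

A dimension at level 1 contributes four zeros, so the state 11111 maps to an all-zero vector.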
Next, the statistical differences between two studies' coefficients were explored using pairwise comparisons. For each pair, a dummy variable was generated as 0 for one study's data and 1 for the other. A 20-parameter main-effects model plus 20 interaction terms was then fitted for all two-by-two study combinations (see Formula 2). In this model, a significant interaction term suggests that the coefficient is statistically different between the two studies. The number of statistically different coefficients was summarized for each study pair. Notably, the coefficient of a significant interaction term may not exceed the minimal important difference (MID) on the utility scale [31].
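The formulas referred to above are not reproduced in this text. As an assumption about notation, and not the authors' exact specification, the pooled model for one study pair could be written along these lines, where $x_{kj}$ are the 20 dummy variables of health state $j$, $S_n$ is the study dummy (0 or 1) of respondent $n$, and $\varepsilon_{nj}$ is the error term:

```latex
U_{nj} = \sum_{k=1}^{20} \beta_k \, x_{kj}
       + \sum_{k=1}^{20} \delta_k \, \left( x_{kj} \times S_n \right)
       + \varepsilon_{nj}
```

In this sketch, setting every $\delta_k$ to zero gives back the 20-parameter main-effects form, and testing $\delta_k = 0$ corresponds to asking whether coefficient $k$ differs between the two studies. Respondent-level random coefficients of the mixed logit are left out for brevity.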
Using the mixed logit model results (Formula 1), the relative importance of dimensions and levels was estimated for each study [17,18,32]. The relative importance of the five dimensions was calculated in two steps. First, each dimension-level coefficient was divided by the sum of the coefficients of the same level across all dimensions. For example, the adjusted coefficient for mobility level 3 was obtained by dividing the $\mathrm{MO}_3$ coefficient by the sum of the level 3 coefficients of all dimensions: $a\mathrm{MO}_3 = \mathrm{MO}_3 / (\mathrm{MO}_3 + \mathrm{SC}_3 + \mathrm{UA}_3 + \mathrm{PD}_3 + \mathrm{AD}_3)$. This step resulted in adjusted coefficients for levels 2 to 5 (level 1 is the reference level) of every dimension. Second, the mean of the adjusted coefficients was calculated for each dimension. Continuing the mobility example, the relative dimension importance of mobility for a study would be estimated as $(a\mathrm{MO}_2 + a\mathrm{MO}_3 + a\mathrm{MO}_4 + a\mathrm{MO}_5)/4$.
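The two-step calculation of dimension importance can be sketched as below. The coefficients are hypothetical values chosen for illustration only and do not come from any of the 11 studies; the function name is likewise an assumption.

```python
DIMENSIONS = ["MO", "SC", "UA", "PD", "AD"]
LEVELS = [2, 3, 4, 5]  # level 1 is the reference level

# Hypothetical latent-scale coefficients, for illustration only.
coefs = {
    "MO2": 0.30, "MO3": 0.45, "MO4": 1.20, "MO5": 1.80,
    "SC2": 0.20, "SC3": 0.30, "SC4": 0.80, "SC5": 1.10,
    "UA2": 0.15, "UA3": 0.25, "UA4": 0.70, "UA5": 1.00,
    "PD2": 0.25, "PD3": 0.40, "PD4": 1.00, "PD5": 1.50,
    "AD2": 0.20, "AD3": 0.35, "AD4": 0.90, "AD5": 1.40,
}


def dimension_importance(coefs):
    """Relative importance of each dimension from 20 main-effects coefficients."""
    adjusted = {}
    for lvl in LEVELS:
        # Step 1: divide each coefficient by the sum of its level across dimensions.
        level_sum = sum(coefs[f"{d}{lvl}"] for d in DIMENSIONS)
        for d in DIMENSIONS:
            adjusted[f"{d}{lvl}"] = coefs[f"{d}{lvl}"] / level_sum
    # Step 2: average the four adjusted coefficients of each dimension.
    return {d: sum(adjusted[f"{d}{lvl}"] for lvl in LEVELS) / len(LEVELS)
            for d in DIMENSIONS}


importance = dimension_importance(coefs)
```

Because the adjusted coefficients of each level sum to 1, the five importance values also sum to 1, and equal coefficients across dimensions give 0.20 for each.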
The relative importance of levels was also obtained in two steps. First, the sum of the coefficients of each level across all dimensions was calculated. Second, the sum for each level was divided by the sum of the level 5 coefficients. For example, the relative importance of level 2 for a study was calculated as $(\mathrm{MO}_2 + \mathrm{SC}_2 + \mathrm{UA}_2 + \mathrm{PD}_2 + \mathrm{AD}_2) / (\mathrm{MO}_5 + \mathrm{SC}_5 + \mathrm{UA}_5 + \mathrm{PD}_5 + \mathrm{AD}_5)$. The relative importance results were summarized across the 11 studies and two figures were plotted, one for the relative importance of the dimensions and one for the relative importance of levels (see Online Appendix 1 for the calculation of the relative importance). If the five dimensions are equally weighted by a population, all five dimensions should have a relative importance of 0.20 (i.e. 1 divided by 5). The relative importance of a level is interpreted as a percentage of the weight attached to level 5 problems. The 95% confidence intervals of relative importance were calculated using the Delta method (see Online Appendix 2 for example STATA code). Analyses were performed using STATA 14 (Stata Corp LLC) [33].

Data descriptions
Table 1 summarizes the key information from the 11 valuation studies. Based on the EQ-VT protocol, all studies recruited at least 1000 respondents. Quota sampling was the most used sampling strategy, but the quotas differed. All studies were conducted between 2012 and 2017.

Modeling results
Table 2 shows the mixed logit modeling results. All coefficients for all studies were significant at the 0.05 level except for the second level of usual activities in Taiwan. Vietnam and the Philippines had 1 and 2 inconsistent coefficients, respectively. The three inconsistencies occurred at the third level of self-care, mobility, and usual activities, respectively. Within each study, the standard errors of the coefficients generally increased with severity level. Table 3 shows the number of coefficients that differed statistically between two studies. On average, 9.3 of the 20 coefficients differed between two studies. Almost all study pairs had at least 5 differing coefficients; the exceptions were Taiwan versus Hong Kong and Taiwan versus Malaysia. Malaysia and Singapore differed the most, with 16 statistically different coefficients. An example of this comparison between China and Indonesia can be found in Online Appendix 3.
Compared with the mixed logit model results, the heteroscedastic conditional logit model resolved the non-significant coefficient for Taiwan but did not resolve the coefficient inconsistencies for the Philippines and Vietnam. Furthermore, this model resulted in one non-significant coefficient for South Korea and one inconsistency for Thailand. The heteroscedastic conditional logit modeling results can be found in Online Appendix 4.

Relative weight results
Table 4 shows the relative importance estimates and their 95% confidence intervals for the 11 studies. Figure 2 shows that a universal rank order does not exist across the 11 Asian populations. Mobility was the most important dimension in every study except Vietnam. The least important dimension was either usual activities or self-care, except in the Philippines and Indonesia. Notably, these two functional dimensions had similar weights in China, Indonesia, Japan and Vietnam, and only Korea had a larger relative weight for usual activities. Pain/discomfort was the second most important dimension in 6 studies, and it was valued higher than or equal to anxiety/depression in all studies except Thailand. Singapore, Japan, the Philippines, and Indonesia placed similar weights on pain/discomfort and anxiety/depression. The sum of the three functional dimensions was larger than the sum of the two symptom dimensions in all studies.
Some individual characteristics can be spotted in Fig. 2. South Korea showed the largest difference between the dimensions of mobility and self-care. Japan had similar weights for dimensions other than mobility. Hong Kong, Malaysia and Taiwan showed the same rank order, i.e. Mobility > Pain/discomfort > Anxiety/depression > Self-care > Usual activities. China differed from these three studies by placing usual activities above self-care in importance. Indonesia showed a different pattern by placing more weight on usual activities and self-care than on pain/discomfort and anxiety/depression. Both Vietnam and Singapore had similar weights for three dimensions. Thailand and Vietnam were unique in that Thailand valued anxiety/depression as the second most important dimension and Vietnam valued pain/discomfort as the most important dimension.
Compared with the large variations in the relative importance of health dimensions, the relative importance of levels was more comparable across studies (Fig. 3). The weights of mild (L2) and moderate (L3) problems were more similar across regions than the weights of severe problems (L4). L2 ranged from 0.156 for Taiwan to 0.322 for the Philippines; L3 ranged from 0.211 for Thailand to 0.367 for Indonesia; L4 ranged from 0.600 for South Korea to 0.837 for the Philippines. In the Philippines and Thailand, the difference between level 2 and level 3 was minimal. On average, level 2 accounted for approximately 20%, level 3 for approximately 30%, and level 4 for approximately 70% of the weight of level 5. The smallest relative importance was 0.156 for L2 in Taiwan, meaning that having a mild problem carried about 15.6% of the weight of having an extreme problem. The smallest L3 was from Thailand (0.211), and this value was smaller than the L2 of some studies.
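The level weights reported here are ratios of summed coefficients, as described in the Analysis section. A minimal sketch of that ratio follows; the coefficients are hypothetical and were chosen so that the result resembles the average pattern described above, and the function name is an assumption.

```python
DIMENSIONS = ["MO", "SC", "UA", "PD", "AD"]

# Hypothetical latent-scale coefficients, for illustration only.
coefs = {
    "MO2": 0.40, "MO3": 0.60, "MO4": 1.40, "MO5": 2.00,
    "SC2": 0.20, "SC3": 0.30, "SC4": 0.70, "SC5": 1.00,
    "UA2": 0.20, "UA3": 0.30, "UA4": 0.70, "UA5": 1.00,
    "PD2": 0.30, "PD3": 0.45, "PD4": 1.05, "PD5": 1.50,
    "AD2": 0.30, "AD3": 0.45, "AD4": 1.05, "AD5": 1.50,
}


def level_importance(coefs):
    """Weight of levels 2-4 relative to level 5, summed over the five dimensions."""
    # Step 1: sum the coefficients of each level across dimensions.
    sums = {lvl: sum(coefs[f"{d}{lvl}"] for d in DIMENSIONS)
            for lvl in (2, 3, 4, 5)}
    # Step 2: divide each level sum by the level 5 sum.
    return {f"L{lvl}": sums[lvl] / sums[5] for lvl in (2, 3, 4)}


weights = level_importance(coefs)
```

With these made-up inputs the three ratios come out close to 0.2, 0.3 and 0.7, which is the interpretation used in the text: each value is the share of the level 5 weight attached to a milder level.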

Discussion
The present study compared the DCE-based modeling results and the relative importance of EQ-5D-5L dimensions and levels across 11 Asian valuation studies. The strength of this study is that all 11 studies followed the standardized EQ-VT protocol, which minimized possible noise in identifying the true differences. Based on our results, it is fair to state that there is no single preference pattern for Asian populations. This is in line with a previous study comparing TTO preference data [10]. A clear distinction between our DCE results and the TTO results is that the relative weights for level 3 and level 4 are larger in the TTO study.
Our study first tested the differences between modeling coefficients and then compared the relative importance attached to the dimensions and levels of EQ-5D-5L. Both analyses suggest large health preference heterogeneity among Asian populations. First, the number of statistically different coefficients ranged between 2 (Malaysia vs. Taiwan) and 16 (Singapore vs. Malaysia), with an average of 9.3, suggesting that about half of the coefficients differed when two studies' data were pooled in a joint model. Second, both the relative importance of dimensions and that of levels differed among studies. Only Hong Kong, Taiwan and Malaysia showed the same order of the five dimensions. We identified some common patterns, although each comes with exceptions. First, among the five dimensions, mobility is the most important dimension for every population except Vietnam. This is similar to the results from a comparison of TTO-only preference data from 7 Asian regions [10]. However, western countries do not value mobility as highly; the Dutch, German, and US populations view mobility as the third, fourth, and second most important dimension, respectively [34][35][36]. Purba et al. argued that in western developed countries, problems with mobility have less influence because of better infrastructure provision and less emphasis on manual labor [20]. However, in high-income, developed regions such as Singapore and Japan, mobility is still the most valued dimension. Second, the sum of the three functional dimensions (mobility, self-care and usual activities) was higher than the sum of the two symptom dimensions (pain/discomfort and anxiety/depression). Also, either usual activities or self-care is the least important dimension, with Indonesia and the Philippines as the exceptions. This result agrees with the previous comparison of TTO data among 7 Asian populations. In that study, Indonesia was the only one that valued pain/discomfort and anxiety/depression the lowest. Third, pain/discomfort was valued as more important than anxiety/depression and was the second most important dimension in 6 studies. These characteristics mark some notable differences from the preference patterns of most European, American, and African populations [5,37,38,39].
Despite these similarities, it is clear that a single preference pattern does not exist for all Asian populations. For example, there is no agreement on the least important dimension in our comparison: 3 studies valued self-care, 2 studies valued anxiety/depression, and 6 studies valued usual activities as the least important. This contrasts with a previous study comparing health preference patterns for Canada, England, the Netherlands, and Spain. In that study, Olsen et al. found that a clear pattern existed for these four western countries and named it the western preference pattern (WePP) [5]. In the WePP, four general characteristics were noticed in terms of relative importance: 1) (PD + AD) ≈ (MO + SC + UA); 2) PD ≈ AD; 3) MO ≈ SC; 4) UA < SC. However, no Asian preferences fit well with these four characteristics. In fact, the sum of pain/discomfort and anxiety/depression was less than the sum of the other three dimensions in all Asian studies, (PD + AD) < (MO + SC + UA), suggesting that, compared with the western countries, Asian populations placed more weight on the functional dimensions. The second characteristic, PD ≈ AD, was only observed in the results from Indonesia, Singapore, and Malaysia. The third characteristic was clearly invalid in Asia, as mobility was valued as the most important dimension while self-care had less relative importance in 4 studies. For the last characteristic, four Asian populations placed similar or higher values on usual activities.
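The four WePP characteristics listed above can be checked mechanically against a set of relative importance weights, as in the sketch below. The tolerance used for "approximately equal" (0.03) and the example weights are assumptions made for illustration; neither is taken from Olsen et al. or from the 11 studies, and the function name is hypothetical.

```python
def wepp_checks(w, tol=0.03):
    """Evaluate the four WePP characteristics for relative importance weights w.

    tol is an assumed threshold for treating two weights as approximately equal.
    """
    symptoms = w["PD"] + w["AD"]
    functions = w["MO"] + w["SC"] + w["UA"]
    return {
        "symptoms_eq_functions": abs(symptoms - functions) <= tol,  # (PD+AD) ~ (MO+SC+UA)
        "pd_eq_ad": abs(w["PD"] - w["AD"]) <= tol,                  # PD ~ AD
        "mo_eq_sc": abs(w["MO"] - w["SC"]) <= tol,                  # MO ~ SC
        "ua_lt_sc": w["UA"] < w["SC"],                              # UA < SC
    }


# Hypothetical weights with mobility dominant and functions outweighing symptoms.
example = {"MO": 0.30, "SC": 0.16, "UA": 0.14, "PD": 0.22, "AD": 0.18}
result = wepp_checks(example)
```

For this made-up profile only the last characteristic holds, which mirrors the kind of mismatch with the WePP that the paragraph describes.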
The differences in health preferences can be attributed to several reasons. First, the 11 populations in our sample come from diverse cultural, economic, political and social environments. Although no study has examined how these factors relate to health preferences, country-specific value sets have been established on the notion that these factors shape people's preferences. Second, even though each study followed the same study protocol, their sampling methods differed. Quota sampling was the most used sampling method [40,41]. Hence, the different respondents recruited for each study may contribute to the observed differences. Last but not least, the EQ-5D-5L descriptive system was translated from English into different official languages. Though a standardized translation process was followed to maintain equivalence between each translated questionnaire and its source version, different languages have different ways of expression, which may be inadequately captured [42].
This study has some limitations. First, the point estimates of the relative weights were used to identify the preference pattern. Considering that the 95% confidence intervals overlapped for some dimensions, the relative weight differences between dimensions may not be statistically significant. Assuming a scale length of 1.5 (i.e. 55555 has a value of -0.5 and 11111 has a value of 1) and using a MID of 0.05, any relative importance difference over 0.03 (0.05/1.5 ≈ 0.033) should be meaningful. Nevertheless, since we do not know the actual scale length of each study, we did not use this criterion. Second, even though a standardized protocol was used, the demographic questions used in each study were customized by each local study team. Because of these variations, we did not further test how they affect preferences. Only the heteroscedastic model shown in Online Appendix 4 demonstrates that the variance was constant for respondents of different ages and genders.

Table 3 Number of coefficients (out of 20) that differed statistically between two studies
Study        China  Indonesia  Japan  Korea  Malaysia  Singapore  Thailand  Philippines  Vietnam  Hong Kong
Indonesia    9
Japan        8      8
Korea        9      12         10
Malaysia     7      12         9      13
Singapore    13     10         8      12     16
Thailand     9      10         7      8      8         8
Philippines  8      10         8      10     10        12         8
Vietnam      9      12         10     14     6         12         11        8
Hong Kong    5      9          7      12     5         13         6         7            8
Taiwan       8      10         11     12     2         15         6         10           10       3
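The pairwise counts of Table 3 can be checked against the summary figures quoted in the text (range 2 to 16, mean 9.3). The sketch below transcribes the lower triangle of the table; treating China as the first, unlabeled column is an assumption based on the order in which the studies are listed, and the variable names are hypothetical.

```python
# Assumption: the first column of Table 3 is China; the remaining order
# follows the row labels of the table.
STUDIES = ["China", "Indonesia", "Japan", "Korea", "Malaysia", "Singapore",
           "Thailand", "Philippines", "Vietnam", "Hong Kong", "Taiwan"]

# Lower triangle of Table 3: row i holds the counts of study i+1 against
# studies 0..i, in column order.
LOWER = [
    [9],
    [8, 8],
    [9, 12, 10],
    [7, 12, 9, 13],
    [13, 10, 8, 12, 16],
    [9, 10, 7, 8, 8, 8],
    [8, 10, 8, 10, 10, 12, 8],
    [9, 12, 10, 14, 6, 12, 11, 8],
    [5, 9, 7, 12, 5, 13, 6, 7, 8],
    [8, 10, 11, 12, 2, 15, 6, 10, 10, 3],
]

# Map each (row study, column study) pair to its number of differing coefficients.
pair_counts = {
    (STUDIES[i + 1], STUDIES[j]): n
    for i, row in enumerate(LOWER)
    for j, n in enumerate(row)
}

mean_diff = sum(pair_counts.values()) / len(pair_counts)
most_different = max(pair_counts, key=pair_counts.get)
most_similar = min(pair_counts, key=pair_counts.get)
```

With 11 studies there are 55 pairs; the mean of the transcribed counts rounds to the 9.3 reported in the text, and the extreme pairs are Singapore versus Malaysia and Taiwan versus Malaysia.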
Norman et al. pointed out that differences in methods obscured the true differences in health preferences across countries after comparing published EQ-5D-3L value sets [6]. Our study has shown that, even with a standardized data collection protocol, study design and modeling choice, differences remained in the EQ-5D-5L modeling results and in the relative importance of dimensions and levels among Asian populations. Therefore, estimating a combined continental value set, as was done for European and Western countries [5,43], should be discouraged for Asia.

Conclusion
By comparing the DCE data modeling results, we found that the rank order of EQ-5D-5L dimensions and the relative weights of levels differed among Asian populations. These findings confirm the health preference heterogeneity among Asian populations that was observed in previous studies using TTO data. All the evidence suggests the necessity of using local value sets for estimating health utility.