Estimating the EQ-5D-5L value set for the Philippines

Background: The Philippines has recommended the use of Quality-Adjusted Life Years (QALYs) in government health technology assessments (HTA). We aimed to develop a value set for the EQ-5D-5L based on the health preferences of the healthy general adult population in the Philippines. Methods: Healthy, literate adults were recruited from the Philippine general population with quota targets based on age, sex, administrative region, type of residence, education, income, and ethnolinguistic group. Each participant's preferences were elicited through Composite Time Trade-Off (C-TTO) and Discrete Choice Experiment (DCE) tasks. Tasks were computer-assisted using the EuroQol Valuation Technology 2.0. To estimate the value set, we explored 20- and 8-parameter models that used either C-TTO-only data or both C-TTO and DCE data (also called hybrid models). Final model choice was guided by principles of monotonicity, out-of-sample likelihood, model fit, and parsimony. Results: We recruited 1000 respondents with demographic characteristics that approximate the general population: 49.6% female, 82% Roman Catholic, 40% in urban areas, and 55% finished high school. None of the 20-parameter models demonstrated monotonicity (logical worsening of coefficients with increasing severity). Among the 8-parameter models, the homoscedastic TTO-only model exhibited the best fit. In this model, mobility and pain/discomfort had the largest effect on utilities. Conclusion: The model selected to represent the Philippine general population preferences for EQ-5D-5L health states was an 8-parameter homoscedastic TTO-only model. This value set is recommended for use in QALY calculations in support of HTA-informed coverage decisions in the Philippines. Supplementary Information: The online version contains supplementary material available at 10.1007/s11136-022-03143-w.


Introduction
Health Technology Assessment (HTA) provides a transparent and rational priority-setting mechanism for the optimal use of health technologies in a finite budget setting [1,2]. Analyses in HTA often include economic evaluations, which estimate incremental cost-effectiveness ratios (ICERs) expressed as incremental cost per incremental benefit, outcome, or health effect. In recent years, the quality-adjusted life year (QALY) has become a more common measure of health effects in HTAs, and the EQ-5D-5L, a tool developed by the EuroQol Group [3], has become widely used to quantify changes in QALYs due to an intervention. The EQ-5D-5L measures health-related quality of life using five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), with each dimension having five levels (no problems, slight problems, moderate problems, severe problems, and unable/extreme problems) [4]. This tool has been translated into various languages and a number of national value sets have been published across the globe [5][6][7][8][9][10][11][12][13][14][15][16][17]. However, in order to use the EQ-5D-5L in QALY calculations, the five dimension scores of the tool need to be transformed into a single value through the use of a country-specific value set [3]. Despite using HTA for inclusion of drugs in the national formulary and recommending the use of QALYs in HTA, the Philippines has yet to establish its own value set [18].
In the absence of a country-specific value set, it has been common practice for cost-effectiveness analyses (CEAs) in the Philippines to use either the Thai value set or disability-adjusted life years (DALYs) [19][20][21][22]. This practice weakens the validity of these CEAs, as socio-cultural differences between Thailand and the Philippines could make the Thai value set an inappropriate proxy for a Philippine value set. In addition, DALYs and QALYs have theoretical differences that could lead to divergence in utilities [23], although a recent review has noted that these differences might have minimal impact on HTA decisions [24].
The enactment of the Universal Health Care Act in the Philippines (Republic Act 11223) in 2019 institutionalized the use of HTA to inform the coverage decisions of the Department of Health and the Philippine Health Insurance Corporation. Thus, establishing a Philippine-specific value set would be of high relevance to three of the five criteria in developing coverage recommendations applying HTA in the Philippines which are: (1) responsiveness to magnitude, severity, and equity of medical conditions with heaviest burden to the population; (2) cost-effectiveness; and (3) affordability and viability [25]. In this study, we aimed to estimate the utility values of EQ-5D-5L health states based on the preferences of the general population in the Philippines.
Over the years, several modeling approaches have been developed to generate country-specific value sets from empirical preference data [26,27]. Earlier approaches used Composite Time Trade-Off (C-TTO) data only. However, this approach has been challenged because the iterative process could lead to biased responses due to responder fatigue, and both the hypothetical health states and the time horizon could be difficult for respondents to visualize [28]. Likewise, the use of conditional logit models to estimate coefficients using only Discrete Choice Experiment (DCE) data has had limitations observed in previous investigations [29]. Finally, recent EQ-5D valuation studies have found that in some contexts, the 20-parameter model produced coefficients that violate monotonicity (e.g., a worse estimated decline for those experiencing slight pain than for those experiencing severe pain). This can happen because additive models estimate a separate parameter for each domain level (e.g., a beta each for mobility levels 2 to 5). Recently, multiplicative models have been proposed, which estimate fewer parameters and are constructed in a way that avoids monotonicity violations [13,30]. Thus, in this analysis, we likewise explored multiplicative models to generate the utility value set.

Study design and sites
The Philippines is an archipelago with 17 administrative regions, wherein each region roughly follows the dominant local ethnolinguistic groups [31]. The study employed a cross-sectional design and was conducted in 34 towns across all the regions in the Philippines (one rural and one urban town per region). Data collection was conducted from October to December 2017.

Sampling method and recruitment
Consistent with EuroQol methodology, quota sampling based on 1000 respondents was employed in the study, with quota buckets calculated based on age, sex, administrative region, type of residence, education, income, and the six predominant ethnolinguistic groups (Tagalog, Cebuano, Ilocano, Hiligaynon, Waray, Bicolano) in order to produce a sample comparable to the general Philippine population [31]. Income was based on coverage under the National Household Targeting System (NHTS), which identified poor households based on a proxy means test [32]. This was selected because villages have lists of NHTS families, which facilitated identification of potential respondents.
We included healthy, literate, and non-institutionalized adults (18 years or older) who provided consent. Healthy individuals were defined as respondents who did not self-report any disabilities or acute disease at the time of the survey. This was assessed through a screener that asked respondents 'How do you feel today? Do you feel unwell? Do you have any illnesses?' and 'Do you have any disabilities?'. Individuals who reported chronic diseases (e.g., hypertension, diabetes) were still included in the sample (26.6% reported a chronic condition at the time of the survey). In each study site, the team coordinated with local government community health workers to identify individuals who met our inclusion criteria, and these individuals were then invited to go to the recruitment area. Our study team members performed final screening before obtaining consent and conducting the final interview.

Data collection
Three teams composed of three interviewers each were deployed. A supervisor was assigned to each team to ensure data quality.
Respondents who met the inclusion criteria were first asked to complete the informed consent form. Thereafter, consenting respondents were interviewed by a trained interviewer fluent in their preferred language using a computer-based platform, the EuroQol Valuation Technology (EQ-VT, version 2.0) software, which followed the standard valuation protocol [33]. This version gave more attention to the valuation tasks than EQ-VT version 1.1 and allowed respondents to review their responses through a new Feedback Module [17,33]. Changes in version 2.0, including the revised quality control procedures and the addition of the Feedback Module, have been found to improve data quality and consistency without affecting mean health state values [33][34][35].
The majority of the interviews were done in a room at the local government office, with a few being completed at health centers or at the respondent's place of residence. Each respondent received a token worth PhP 150.00 (approximately USD 3.00) for completing the survey. Ethical clearance for this study, with protocol code UPM-REB2017-156-01, was obtained from the University of the Philippines Manila Research Ethics Board.

EQ-5D-5L
The EQ-5D-5L is a multi-attribute health-related quality of life instrument with 3125 possible health states defined by its five dimensions (mobility (MO), self-care (SC), usual activities (UA), pain/discomfort (PD), anxiety/depression (AD)) and five levels of severity (1 to 5, e.g., MO2 = slight problems with mobility). Thus, a five-digit number summarizes the level of problems for a specific individual. For example, health state '11111' indicates no problem in any of the five dimensions [4]. The second part of the questionnaire is a vertical visual scale, called the Visual Analog Scale (VAS), which records the respondent's self-rated health on a scale of 0-100, where 0 means 'the worst health you can imagine' and 100 means 'the best health you can imagine'.
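The descriptive system above can be illustrated with a short sketch. It enumerates the 5^5 = 3125 health states and parses a five-digit code into its dimension levels. The function and constant names (`parse_state`, `ALL_STATES`) are illustrative choices and are not part of the EQ-VT software.

```python
from itertools import product

DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")  # order of digits in the state code
LEVELS = (1, 2, 3, 4, 5)

def parse_state(state: str) -> dict:
    """Map a five-digit code such as '21345' to {dimension: level}."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    return {dim: int(ch) for dim, ch in zip(DIMENSIONS, state)}

# Enumerate every combination of five levels across five dimensions.
ALL_STATES = ["".join(map(str, combo)) for combo in product(LEVELS, repeat=5)]

print(len(ALL_STATES))       # 3125
print(parse_state("11111"))  # every dimension at level 1 (no problems)
```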
The official Tagalog, Cebuano, and English language versions of the EQ-VT protocol were used. Translations were produced by the EuroQol Group using a standardized translation protocol that followed international recommendations [36].

EQ-VT interview
After obtaining informed consent, the team implemented the EQ-VT protocol, which consists of five sections [11]:
(1) General welcome and introduction to the study.
(2) Completion of the self-reported EQ-5D-5L questionnaire and background questions (e.g., age, sex, experience of illness, disabilities, language proficiency, etc.).
(3) Composite Time Trade-Off tasks. These commenced with a pre-test valuation of two wheelchair scenarios, followed by three scenarios of mild, moderate, and severe health states, which aimed to train respondents and to clarify their understanding. Valuation then proceeded to 10 C-TTO tasks. The C-TTO uses traditional TTO to elicit better-than-dead (BTD) values and lead-time TTO to elicit worse-than-dead (WTD) values. This method is considered more robust than traditional TTO [28]. Details on the C-TTO task can be found in Janssen et al. [37]. There were 86 EQ-5D-5L health states included in EQ-VT for evaluation with C-TTO, distributed into ten blocks with similar levels of severity. Each block consisted of (i) one very mild state (only one dimension at level 2 and all others at level 1, e.g., '11112'), (ii) one most severe state ('55555'), and (iii) eight intermediate health states. Respondents were randomly assigned to one of the ten C-TTO blocks, with each health state presented in random order [28].
(4) Discrete Choice Experiment tasks, wherein each respondent was randomly assigned to one of 28 DCE blocks with seven forced pair comparisons of health states. DCE has been included by the EuroQol Group to make valuation studies more robust and valid [38]. Respondents were presented with a pair of health states (i.e., Life A is the health state at the left of the screen and Life B is the health state at the right of the screen) and asked to select their preferred state. The DCE design included 196 pairs of EQ-5D-5L health states distributed over 28 blocks, each consisting of seven pairs with similar severities. Further, the right-left order of the two health states was also randomized by the EQ-VT [14].
(5) Feedback module, where respondents were shown a list of the C-TTO health states ranked by how severe they had deemed each state. The respondent then had the option to flag specific health states if they felt that the order was incorrect. Flagged health states were excluded from the valuation computation. Respondents were also asked about the difficulty of the C-TTO and DCE tasks using Likert scale questions, but this information was not used for the current analysis.
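The design figures quoted for the C-TTO and DCE sections can be checked with simple arithmetic, and the random block assignment can be sketched in a few lines. This is only an illustration: it assumes that the eight intermediate states in each block are unique across blocks (which is what makes the total come to 86), and the helper `assign_blocks` and the seed are hypothetical, not the EQ-VT implementation.

```python
import random

N_TTO_BLOCKS = 10
INTERMEDIATE_PER_BLOCK = 8     # plus 1 very mild state and '55555' in each block
N_DCE_BLOCKS = 28
PAIRS_PER_DCE_BLOCK = 7

# Only five very mild states exist (one dimension at level 2, the rest at level 1).
very_mild = ["21111", "12111", "11211", "11121", "11112"]

# Assumption: intermediate states are not repeated across blocks.
n_intermediate = N_TTO_BLOCKS * INTERMEDIATE_PER_BLOCK
n_tto_states = len(very_mild) + 1 + n_intermediate   # 5 + 1 + 80
n_dce_pairs = N_DCE_BLOCKS * PAIRS_PER_DCE_BLOCK

def assign_blocks(rng: random.Random) -> tuple:
    """Draw one C-TTO block and one DCE block uniformly at random (1-based)."""
    return rng.randint(1, N_TTO_BLOCKS), rng.randint(1, N_DCE_BLOCKS)

rng = random.Random(2017)          # arbitrary seed, for reproducibility only
print(n_tto_states, n_dce_pairs)   # 86 196
print(assign_blocks(rng))
```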

Data quality control
The quality of data collected in an EQ-5D-5L valuation study relies heavily on interviewers' skills and adequacy in explaining the C-TTO tasks [39]. The team hired interviewers who had prior survey experience and proficiency in Filipino, English, and at least one other major Philippine language (Cebuano, Bicolano, Ilocano, Hiligaynon, Waray). They underwent intensive training and received individual feedback before and during deployment. During the actual fielding of the project, field supervisors provided on-site monitoring and daily feedback to their teams. Additionally, the core team conducted bi-weekly meetings to address quality concerns. After the first 4 weeks of data collection, the team decided to conduct a 2-day retraining, as two interviewers were consistently flagged for 10-20% of the interviews they completed. After the retraining, none of the subsequent interviews were flagged.

Statistical analysis
We explored various techniques previously used for modeling EQ-5D-5L valuation data, which included TTO-only models, DCE-only models, and hybrid approaches that used both TTO and DCE data. Hybrid approaches are known to address possible issues that may occur in models using C-TTO-only or DCE-only data. We still included non-hybrid approaches (e.g., TTO-only) to ensure comprehensive exploration of candidate models and consistency with prior practice in valuations in other countries [13,14,17]. More details on the modeling are provided in Ramos-Goñi et al. [27,40]. The most widely used models contain either 20 or 8 parameters. The 20-parameter models (also called additive models) include a term for the effect of each level beyond the first level of each dimension (i.e., MO2 to MO5, SC2 to SC5, UA2 to UA5, PD2 to PD5, AD2 to AD5). This approach has been used in value sets such as those of Indonesia [14] and Germany [17]. The independent variables of the 8-parameter model (also called the multiplicative model) include Level 5 utilities for each dimension (i.e., MO5, SC5, UA5, PD5, AD5) and the three intermediate utility levels (i.e., Level 2, Level 3, and Level 4). The same approach has been used in producing the Malaysian value set [8].
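The difference between the two parameterizations can be sketched as follows. The sketch assumes the usual reading of the 8-parameter form, in which each dimension has one weight (its level 5 decrement) that is multiplied by a level scale shared across dimensions, with level 1 fixed at 0 and level 5 fixed at 1. The function names and all numeric values are hypothetical illustrations; they are not the study's estimates and do not reproduce the 'xreg' implementation.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")

def additive_dummies(state: str) -> dict:
    """20 indicator variables: x_MO2..x_MO5, ..., x_AD2..x_AD5."""
    x = {f"{d}{l}": 0 for d in DIMENSIONS for l in range(2, 6)}
    for dim, ch in zip(DIMENSIONS, state):
        if ch != "1":              # level 1 is the reference category
            x[f"{dim}{ch}"] = 1
    return x

def multiplicative_decrement(state, dim_weight, level_scale):
    """Disutility as dimension weight times a level scale shared by all dimensions.

    level_scale holds the three free level parameters (L2, L3, L4);
    levels 1 and 5 are fixed at 0 and 1, giving 5 + 3 = 8 parameters.
    """
    scale = {1: 0.0, 5: 1.0, **level_scale}
    return sum(dim_weight[d] * scale[int(ch)] for d, ch in zip(DIMENSIONS, state))

# Purely illustrative numbers, NOT the published Philippine coefficients.
toy_weights = {"MO": 0.30, "SC": 0.25, "UA": 0.20, "PD": 0.30, "AD": 0.15}
toy_scale = {2: 0.1, 3: 0.3, 4: 0.7}

print(sum(additive_dummies("12345").values()))   # 4 dimensions away from level 1
print(multiplicative_decrement("31111", toy_weights, toy_scale))
```

Because every dimension shares the same level scale, an ordered scale (L2 < L3 < L4 < 1) yields decrements that worsen with severity in every dimension, which is the property the additive form does not guarantee.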
In selecting the final model to generate the value set, we first assessed the logical consistency of coefficients (i.e., the effect of severity levels increasing monotonically within each dimension). The next planned criteria applied were model fit and parsimony. While we tested many models, in this paper, we only present results from three 20-parameter approaches: (1) TTO-only 20-parameter Robust ordinary least squares (OLS); (2) TTO-only 20-parameter random intercept model; and (3) 20-parameter hybrid heteroscedastic model. We also explored various specifications of the 8-parameter models in terms of (1) data used (TTO-only vs hybrid), (2) intercept (fixed vs random), and (3) error (homoscedastic vs heteroscedastic). We compared the eight versions of the 8-parameter models using regular fit statistics and out-of-sample log-likelihood. All models were run using R 3.6.1. Hybrid and 8-parameter models were implemented using the 'xreg' package. Bootstrapping (10,000 samples) was used to estimate the confidence intervals for the 8-parameter model.
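The first selection criterion, logical consistency of coefficients, amounts to a simple check on each dimension. The sketch below assumes coefficients are expressed as utility decrements (larger means worse) and keyed by names such as 'MO2'; the function name and the toy values are hypothetical and are not taken from the fitted models.

```python
def monotonicity_violations(coefs: dict) -> list:
    """Return (dimension, lower_level, higher_level) triples where the
    decrement for the higher level is smaller than for the lower level.

    coefs maps names like 'MO2' to utility decrements (positive = worse).
    """
    violations = []
    for dim in ("MO", "SC", "UA", "PD", "AD"):
        for level in range(2, 5):
            if coefs[f"{dim}{level + 1}"] < coefs[f"{dim}{level}"]:
                violations.append((dim, level, level + 1))
    return violations

# Toy coefficients for illustration only (not estimates from the study).
toy = {f"{d}{l}": 0.05 * l for d in ("MO", "SC", "UA", "PD", "AD") for l in range(2, 6)}
toy["MO3"] = 0.08   # smaller than MO2 = 0.10, mimicking a level 2 / level 3 inversion

print(monotonicity_violations(toy))   # [('MO', 2, 3)]
```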

Respondents' characteristics
Among the 1107 individuals who were approached for the study, 1000 were included in the analysis. Among the excluded, 48 refused to participate, 30 did not meet inclusion criteria, and 29 were not included since the quota was already reached (see Fig. 1 in Supplemental File 1). Respondents were given a choice of interview site. The majority of the interviews were conducted in the local government unit offices, and several were at the respondents' domicile or at the local health center. About a third of all respondents (34.7%) completed the study in a language other than English, Filipino, or Cebuano (the three languages available in the software).
The characteristics of the included respondents mirrored the Philippine general population in terms of age group, sex, ethnolinguistic group, and region. Unemployment rate was the only characteristic that showed greater than 10% absolute difference from the general population (13.7%). Although residence, education, and income differed somewhat from the national estimates, the discrepancies from the targets were small (education: ± 2.3; residence: ± 5.3; income: ± 11.3) (Table 1).
Results further showed consistency between the reported health status and VAS score: those who reported a 'Very Good' health state had the highest mean VAS score (95, SD ± 6.8), while the lowest mean VAS score (82, SD ± 6.8) was noted among those reporting a 'Bad' health state (Table 1).

Feedback module results
Each of the 1000 respondents valued 10 health states, providing 10,000 C-TTO observations. Of these, 1164 (11.64%) health state values were 'flagged' by the respondents themselves as being in incorrect order of health states severity during the Feedback Module task and were excluded.
In the DCE dataset, respondents completed seven paired comparisons of health states, providing 7000 DCE observations. Of these, 42 (4.2%) respondents were flagged for displaying unusual response patterns (e.g., AAAAAAA, BBBBBBB, ABABABA, or BABABAB). These observations were included in the final analysis since our inquiry showed no indication of false responses.
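The pattern screen described above can be sketched as a simple lookup, together with the two shares reported in this section. The respondent identifiers and choice sequences are made up for illustration, and `has_unusual_pattern` is a hypothetical helper rather than the EQ-VT quality control routine.

```python
SUSPICIOUS_PATTERNS = {"AAAAAAA", "BBBBBBB", "ABABABA", "BABABAB"}

def has_unusual_pattern(choices: str) -> bool:
    """True when the seven DCE choices match one of the listed patterns."""
    return choices.upper() in SUSPICIOUS_PATTERNS

# Made-up respondents, for illustration only.
responses = {"r001": "ABBABAA", "r002": "AAAAAAA", "r003": "BABABAB"}
flagged = [rid for rid, seq in responses.items() if has_unusual_pattern(seq)]
print(flagged)                    # ['r002', 'r003']

# Shares reported in the text.
print(f"{1164 / 10_000:.2%}")     # 11.64% of C-TTO observations flagged
print(f"{42 / 1000:.1%}")         # 4.2% of respondents with unusual DCE patterns
```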

Modeling results
The three 20-parameter models showed non-monotonicity (Table 2) and were removed as candidates for the final model to calculate the Philippine value set. TTO-only models showed inconsistency in the coefficients for the mobility dimension, wherein Level 3 had lower coefficients than Level 2.
The TTO-only 20-parameter robust OLS model showed inconsistency for the Level 3 pain/discomfort dimension. Similarly, the 20-parameter hybrid heteroscedastic model yielded lower coefficients for Level 3 severity compared to Level 2 severity for all dimensions except for 'usual activities' (Table 2).
Among the 8-parameter models, we chose the homoscedastic 8-parameter TTO-only model with random intercept as the final model. We observed that including a random intercept term improved out-of-sample log-likelihood without significantly changing fit statistics such as MSQE and ICC (see Tables 2 to 5 in Supplemental File 1). The inclusion of DCE data through hybrid models or the use of heteroscedastic errors slightly improved log-likelihood but did not always improve fit statistics. Since the various random intercept models had similar fit statistics, we opted for the most parsimonious model to generate the Philippine EQ-5D-5L value set (Supplemental File 2). Given the non-normal distribution of the utility values, bootstrapping was used to generate 95% confidence intervals of the coefficients obtained (Table 3).
Since the preferred model has a nonzero intercept that leads to a predicted value of less than 1.000 (i.e., 0.979) for full health ('11111'), the team decided to apply a linear adjustment to all the health states. This was done by dividing the coefficients by 1−α [13] and using the adjusted coefficients (except the intercept) to calculate the utilities (see Supplemental Table 1 in Supplemental File 1). Therefore, the adjusted value for health state '11111' becomes 1, representing full health, and '44444' becomes −0.0234. Consequently, the most severe health state ('55555') value equated to −0.4289 (unadjusted) and −0.4381 (adjusted). (See Supplemental File 2 for calculated values for all health states.)
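The linear adjustment can be checked against the two values reported for '55555'. The sketch assumes that 1−α equals the full-health prediction of 0.979, so that dividing every decrement by 0.979 and subtracting from 1 is the same as dividing the unadjusted utility by 0.979, since 1 − (0.979 − u)/0.979 = u/0.979. The function name `rescale` is a hypothetical label for this step.

```python
FULL_HEALTH_PREDICTION = 0.979   # model prediction for '11111' reported in the text

def rescale(unadjusted: float, anchor: float = FULL_HEALTH_PREDICTION) -> float:
    """Linear adjustment so that the anchor state maps to exactly 1."""
    return unadjusted / anchor

print(rescale(0.979))               # 1.0
print(round(rescale(-0.4289), 4))   # -0.4381
```

Under this assumption the reported unadjusted value of −0.4289 reproduces the reported adjusted value of −0.4381 to four decimal places.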

Discussion
In this study, we demonstrated the complexities of developing a value set in a multi-lingual country context while also creating an important resource to facilitate health technology assessment in the Philippines. We extended the literature on EQ-5D-5L valuation in several ways. First, we showed an adaptation of the protocol that allowed inclusion of speakers of languages that have not been included in the valuation software. Second, we presented the value of running multiple models covering additive and multiplicative approaches as well as using C-TTO-only and hybrid datasets. After running several models, we selected an 8-parameter TTO-only model with a homoscedastic error term and a random intercept at the level of individual study respondents as the most appropriate model to generate the Philippine value set. We found that 20-parameter models violated the logical dominance order of the EQ-5D-5L descriptive system. Meanwhile, we found that C-TTO + DCE hybrid models did not significantly improve model performance. Third, we quantified the underestimation of utilities that results from using the value set of a neighboring country rather than a country-specific value set. We found that the generated utility weights were, on average, higher than those in the Thai value set, suggesting differences in health preferences between the Thai and Philippine populations.
According to the final model, mobility and pain/discomfort are the two dimensions that have the highest impact on utility, and 169 (5.41%) health states have negative values, that is, are considered worse than death. While there is some overlap between the Philippine and Thai value sets [7], the Philippine utilities skewed more toward one (1) than the Thai utilities (Fig. 1A). Most (72%) of the Philippine utility values were higher, with an average difference of 0.041 points (SD: 0.072). This underestimation of Philippine values by the Thai value set is most severe in states with lower sum scores (Fig. 1B). For example, the utility for '12345' (sum score of 15) was 0.4423 for the Philippines and 0.3685 for Thailand. At higher sum scores (and presumably worse states), the difference narrowed, and the Thai utilities then tended to overestimate the Philippine utilities at sum scores of 23 and higher (e.g., at '55555', the Philippine utility is −0.4381 while the Thai utility is higher at −0.4211). We also note that the Philippine value set seemed to have more variability within groups based on sums of level digits compared to the Thai value set (Fig. 1C).
Previous EQ-5D-5L valuation studies in the region used other regression models for their valuation. South Korea [12] used a variation of the TTO-only model, while Japan [11], Hong Kong [16], Indonesia [14], and Thailand [7] used the 20-parameter hybrid model. These models resulted in non-monotonic utility values when applied to the Philippine data. For example, individuals having moderate problems with mobility would have higher health utility than those having slight problems with mobility. Simplified nonlinear models have also been proposed, as these are more parsimonious and, in other value sets, have been demonstrated to outperform the 20-parameter model in terms of predicting out-of-sample health states [30]. The 8-parameter model was one of the new approaches and was first used in the Malaysian EQ-5D-5L valuation study [8]. In our case, we found that using the hybrid approach did not lead to a much better fit to the data compared to using only TTO data. Additionally, our results demonstrated that Filipinos value each domain differently and have different overall health preferences compared to other populations. Our results suggested that the 'mobility' dimension had the highest impact on health-related quality of life, followed by the 'pain/discomfort', 'self-care', 'usual activities', and 'anxiety/depression' dimensions. This is consistent with the reporting from 75% of the respondents that self-care and mobility are more important considerations in completing the DCE tasks (see Supplemental Table 6 in Supplemental File 1). Mobility also had the highest utility estimates in South Korea [12], Japan [11], Canada [6], Uruguay [10], Indonesia [14], and Thailand [7]. On the other hand, 'pain/discomfort' and 'anxiety/depression' have higher utility estimates in the Netherlands and England [5,9], and this might be related to more accessible living conditions and less emphasis on manual labor in these countries.
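The figures quoted in this comparison follow from simple arithmetic on the level digits and the reported utilities. The sketch below uses only numbers stated in the text; `level_sum_score` is an illustrative helper name.

```python
def level_sum_score(state: str) -> int:
    """Sum of the five level digits, e.g. '12345' -> 15 (range 5 to 25)."""
    return sum(int(ch) for ch in state)

ph, th = 0.4423, 0.3685            # utilities for '12345' quoted in the text
print(level_sum_score("12345"))    # 15
print(round(ph - th, 4))           # 0.0738, Philippine value is higher
print(f"{169 / 3125:.2%}")         # 5.41% of states valued below zero
```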
Future studies, especially qualitative ones, are needed to explore reasons for these observed differences, especially those between the Philippines versus surrounding nations like Thailand and Indonesia.
While our study is the first nationwide valuation study for the Philippines, it has several limitations. The main limitation was the use of a non-probability-based sampling design, which may have affected external validity and made it less likely to produce a statistically representative sample. To minimize this, the team used a quota system to obtain a sample that was broadly similar to the national general population in key demographic characteristics. Another limitation was that we excluded illiterate individuals, albeit by necessity. While this group comprises only a minority of Filipinos (4.4%) [41], we are unable to assume that they hold the same preferences as the literate population. Another limitation was that the valuation software was available only in English, Filipino, and Cebuano despite there being at least four other major languages in the sites visited. Translation ambiguity or inaccuracy may have been the reason for the non-monotonic coefficients present in regression models other than the chosen 8-parameter TTO-only model [14]. We mitigated the impact of translation ambiguity in multiple ways. First, we asked participants to select which of the three available languages they felt most comfortable using. Second, we recruited interviewers who were fluent in the non-translated major languages and provided interviewers with standardized translations of the EQ-5D-5L instrument, which allowed them to describe the various health states in the languages not available in the software. Finally, to ensure minimal biases and variability during data collection, we coordinated closely with the EuroQol foundation in adapting the EQ-5D-5L data collection and valuation protocol for the Philippine context and implemented the quality control process recommended by the foundation for valuation studies. While we followed the current EQ-5D valuation protocol [33], the use of the feedback module resulted in flagging and dropping of data. Our rate (11%) is also at the high end of published flagging rates (4.3% in Norway [35] to 9.7% in Indonesia [14]). We view the exclusion of these data points as an important trade-off to improve consistency and facilitate modeling of the data. We were also unable to examine the influence of socio-demographic characteristics on the odds of flagging. These questions are important for future development of the EQ-5D-5L valuation protocol. Lastly, our sample only covered the adult population. While it may be acceptable for now to use this value set for HTA of interventions for children, future work on the use and valuation of the EQ-5D-Youth is needed.
Model notation (table footnote): e is an error term assumed to have a mean of zero, and the x variables (e.g., x_MO2) are binary indicator variables of the responses, so that an MO score of 4 means x_MO4 = 1 and all other x_MO variables are coded as 0. MO mobility, SC self-care, UA usual activity, PD pain and discomfort, AD anxiety and depression, L level; log(σ) is the estimated variance term for the error distribution; log(ω) is the error term of the respondent-level random intercept.
Fig. 1 Comparison of the Philippine and Thai EQ-5D-5L value set: A density curve of utilities, B differences per simple score of level digits, C utilities per simple score of level digits

Conclusion and recommendations
An 8-parameter TTO-only model with a homoscedastic error term was selected as the best representation of the Philippine general population preferences for EQ-5D-5L health states. This Philippine EQ-5D-5L value set is recommended for use with the EQ-5D-5L and should be helpful in performing QALY-based economic evaluations to facilitate HTA-informed coverage decisions in the country. Future research is called for to explore the issues raised around translation ambiguity, the potential impact of these on utility valuation, and how to better account for them in subsequent modeling.
Acknowledgements This project would not be possible without the sponsors, DOH-PD and DOST-PCHRD. Findings and insights of this publication have not been endorsed by the above agencies and therefore, do not reflect their policy stance. We would especially like to thank EuroQol Group, Inc. who provided us with their expertise and guidance throughout the study. We would also like to recognize the valuable contributions of our research assistants (Maria Eleanor Candelaria, Jesebell de Jesus, Lindsley Go, Amelyn Mamoprte), our field supervisors (Honeyleen Loilo, April Joy Paloma, Justine Marjorie Tiu) and our data collectors (Annaveve Rose Alaban, Pearl Joy Asenjo, Angelica Caponpon, Vivian Concepcion, Eva Dimog, Vernalyn Agua, Joebell Gasang, Auerero Narag, Rowena Paulino, Leah Villarin).
Ethical approval All procedures performed in this study involving human participants, including how informed consent was obtained, comply with the ethical standards of the University of the Philippines Manila Research Ethics Board (Protocol code: UPMREB2017-156-01).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.