Dealing with the health state ‘dead’ when using discrete choice experiments to obtain values for EQ-5D-5L health states

Objective To evaluate two different methods to obtain a dead (0)-full health (1) scale for EQ-5D-5L valuation studies when using discrete choice (DC) modeling. Method The study was carried out among 400 respondents from Barcelona who were representative of the Spanish population in terms of age, sex, and level of education. The DC design included 50 pairs of health states in five blocks. Participants were forced to choose between two EQ-5D-5L states (A and B). Two extra questions concerned whether A and B were considered worse than dead. Each participant performed ten choice exercises. In addition, values were collected using lead-time trade-off (lead-time TTO), for which 100 states in ten blocks were selected. Each participant performed five lead-time TTO exercises. The two methods consisted of a DC model offering the health state ‘dead’ as one of the choices, for which all participants’ responses were used (DC dead), and a model that included only the responses of participants who chose at least one state as worse than dead (WTD) (DC WTD). The study also estimated a DC model rescaled with lead-time TTO data and a lead-time TTO linear model. Results The DC dead and DC WTD models produced relatively similar results, although the coefficients in the DC dead model were slightly lower. The DC model rescaled with lead-time TTO data produced higher utility decrements. Lead-time TTO produced the highest utility decrements. Conclusions The incorporation of the state ‘dead’ in the DC models produces results in concordance with DC models that do not include ‘dead’.


Introduction
The EQ-5D is one of the most widely used preference-based instruments. In 2009, the EuroQol Group released a new version (EQ-5D-5L) of the instrument that included five levels of severity in each dimension, as opposed to three in the original version [1]. Because the new instrument distinguishes five levels of severity in five dimensions, it defines 3,125 health states, for which a set of societal values has to be generated.
Previous valuation studies had predominantly used time trade-off (TTO) to obtain social preferences from which value sets for EQ-5D health states could be modeled [2][3][4][5].
However, increasing the number of health states from 243 to 3,125 made it considerably more costly and complicated to conduct valuation studies based on an interview method such as TTO. Conventional TTO also has problems with health states worse than the state 'dead' [6]. These issues led the EuroQol Group to explore new approaches to obtain social values for health states, notably discrete choice (DC) methodology.
In a typical DC task, respondents compare two different options (paired comparison) and indicate which one they prefer. Discrete choice experiments (DCE) have been used extensively in areas such as marketing and transport but less so in health economics. The use of DCE for health-state valuation is a relatively recent development. Potential advantages include the relative ease of comprehension and administration of ordinal tasks and their greater reliability. DC models may also avoid some of the biases associated with traditional valuation methods [7]. Stolk et al. [8] demonstrated that DC modeling with the classic EQ-5D (three-level) instrument produces values that are congruent with values obtained by other valuation techniques, TTO in particular. That result confirmed previously published findings [9][10][11][12].
A question that arises about the use of DC for health-state valuation concerns how to anchor the values produced by the choice model onto the dead (0)-full health (1) scale that is required to compute quality-adjusted life years. One strategy is to use DC data in combination with TTO data. This would entail deriving values from DC data and then using values from TTO to rescale those DC values. The need to collect TTO data alongside a DC study, however, might make the valuation study more complex than necessary. Instead, the DC task could be designed in such a way that a value for 'dead' can be extracted from the DC responses and then used to anchor the values. One way to do this is by explicitly comparing the health state 'dead' to the EQ-5D-5L health states that are being judged in the DC task. An objection on theoretical grounds is that responses obtained from choices comparing health states to dead may violate the random utility theory underlying the DC model. This happens when a subset of respondents consider all health states to be better than dead, for example because of their religious beliefs. The size and effect of the bias are as yet unknown; in practice, the bias may be small. Indeed, when this approach was adopted for the valuation of EQ-5D-3L health states [8], the results were promising. Whether this also holds for EQ-5D-5L valuation is examined in this paper.
The primary objective of the study reported here was to examine the results of two different approaches to rescale DC models incorporating 'dead' into the utility scale as an anchor point and to compare the results with those obtained anchoring on lead-time TTO. A secondary objective was to evaluate the effect of excluding DC responses elicited from those who did not consider any health state to be worse than the health state dead.

Methods
This pilot study used both a DC and a lead-time trade-off (lead-time TTO) approach to produce values for the set of 3,125 ($5^5$) health states defined by the EQ-5D-5L instrument. As a detailed description of each approach in the context of health-state valuation can be found elsewhere [8,13], a brief summary will suffice here. The study design followed recommendations from the EuroQol Group Valuation Task Force and was part of a multicountry initiative to explore methodological uncertainties about the valuation protocol for a new EQ-5D-5L value set.
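As an illustration of the size of the descriptive system, the following minimal sketch enumerates the $5^5$ profiles. The five-digit string notation follows the profiles used in this paper (e.g. 11111 and 55555); the variable names are assumptions made for the example and are not part of the study software.

```python
from itertools import product

LEVELS = range(1, 6)   # severity levels 1 (no problems) to 5 (extreme problems)
N_DIMENSIONS = 5       # mobility, self-care, usual activities, pain/discomfort, anxiety/depression

# Each health state is written as a five-digit profile such as "11111" or "55555".
states = ["".join(str(level) for level in profile)
          for profile in product(LEVELS, repeat=N_DIMENSIONS)]

print(len(states))            # number of health states defined by the instrument
print(states[0], states[-1])  # best and worst profiles
```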
Valuation of EQ-5D-5L health states

DC method
In the DC method, the respondents were asked to state their preference between two health states, A and B. This comparison of health states produced data that were subsequently analyzed to produce values on a latent scale. The profiles mentioned neither the duration of the states nor what happens after them. The DC design was generated using a Bayesian efficient approach [14] and consisted of 50 pairs of health states allocated to five blocks. These numbers were chosen in order to have sufficient power to estimate health-state values based on the proportions of choices between the pairs of states. To allow anchoring of the values on the 'dead-full health' scale, we extended the DC task by asking whether state A was worse than dead (WTD) and whether state B was WTD.

Lead-time TTO
The lead-time TTO method is an extension of the traditional TTO [13]. In a classic TTO, participants complete one task for health states considered better than dead and another task for those considered WTD. Lead-time TTO consists of a single task: to choose between Life A (T years in full health) and Life B [10 years in full health (lead time) plus 5 years in a target health state (disease time)]. All respondents start with Life A versus Life B where T = 15 years in 11111; depending on whether they choose A or B, the value of T is raised or lowered until the participants feel that A and B are the same. The lead-time TTO design was constructed with a Fedorov algorithm that allowed model parameters to be estimated without bias and with minimal variance [15]. The final lead-time TTO design contained 100 states in ten blocks.

Data collection
Four hundred persons, who were representative of the Spanish population in terms of age, gender, and education, took part in this study. An online survey administered via the EuroQol Valuation Technology (EQ-VT) software was used to collect DC and lead-time TTO responses. The final survey included the EQ-5D-5L questionnaire, ten DC tasks, and five lead-time TTO tasks as well as demographic questions. Participants were also queried about the difficulty of the DC and lead-time TTO tasks and how well they had understood them. The EQ-VT randomly assigned each participant to a DC block and a lead-time TTO block. In both types of block, the tasks were presented in random order. Given the number of participants, the study yielded an average of 80 observations for each DC pair (400 participants × 10 tasks / 50 pairs) and 20 observations for each lead-time TTO state (400 participants × 5 tasks / 100 states).
A survey company administered the study in Barcelona (June 2011). The researchers JMRG, ME, MH, and JC supervised the data collection with assistance from the EuroQol Group. Participants were recruited using telephone directories for the metropolitan area of Barcelona, personal contacts, a database of panelists, or 'snowballing' from contacts of participants included in this study.
Eight groups, each with an average of ten respondents, were recruited per day during 6 days, yielding the target of 400 participants. Each participant was assigned a computer and given an ID number and a password. Two computer rooms were available for each session. Interviews were conducted by two trained interviewers and four members of the Spanish Valuation Team (JMRG, ME, MH, and JC).

Statistical analysis
The sample as well as the DC and lead-time TTO responses were summarized with descriptive statistics. Four statistical models were used to estimate EQ-5D value sets: (1) a conditional logistic model, which produced the health-state values based only on choices between health states, thus ignoring responses to the dead questions (N = 397; henceforth DC TTO); (2) a rank-ordered logistic model, which was used on the full DC dataset and included responses to the dead questions (N = 397; henceforth DC dead); (3) a rank-ordered logistic model, which used data only from those participants who chose at least one state as worse than dead (N = 195; henceforth DC WTD); and (4) a linear regression model, which used the lead-time TTO responses (N = 373; henceforth lead-time TTO). The three models that were estimated with DC responses had to be rescaled so that 0 stands for dead and 1 forms the upper bound for full health. This was achieved using the additional 'dead' questions in the DC experiments in the case of DC dead and DC WTD. For the DC TTO model, the value predicted by the lead-time TTO model for the worst health state (profile 55555) was taken as an anchor point to rescale the arbitrary scale of the conditional logistic model. Details on each model are given below.

DC TTO model
In the case of DC, the values are not directly observable and have to be calculated from the responses to the choice exercise. We assume that the participants choose the health state that gives them higher utility, so the choice can be modeled with a conditional logistic model. The dependent variable $Y_i$ represents the choice of participant $i$ between A and B. The model assumes a value decomposition in two parts: an explainable component $V_{iA}$ plus an error $\varepsilon_i$. If errors are assumed to be random and to follow a type 1 extreme value distribution, a conditional logistic model emerges [8,16,17]. Let us assume that component $V$ of the value can be explained with an additive model:

$$V_{iA} = \sum_{j=1}^{20} \beta_j X_{iAj}$$

where $X_{iAj}$ are 20 dummies {0, 1}, per participant $i$, representing the severity levels for each dimension of EQ-5D-5L for state A, and $\beta_j$ represents the coefficient for each independent variable $j$. Accordingly, it is possible to estimate the coefficients of the model and thus to extrapolate values that have not been observed within the population by using the linear part of the DC TTO model. The values obtained from the linear part of the model shown above are on an arbitrary scale. In order to rescale the values from the DC TTO model, the extreme negative value estimated in the lead-time TTO model (55555) was used to anchor the DC TTO 55555 health state to that value. Therefore, both models produce the same index value for the 55555 health state. To obtain a full set of utility decrements, every coefficient of the DC model is divided by the scalar $(v_{55555}^{\text{DC TTO}} - 1)/(v_{55555}^{\text{lead-time TTO}} - 1)$, where $v_{55555}$ denotes the value of state 55555 in the respective model. The outcome of this transformation for each coefficient yields the utility decrements for the DC TTO model.

DC dead model
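A rank-ordered logistic analysis was performed for the DC dead model [8]. In the same way as for a conditional logistic model, a two-part decomposition is assumed for the value. With an additional dummy for the state dead, the explainable component $V_{iA}$ of this model can be written as follows:

$$V_{iA} = \sum_{j=1}^{20} \beta_j X_{iAj} + \beta_{dead} D_{iA}$$

where $D_{iA}$ equals 1 if the alternative is the state dead and 0 otherwise. Values are therefore obtained from the linear part (above) of the model on an arbitrary scale, as they are in the DC TTO model. For this DC dead model, the anchor point is the health state dead. Since the value for dead has to be 0, each coefficient is divided by the absolute value of $\beta_{dead}$, ensuring $\beta'_{dead} = -1$. The final function to estimate index values is given by:

$$\hat{v} = 1 + \sum_{j=1}^{20} \beta'_j X_j, \qquad \beta'_j = \frac{\beta_j}{|\beta_{dead}|}$$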
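The following minimal sketch shows how a rank-ordered logistic model assigns a probability to an observed ordering of state A, state B, and dead. It assumes the usual "exploded" form, in which a ranking is treated as a sequence of conditional logit choices; the function name and the latent values in the example are hypothetical and are not estimates from this study.

```python
import math

def ranking_probability(latent_values_in_rank_order):
    """Probability of an observed ranking under a rank-ordered (exploded) logit.

    The argument lists the latent values V of the alternatives from most to
    least preferred. The ranking is treated as a sequence of choices: the best
    of all alternatives, then the best of the remaining ones, and so on.
    """
    prob = 1.0
    remaining = list(latent_values_in_rank_order)
    while len(remaining) > 1:
        denominator = sum(math.exp(v) for v in remaining)
        prob *= math.exp(remaining[0]) / denominator
        remaining.pop(0)
    return prob

# Hypothetical latent values: state A is preferred to dead, and dead to state B.
v_a, v_dead, v_b = -1.0, -2.0, -3.0
print(ranking_probability([v_a, v_dead, v_b]))
```

With only two alternatives the expression reduces to the binary conditional logit probability, which is why the two model types can share the same linear part.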

DC WTD model
The DC WTD model was estimated as a rank-order logistic model similar to the DC dead model. For this case, the data were restricted to responses from participants who chose at least one state worse than dead. This model was used to evaluate whether including participants who did not choose any state worse than dead would bias the coefficient estimates.

Lead-time TTO model
For lead-time TTO responses, a linear model was estimated. The specification of the model in its general form is:

$$Y_i = \beta_0 + \sum_{j=1}^{20} \beta_j X_{ij} + \varepsilon_i$$

where $Y_i$ represents the observed values from lead-time TTO data for participant $i$ and $\beta_0$ is the intercept. A continuous variable, which takes values between -2 and 1, was created. The lead-time TTO values $T$ from the survey were transformed onto a scale from -2 to 1 using the formula $(T - T_{lead})/(T_{total} - T_{lead})$. In our design, $T_{lead} = 10$ indicates that the additional years in full health occur at the beginning of the exercise, and $T_{total} = 15$ indicates the sum of $T_{lead}$ and disease time (5 years). The independent variables $X_{ij}$ are 20 dummies {0, 1} for each participant $i$, representing the severity levels for each dimension of EQ-5D-5L. $\beta_j$ represents the coefficients for each independent variable $j$; $\varepsilon_i$ represents the errors for each participant $i$. Different specifications used in previously published examples were explored in order to fit the best model [2][3][4][5]. However, none of the models led to improved goodness of fit measured with log-likelihood, nor did they correct any inconsistencies in the models' coefficients. Therefore, the lead-time TTO model presented in this study was estimated using a simple ordinary least squares model. Finally, a function to estimate values for each health state was created using the regression model specified in the following equation:

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_{mo2}\,mo2 + \hat{\beta}_{mo3}\,mo3 + \dots + \hat{\beta}_{ad4}\,ad4 + \hat{\beta}_{ad5}\,ad5$$

with mo2, mo3, mo4, mo5, sc2, sc3, …, ad4, and ad5 indicating the corresponding dummy for the EQ-5D-5L severity level.
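The transformation of lead-time TTO responses onto the value scale can be illustrated with a short sketch. The constants follow the design described above (10 years of lead time, 15 years in total); the function and variable names are assumptions made for the example.

```python
T_LEAD = 10.0    # years of lead time in full health
T_TOTAL = 15.0   # lead time plus 5 years in the target health state

def lead_time_tto_value(t_indifference, t_lead=T_LEAD, t_total=T_TOTAL):
    """Transform the indifference point T (years in full health in Life A)
    into a value on the scale from -2 to 1."""
    return (t_indifference - t_lead) / (t_total - t_lead)

# Indifference at T = 15 means the state is valued like full health,
# T = 10 means it is valued like dead, and T < 10 means worse than dead.
for t in (15.0, 12.5, 10.0, 5.0, 0.0):
    print(t, lead_time_tto_value(t))
```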
To compare the four models, we used descriptive statistics and quantile-quantile plots (Q-Q plots) of the value sets obtained from the different models. A Q-Q plot plots estimates of the quantiles of two distributions against each other, and the pattern of points it displays is used to compare the two distributions of value sets. In addition, the value sets produced by the models were compared using the mean square difference (MSD) and the concordance correlation coefficient (CCC) [18]. Values for all 3,125 health states were estimated with each of the models. For each one-to-one comparison (model 1 vs. model 2), the MSD is calculated as follows:

$$\mathrm{MSD} = \frac{1}{3125} \sum_{h=1}^{3125} \left(\hat{v}_{1,h} - \hat{v}_{2,h}\right)^2$$

where $\hat{v}_{m,h}$ is the value that model $m$ estimates for health state $h$.

All statistical analyses were performed with Stata 11 MP (StataCorp LP, College Station, TX).
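The two agreement measures can be sketched as follows. This is an illustration only: it assumes that the CCC referred to is Lin's concordance correlation coefficient computed with population (1/n) variances and covariance, and the three-state value sets in the example are invented rather than taken from the study.

```python
def mean_square_difference(values_1, values_2):
    """Mean of the squared differences between two value sets of equal length."""
    if len(values_1) != len(values_2):
        raise ValueError("value sets must cover the same health states")
    return sum((a - b) ** 2 for a, b in zip(values_1, values_2)) / len(values_1)

def concordance_correlation(values_1, values_2):
    """Lin's concordance correlation coefficient (assumed definition of the CCC)."""
    n = len(values_1)
    mean_1 = sum(values_1) / n
    mean_2 = sum(values_2) / n
    var_1 = sum((a - mean_1) ** 2 for a in values_1) / n
    var_2 = sum((b - mean_2) ** 2 for b in values_2) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(values_1, values_2)) / n
    return 2 * cov / (var_1 + var_2 + (mean_1 - mean_2) ** 2)

# Invented values for three health states under two hypothetical models.
model_1 = [1.0, 0.5, 0.0]
model_2 = [0.9, 0.4, -0.1]
print(mean_square_difference(model_1, model_2))
print(concordance_correlation(model_1, model_2))
```

Unlike the Pearson correlation, the CCC is reduced by a constant shift between the two value sets, which makes it suitable for comparing value sets that are expected to agree in level as well as in ordering.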

Results

Sample characteristics
The study cohort comprised 400 persons with a mean age (standard deviation, SD) of 44.1 (16.9) years, of whom 59.7 % (239) were male (Table 1). More than half were employed or freelance and 15 % were retired. Less than half (43.75 %; 175) were in full health (11111). Few reported extreme or severe problems in any dimension of the EQ-5D-5L (three was the maximum number of respondents reporting extreme problems, in the 'usual activities' dimension; see Table 2).

Models
For the estimation of the three DC models, we omitted two respondents from the analysis because their DC choices were always A or always B; the 328 responses without a logical order among state A, state B, and dead were also omitted. For the lead-time TTO model, it was necessary to clean the dataset for inconsistencies. In this case 24 respondents with the same value for all TTO tasks were excluded from the analysis, as were two respondents for whom data were missing due to technical problems. Several model specifications were explored. However, only main effects models are presented here, because the others did not perform better in terms of having fewer inconsistencies or maximizing the likelihood function. In order to allow comparison among the models' coefficients, we present here the rescaled coefficients for the three final DC models. The DC WTD model has the highest likelihood value (-1,401.549), and DC TTO performs better than DC dead (-1,791.37 vs. -2,700.25, respectively) (Table 5).
Regarding the rescaling method for DC models, the value for 55555 was estimated with the lead-time TTO model to be -0.535. This value was used to anchor the DC TTO model, which previously had a value of -5.491 for state 55555. The ratio used to rescale the coefficients was abs[(-5.491 - 1)/(-0.535 - 1)] = 4.228. The final rescaled coefficients for DC TTO are $\beta'_j = \beta_j/4.228$. In the DC dead models, the dead state must have a value of 0, which means that the rescaled coefficient for the dead state must be -1. The coefficient for the dead state, $\beta_{dead}$, in the DC dead model is -6.494, so the rescaled coefficients are $\beta'_j = \beta_j/6.494$. Likewise, the coefficient $\beta_{dead}$ in the DC WTD model is -5.346, so the rescaled coefficients are $\beta'_j = \beta_j/5.346$.
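The rescaling arithmetic reported above can be checked with a short sketch. The numerical inputs are the values given in the text; the function names are assumptions made for the example.

```python
def tto_anchor_divisor(dc_value_55555, tto_value_55555):
    """Divisor that maps the DC value of 55555 onto the lead-time TTO value."""
    return abs((dc_value_55555 - 1.0) / (tto_value_55555 - 1.0))

def rescale(coefficients, divisor):
    """Divide every latent-scale coefficient by the same positive divisor."""
    return {name: b / divisor for name, b in coefficients.items()}

# DC TTO: anchor on the lead-time TTO value of state 55555.
divisor_dc_tto = tto_anchor_divisor(-5.491, -0.535)
print(divisor_dc_tto)  # approximately 4.2287 (reported as 4.228 in the text)

# After rescaling, the value of 55555 in the DC TTO model equals the TTO anchor.
print(1.0 + (-5.491 - 1.0) / divisor_dc_tto)  # approximately -0.535

# DC dead and DC WTD: anchor on the coefficient of the dead state.
print(rescale({"dead": -6.494}, abs(-6.494)))  # dead coefficient becomes -1
print(rescale({"dead": -5.346}, abs(-5.346)))
```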
In general, values in the lead-time TTO model were lower than in any of the rescaled DC models due to the estimated intercept value of 0.452. However, there are several inconsistencies for some estimated coefficients. In all of the estimated models, for example, the coefficient for moderate problems (level 3) in the pain/discomfort domain is positive, although not statistically significant. Other inconsistencies are statistically significant: the lower coefficients for slight (level 2) compared to moderate problems (level 3) in the self-care domain for the three DC models and in the mobility and usual-activities domain for DC. The value of the 55555 state in the DC dead model [...] (Fig. 1c, e, f). Both DC dead and DC WTD models estimated very similar values (Fig. 1a).
The MSD for the differences across the 3,125 states between the DC dead and DC WTD models is 0.009. However, the MSDs for the differences with the lead-time TTO model are 0.217, 0.142, and 0.045 for the DC dead, DC WTD, and DC TTO models, respectively. The MSDs for the differences with DC TTO are 0.091 and 0.044 for DC dead and DC WTD, respectively.

Discussion and conclusions
In the study reported here we compared two approaches for rescaling DC values on the dead (0)-full health (1) scale to obtain an EQ-5D-5L value set that can be used in economic evaluation. The two approaches were: (1) DC incorporating an additional judgmental task in which the health state 'dead' is assessed against other health states; and (2) a DC model anchoring on lead-time TTO values.
None of the estimated models were completely consistent in terms of regression coefficients. All models had some positive coefficients. Also, to be consistent, a model must meet the condition that, within each dimension, the absolute value of the coefficients increases with each level of severity. According to the results, each of the models satisfied the condition for some dimensions but not for all. The DC TTO model failed to satisfy the condition more often than the DC dead models, and its rescaled results produced higher utility decrements than both rescaled DC dead models. The rescaled DC WTD model differs less from rescaled DC TTO than from rescaled DC dead. However, we have to take into account that the intercept for the lead-time TTO model was extremely high, which leads to health state values that lack face validity. For example, a person with slight mobility problems has a value of <0.55, which is implausible when compared with the previous EQ-5D value sets [2][3][4][5].
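The two consistency conditions discussed above (no positive coefficients, and absolute values that do not decrease with severity within a dimension) can be expressed as a simple check. The sketch below is illustrative: the function name, the abbreviations "ua" and "pd", and the example coefficients are hypothetical and do not reproduce the estimates of this study.

```python
# Dimension labels; "mo", "sc", and "ad" appear in the paper, "ua" and "pd" are assumed.
DIMENSIONS = ["mo", "sc", "ua", "pd", "ad"]

def find_inconsistencies(coefficients):
    """List violations of the two consistency conditions for a main-effects model.

    coefficients maps dummy names such as 'mo2' ... 'ad5' to estimated values.
    """
    problems = []
    for dim in DIMENSIONS:
        by_level = [coefficients[f"{dim}{level}"] for level in range(2, 6)]
        # Condition 1: a worse level should never add value.
        for level, b in zip(range(2, 6), by_level):
            if b > 0:
                problems.append(f"{dim}{level}: positive coefficient")
        # Condition 2: the decrement should not shrink as severity increases.
        for level in range(3, 6):
            if abs(by_level[level - 2]) < abs(by_level[level - 3]):
                problems.append(f"{dim}{level}: smaller decrement than level {level - 1}")
    return problems

# Invented, fully consistent coefficients ...
example = {f"{dim}{level}": -0.1 * (level - 1)
           for dim in DIMENSIONS for level in range(2, 6)}
print(find_inconsistencies(example))

# ... and the same set with a slightly positive level-3 pain/discomfort coefficient.
example_with_violation = dict(example, pd3=0.01)
print(find_inconsistencies(example_with_violation))
```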
The reason for the inconsistencies in the logistic regression results is not clear. On the one hand, these inconsistencies could be explained by the fact that the DC design included only 50 pairs of health states, which may be inadequate to yield sufficient information (and thus power) to estimate the logistic models (some coefficients were not statistically significant). On the other hand, more power (thus, a larger sample size) may be needed for each pair of health states when the number of pairs is fixed. When the data were applied to the Spanish arm of the multi-country study, the inconsistencies in the DC model disappeared [19]; however that study had both more pairs (200) and more observations per pair. The questions touching upon dead, which are necessary for the DC dead models, were only conducted in the Spanish pilot study. Therefore, the analysis of DC dead models could not be extended to all countries for the sake of comparison. In that light, it would make sense to increase the number of pairs in the DC design that touch upon dead and also to increase the power per pair as this approach would ensure that future studies conducted by using a DC model incorporating dead will be consistent for the whole multi-country dataset.
On comparing the results of the modeling exercise for all participants versus those who rated at least one state as WTD, we found that the DC dead and DC WTD models produced similar results, with the only difference being the position of 'dead'. In particular, we found higher utility decrements and thus lower health state values for EQ-5D-5L states when the participants who did not rate any state as WTD were removed from the analysis. However, this may not amount to bias and may simply reflect the preferences of the population. Whatever the reason, the impact on actual results was not large. It should be kept in mind that this was not a direct comparison, as the participants it covered were not identical. From a mathematical point of view and based on random utility theory (RUT), estimation may fail when many participants do not choose any WTD option. Nevertheless, the DC dead model could be estimated and did not perform much worse than the DC WTD model in terms of likelihood.
There is some concern about the feasibility of some elements of the DC and lead-time TTO as conducted in this survey. In general, the participants understood the hypothetical nature of the health states and lives they were presented with. They knew they had to choose the health state/life that they preferred rather than the health state/life with which they identified the most. However, some problems arose in the course of both exercises, especially during the lead-time TTO task. Many individuals were confused when making choices and did not realize that the health conditions changed when they answered that 'both lives are almost equal'. Although this consequence had been explained, it was necessary for the administrators to do the first lead-time TTO exercise together with the participants so they could do the rest of the exercises as required. The general impression was that many of the respondents did not answer the TTO part of the exercises appropriately. Some individuals reported that they could not decide when they were indifferent between both lives because they always preferred Life B. This indecisiveness could explain the illogical results obtained with the lead-time TTO model. In general, the respondents needed less assistance on the DC part of the survey, but many did comment on the difficulty of making choices between health states. The difficulties they encountered in the survey tasks emphasize the important role of the face-to-face interviews that are also part of the study design. DC and lead-time TTO elicitation techniques require the respondents to compare health states with 'dead'; this question was posed directly in each of the DC exercises and indirectly in each of the lead-time TTO exercises. From the results we can deduce that a state was more frequently considered WTD in indirect (lead-time TTO) than in direct questions (DC + dead), possibly because in lead-time TTO the distinction between negative and positive values was not explicitly made. This could explain the lower values observed for the lead-time TTO method and hence for the DC TTO model.
Previous studies have investigated the incorporation of the health state dead in the DC task [8,16,17]. However, none of these used the EQ-5D-5L, which would have allowed a direct comparison. Stolk et al. [8] used the classic three-level version of EQ-5D. Our results do not confirm those obtained by Stolk et al., probably because their comparison was made with classic instead of lead-time TTO. Also, the five-level version makes the DC task more complicated for the respondents, and this complexity might have led some participants to make random choices when they could not decide between health states A and B.
DC dead models produce correlated results with slight differences (no bias). Incorporating the health state dead into the general DC technique produces results in concordance with the DC TTO model. DC modeling warrants further research to optimize the design if it is to be used to estimate EQ-5D-5L value sets. The lead-time TTO produces very high utility decrements, and its consistency among responses is lower than that of DC models.