Key Points

The EQ-5D-Y-3L valuation protocol states that composite time trade-off (cTTO) data should be used to anchor discrete choice experiment (DCE) data onto the full health to dead scale but does not specify which method should be used to do so.

There are several different anchoring approaches that can be used and the final decision on an approach is likely to require some value judgements to be made. This paper sets out the key considerations for analysts when deciding upon an anchoring approach.

Researchers should seek input from relevant stakeholders when commencing valuation studies to help guide decisions and should clearly set out their rationale for their preferred anchoring approach in study outputs.

1 Introduction

Health state utilities are required to estimate quality-adjusted life years (QALYs) for use in health technology assessment (HTA). These are normally obtained using concise generic measures, such as EQ-5D-5L [1], which have value sets that provide a utility for any health state described. Value sets are derived using stated preference methods, typically conducted with members of the general population [2].

Traditional methods, such as time trade-off (TTO), provide data on an interval scale anchored at full health (1) and dead (0) as required for estimating QALYs. However, these methods have been criticised due to their difficulty and associated biases [2, 3]. Further issues may arise with these methods when valuing health states for children (aged < 18 years) because in this context adults may be asked to value health in someone else (a child) [4, 5]. These issues include difficulty in making decisions in tasks [6], as well as emerging evidence in some studies that adults are less willing to trade off life years when asked to complete TTO tasks concerning child health [7,8,9,10]. The latter has raised broader concerns about the comparability between child QALYs and adult QALYs, which may cause issues in the context of HTA [11].

In recent years, ordinal methods such as discrete choice experiments (DCEs) have had an increasing role in health valuation [12]. In the context of child health valuation, DCEs focused on pairwise choices between health states defined by EQ-5D-Y-3L, without a duration attribute, have been seen as particularly useful as they can provide information about the relative importance of different dimensions of health, without any need for respondents to trade-off years of a child’s life. However, unlike DCEs that include a duration attribute, it is necessary for further data to be collected and used to anchor the DCE data onto the required scale. Various methods have been used to generate data for anchoring, including TTO, visual analogue scale (VAS), and the ‘location of dead’ (LOD) exercise (a component of the ‘personal utility function’ [PUF] approach) [13,14,15,16].

Following an extensive programme of research, a valuation protocol was developed for EQ-5D-Y-3L [17]. The protocol recommends that EQ-5D-Y-3L is valued using a DCE, and that the DCE data are anchored using data from a separate composite TTO (cTTO) task. The aim of this initial protocol was to enable value sets for EQ-5D-Y-3L to be developed, and it has been successful in achieving that aim [18,19,20,21,22,23,24]. However, the protocol was only considered to be a starting point, with further updates expected in future due to the various ongoing normative and methodological debates in the context of child health valuation [4, 25]. These debates include, but are not limited to, the source of preference data [26, 27], the perspective taken in the task [6, 7, 9, 28], and the choice of valuation method [29].

A further methodological debate relates to the optimal approach for anchoring DCE data using cTTO data. The current valuation protocol does not recommend a specific approach. Furthermore, the protocol is not prescriptive regarding the amount of cTTO data collected, only recommending a minimum of ten health states and 200 observations. As a result, in practice, EQ-5D-Y-3L valuation studies have used different anchoring approaches, and several have included more than ten health states in the cTTO task.

Approaches which can in principle be taken to anchor DCE data using cTTO data, from rudimentary to more complex, include (1) anchoring on the worst health state; (2) mapping DCE onto mean cTTO values; and (3) hybrid modelling. Whilst these three approaches have been subject to a statistical comparison in the past [30], there has been relatively little discussion around how the methods are employed in practice, and how resulting value sets may differ in their characteristics. As many EQ-5D-Y-3L valuation studies are ongoing [11], research teams will need to consider how to anchor their DCE data. Furthermore, valuation studies for other instruments may face similar issues [31].

This paper aims to address the gap in the literature by explaining the different approaches, illustrating their use using previously collected EQ-5D-Y-3L valuation data [26, 32], and discussing the implications of using each approach for the characteristics of the resulting value sets.

2 Methods

2.1 Overview of Anchoring Approaches Using cTTO Data

2.1.1 Anchoring on the Worst Health State

This approach requires only the mean value of the worst health state in the descriptive system which, for the EQ-5D-Y-3L instrument, is denoted as 33333, reflecting that each dimension is at the worst level (i.e., level 3). This value is used to create a rescaling parameter, \(\theta\), using Eq. 1:

$$\begin{array}{c}\theta = \frac{\left\{{U}_{DCE}\left(WS\right)-1\right\}}{\left\{{\overline{U} }_{TTO}\left(WS\right)-1\right\}}\end{array}$$
(1)

where \(\overline{U}_{TTO}(WS)\) is the mean value of the worst state in the descriptive system from the cTTO task, and \(U_{DCE}(WS)\) is the latent scale value for the worst state in the descriptive system. If regular dummy coding is used, with the best level as the base level for each dimension, \(U_{DCE}(WS)\) is one plus the sum of the coefficients for the worst level on each dimension. To rescale the DCE data, the coefficients must subsequently be divided by \(\theta\).
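The arithmetic of Eq. 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the DCE coefficients are invented for the example, the function names are hypothetical, and the cTTO mean of −0.271 for 33333 is the Tobit-adjusted value reported in the Results.

```python
# Hypothetical latent-scale DCE coefficients for level 3 of each dimension
# (illustrative numbers only, not taken from the study).
dce_level3 = {"MO3": -1.0, "LAM3": -0.8, "UA3": -1.2, "PD3": -1.6, "WSU3": -1.4}


def worst_state_theta(level3_coefs, mean_ctto_worst):
    """Rescaling parameter from Eq. 1 under dummy coding (best level as base)."""
    u_dce_worst = 1.0 + sum(level3_coefs.values())
    return (u_dce_worst - 1.0) / (mean_ctto_worst - 1.0)


def rescale(coefs, theta):
    """Anchor latent-scale coefficients by dividing each one by theta."""
    return {name: b / theta for name, b in coefs.items()}


theta = worst_state_theta(dce_level3, mean_ctto_worst=-0.271)
anchored = rescale(dce_level3, theta)

# By construction, the anchored value of 33333 equals the cTTO mean.
value_33333 = 1.0 + sum(anchored.values())
```

In practice the same `theta` would be applied to every coefficient in the model (levels 2 and 3), so that only the scale changes and the relative sizes of the coefficients are untouched.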

2.1.2 Mapping DCE onto Mean TTO Values

The simplest mapping approach [30] estimates a rescaling parameter, \(\theta\), based on the relationship between the observed mean cTTO values and the unanchored DCE values using a linear ordinary least squares regression model as in Eq. 2:

$$\begin{array}{c}{\overline{U} }_{TTO}\left(j\right)=\theta {U}_{DCE}\left(j\right)+\varepsilon \end{array}$$
(2)

where \(\overline{U}_{TTO}(j)\) is the mean value from the cTTO task for each of the j health states valued, and \(U_{DCE}(j)\) is the latent scale value for each of those health states based on the DCE coefficients. The DCE latent scale coefficients are then multiplied by the rescaling parameter, \(\theta\) (estimated in the mapping model), to produce the value set.

For simplicity, and to enable direct comparisons with the hybrid model, we focus on linear mapping without a constant. However, different specifications can be tested to improve the model fit, such as the inclusion of a constant and/or non-linear terms. Additionally, if sufficient cTTO data are collected such that they can be modelled independently, the predicted cTTO values for every health state in the system (i.e., j = 243 for EQ-5D-Y-3L) could be included in the model instead.
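For a linear model without a constant, the least squares estimate of \(\theta\) has a closed form: the sum of cross-products divided by the sum of squares of the regressor. The sketch below illustrates this in Python under one stated assumption that is not confirmed by the text: both sides of Eq. 2 are expressed as decrements from full health, so that full health remains at 1 after rescaling. All numbers and names are hypothetical.

```python
def fit_theta_through_origin(x, y):
    """OLS slope for y = theta * x with no constant: sum(x*y) / sum(x*x)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / sxx


# Hypothetical inputs for four health states (not study data).
latent_decrement = [1.0, 2.5, 4.0, 6.0]  # minus the sum of DCE coefficients
mean_ctto = [0.80, 0.55, 0.30, -0.10]    # mean cTTO value of each state

# Assumption of this sketch: regress decrements from full health.
ctto_decrement = [1.0 - v for v in mean_ctto]

theta = fit_theta_through_origin(latent_decrement, ctto_decrement)

# Anchored predictions: multiply the latent decrements by theta.
predicted = [1.0 - theta * d for d in latent_decrement]
```

Adding a constant or non-linear terms, as mentioned above, would require a general regression routine rather than this one-parameter formula.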

2.1.3 Combining DCE and TTO Data in a Hybrid Model

An alternative anchoring approach that takes into account all of the individual-level DCE and cTTO observations is a hybrid model. This approach has been employed in many EQ-5D-5L valuation studies and is described in detail elsewhere [30, 33, 34]. Briefly, the hybrid model works under the assumption that the same value function dictates the responses to both DCE and cTTO tasks and therefore the coefficients from DCE and cTTO models should reflect the same relative weights for the dimensions and levels of the instrument being valued. The hybrid model multiplies the likelihood functions obtained through estimation of separate DCE and cTTO models, and uses a rescaling parameter, \(\theta\), based on the assumption that the DCE model coefficients are proportional to the TTO model coefficients. Thus, the model output is the final anchored value set. User-written commands are available for commonly used software packages that enable analysts to employ this method (hyreg in both Stata and R) [35].
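The idea of multiplying the two likelihoods can be made concrete with a simplified joint log-likelihood. The Python sketch below is illustrative only and does not reproduce hyreg: it assumes a normal error for the cTTO responses with no censoring, a binary logit for the pairwise DCE choices, and a parameterisation in which the DCE latent coefficients equal \(\theta\) times the cTTO-scale coefficients. The function and argument names are hypothetical.

```python
import math


def dot(x, beta):
    """Inner product of a dummy vector and a coefficient vector."""
    return sum(a * b for a, b in zip(x, beta))


def hybrid_loglik(beta, sigma, theta, ctto_obs, dce_obs):
    """Joint log-likelihood: normal cTTO part plus logit DCE part.

    ctto_obs: list of (x, value), where x is the dummy vector of the state.
    dce_obs:  list of (x_a, x_b, chose_a) for each pairwise choice.
    Assumptions of this sketch: no censoring, shared beta, and DCE
    coefficients equal to theta * beta.
    """
    ll = 0.0
    for x, value in ctto_obs:
        resid = value - (1.0 + dot(x, beta))
        ll += -0.5 * math.log(2.0 * math.pi * sigma ** 2)
        ll -= resid ** 2 / (2.0 * sigma ** 2)
    for x_a, x_b, chose_a in dce_obs:
        diff = theta * (dot(x_a, beta) - dot(x_b, beta))
        p_a = 1.0 / (1.0 + math.exp(-diff))
        ll += math.log(p_a if chose_a else 1.0 - p_a)
    return ll
```

Maximising this function over `beta`, `sigma`, and `theta` with a numerical optimiser would yield coefficients that are already on the cTTO scale, which is why no separate rescaling step is needed. Because both data types inform `beta`, the relative weights can differ from those of a DCE-only model.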

2.2 Data Sources

To illustrate the different anchoring approaches, and compare the characteristics of the resulting values, previously collected data were used. The DCE data were the adult responses from a UK-based EQ-5D-Y-3L study which compared the preferences of adults and adolescents (aged 11–17 years) [26]. The cTTO data were from a multinational study which examined the impact of wording and perspective on EQ-5D-Y-3L valuations [32]. The DCE design is the same as that in the EQ-5D-Y-3L valuation protocol [17]. One thousand adults completed the DCE survey. The cTTO data were from a cTTO task where 17 health states were valued, split into two blocks of nine (both containing 33333). There were four arms in the study, and this analysis uses only the data from the EQ-5D-Y-3L child perspective arm to align with the DCE data. There were 211 respondents in this arm, split almost equally across four countries: England, Germany, Spain, and the Netherlands. Further detail about the data sources can be found in the electronic Supplementary Materials (ESM).

2.3 Data Analysis

Two of the anchoring approaches require a latent scale choice model to be estimated. A random utility framework was employed, and a conditional logit model was estimated with a linear, additive utility function, as in Eq. 3:

$$\begin{array}{c}{V}_{j}={\beta }_{1}MO2+{\beta }_{2}MO3+{\beta }_{3}LAM2+{\beta }_{4}LAM3+{\beta }_{5}UA2+{\beta }_{6}UA3+{\beta }_{7}PD2+{\beta }_{8}PD3+{\beta }_{9}WSU2+{\beta }_{10}WSU3\end{array}$$
(3)

where each independent variable is a dummy variable indicating whether that level was observed. Although alternative models have been shown elsewhere to provide a better fit for these data [26], as might be expected, the conditional logit model was preferred for this analysis as it is consistent with the hybrid model approach. The coefficients from the DCE model (as well as the value set coefficients) were transformed to relative attribute importance (RAI) scores by dividing the level range for each dimension by the sum of all level ranges.
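The RAI calculation described above can be sketched as follows. The coefficients are hypothetical and the function name is illustrative; the level range of a dimension is taken here as the difference between the largest and smallest level coefficient, with the base level contributing an implicit zero.

```python
def rai_scores(coefs_by_dimension):
    """Relative attribute importance: each level range over the sum of ranges.

    coefs_by_dimension maps a dimension name to its dummy coefficients
    (levels 2 and 3); the base level contributes an implicit 0.
    """
    ranges = {}
    for dim, coefs in coefs_by_dimension.items():
        levels = [0.0] + list(coefs)
        ranges[dim] = max(levels) - min(levels)
    total = sum(ranges.values())
    return {dim: r / total for dim, r in ranges.items()}


# Hypothetical latent-scale coefficients (not from the study).
dce = {
    "MO": (-0.3, -1.0),
    "LAM": (-0.2, -0.8),
    "UA": (-0.4, -1.2),
    "PD": (-0.5, -1.6),
    "WSU": (-0.4, -1.4),
}
rai = rai_scores(dce)

# Dividing every coefficient by the same positive constant leaves RAI unchanged.
rescaled = {d: tuple(b / 4.0 for b in c) for d, c in dce.items()}
rai_rescaled = rai_scores(rescaled)
```

The last two lines show why any anchoring method that only multiplies or divides all coefficients by a single positive constant returns the same RAI scores as the unanchored DCE model.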

Due to the censoring in the cTTO task (i.e., by design, the cTTO task used in the EQ-5D-Y-3L protocol yields a minimum value of −1), the mean cTTO values arguably should be adjusted accordingly before they are used in the worst health state anchoring and mapping approaches. This was done by estimating Tobit models for each health state with left censoring at −1. A censored hybrid model was estimated using the hyreg user-written command in Stata [35]. The hybrid model also used a linear, additive function as in Eq. 3. The performance of the value sets was explored by examining the alignment between the predicted values and the Tobit-adjusted cTTO values, using mean absolute error (MAE) and root mean squared error (RMSE).
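For reference, the two alignment statistics can be computed as in the Python sketch below. The predicted and observed values are invented for illustration and the function names are hypothetical.

```python
import math


def mae(predicted, observed):
    """Mean absolute error between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed values."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


# Hypothetical value set predictions and Tobit-adjusted cTTO means.
predicted = [0.8, 0.5, -0.1]
observed = [0.9, 0.4, -0.3]

mae_value = mae(predicted, observed)
rmse_value = rmse(predicted, observed)
```

RMSE penalises large individual errors more heavily than MAE, so the two statistics can rank value sets differently when one set fits most states well but misses a few badly.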

3 Results

Table 1 provides the value sets produced using the three anchoring methods. The underlying data and model output can be found in the ESM. The value sets based on worst state rescaling and linear mapping approaches perfectly reflect the RAI scores from the DCE (Table S1), whereas the hybrid model RAI scores differ slightly. The most notable difference is that, in the hybrid model value set, mobility is the least important dimension rather than ‘looking after myself’. This appears to be driven by a direct switch in the level rankings between mobility level 3 and pain/discomfort level 2.

Table 1 Value sets generated using the three different methods, and selected summary statistics

The predicted values for the selected health states vary between the three value sets. As expected, the worst state rescaling value set has the lowest value for 33333 (−0.271), which is identical to the observed Tobit-adjusted mean from the cTTO task (Table S2). The mapping value set has the highest value for 33333 (−0.052) and therefore has the smallest overall range and the smallest proportion of health states that are worse than dead (n = 1; 0.4%). However, of all three value sets, the hybrid model value set provides the highest values for all but one of the mild health states, illustrating that the differences between these value sets are not limited to their overall range.
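Summary statistics such as the range and the proportion of states worse than dead follow from enumerating all 243 health states. The Python sketch below shows one way to do this; the decrements are hypothetical and are not the coefficients in Table 1.

```python
from itertools import product

DIMENSIONS = ("MO", "LAM", "UA", "PD", "WSU")


def state_value(state, decrements):
    """Value of a state such as (1, 3, 2, 1, 3): 1 plus each level decrement."""
    return 1.0 + sum(decrements[d][lvl] for d, lvl in zip(DIMENSIONS, state))


# Hypothetical anchored decrements per level (level 1 is the base, so 0).
decrements = {
    "MO": {1: 0.0, 2: -0.05, 3: -0.20},
    "LAM": {1: 0.0, 2: -0.04, 3: -0.15},
    "UA": {1: 0.0, 2: -0.08, 3: -0.25},
    "PD": {1: 0.0, 2: -0.10, 3: -0.35},
    "WSU": {1: 0.0, 2: -0.09, 3: -0.30},
}

# All combinations of three levels on five dimensions: 3**5 = 243 states.
states = list(product((1, 2, 3), repeat=5))
values = {s: state_value(s, decrements) for s in states}

value_range = max(values.values()) - min(values.values())
n_worse_than_dead = sum(v < 0 for v in values.values())
share_worse_than_dead = n_worse_than_dead / len(states)
```

Running the same enumeration for each anchored value set gives directly comparable ranges and worse-than-dead proportions.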

Figure 1 compares the values of the 243 health states between the three value sets by rank order. The reference set of rankings is based on the worst state rescaling value set (identical to the DCE rankings). The rank ordering is identical with the linear mapping value set (indicated by the smooth line), but the range is smaller. In contrast, the rank ordering differs from the hybrid model value set (indicated by the oscillation), and the range is similar to that of the linear mapping value set.

Fig. 1 Values of EQ-5D-Y-3L health states with the three different value sets. DCE discrete choice experiment

Figure 2 presents the comparison between the predicted values from the three value sets and the Tobit-adjusted means from the cTTO task. All three value sets underpredict the majority of the mild health states. The worst state rescaling value set performs worst overall (highest MAE and RMSE), and the hybrid model value set performs best overall.

Fig. 2 Comparison of predicted values and observed Tobit-adjusted mean cTTO values. cTTO composite time trade-off, MAE mean absolute error, RMSE root mean squared error

4 Discussion

4.1 Comparison of the Characteristics of the Three Value Sets

Whilst the value sets produced by each anchoring method are similar, there are some notable differences. The values for 33333 are substantially higher when using the mapping and hybrid model approaches, relative to the worst state anchoring approach, which, by design, fixes the value set at the observed mean cTTO value for the worst health state. Therefore, in comparison, the value sets derived from mapping and the hybrid model have a smaller range.

In terms of relative importance, the worst health state anchoring and mapping approaches perfectly maintain the RAI scores from the DCE, as they are simply linear transformations of the DCE coefficients. In contrast, the hybrid model combines all the individual-level DCE and cTTO data, resulting in different relative importance scores.

In terms of alignment between the predicted values and the observed mean cTTO values (Fig. 2), differences in alignment were observed between all three value sets. This suggests that preferences differ in the DCE and cTTO datasets. Overall, the value set from the hybrid model has the best alignment.

4.2 How Generalisable Are These Results to Other EQ-5D-Y-3L Valuation Studies?

To understand the generalisability of these results, it is first worth noting how these data might differ from other EQ-5D-Y-3L valuation studies. The DCE data used here are based on the same experimental design as the EQ-5D-Y-3L valuation protocol [17, 26]. However, the selection of health states used in the cTTO differs from other EQ-5D-Y-3L valuation studies [11]. Furthermore, our cTTO data are from respondents from four countries [32], whereas the DCE data are from one country. Thus, it may be that the alignment between the cTTO and DCE data in our case study is worse than might be expected when both datasets are made up of respondents from the same country.

Three published EQ-5D-Y-3L valuation studies have made comparisons using different anchoring approaches, finding that (in line with this case study) alignment is worse with the worst state rescaling approach compared with other approaches [20, 22, 23]. Furthermore, the worst state rescaling approach consistently produced a lower predicted value for the worst state relative to other approaches (resulting in greater value set ranges) [20, 22, 23]. Finally, differences were also observed in the importance of dimensions between the hybrid model and DCE results [20].

4.3 How Should an Anchoring Approach Be Selected?

The extent of the potential differences in value set characteristics based on the anchoring approach chosen highlights how important this choice can be. Analysts must weigh up multiple factors when making their decision. Table 2 summarises some key analytical considerations.

Table 2 Key analytical considerations when deciding upon an anchoring approach

In most cases, worst state rescaling could be viewed as suboptimal. It is unlikely that a cTTO exercise would be conducted with only the worst health state being valued, nor would it be advisable, given that this state tends to be seen as ‘worse than dead’ by many respondents. Using all the cTTO data that have been collected in a linear mapping would be advantageous, as it would improve the alignment between the predicted and observed cTTO values, without compromising the alignment with the DCE results. If we therefore regard worst state rescaling as suboptimal, and both linear mapping and hybrid modelling are feasible, the choice between them could be based solely on alignment with the cTTO data, which will typically be greater with hybrid models. However, it is worth noting that the proportion of cTTO and DCE data may have an impact on the results of hybrid models. Whilst further research is needed to better understand the extent of the impact on resulting value sets, ultimately this means that the hybrid model approach is less predictable than linear mapping.

However, for normative reasons, some relevant stakeholders (i.e., local end-users) may have priorities that extend beyond the alignment between the resulting value set and the observed cTTO values. There may be cases where alignment with the DCE data is prioritised, such as when mixed samples are sought [27]. For example, if adolescents complete the DCE and adults complete the cTTO, the use of linear mapping will ensure that adolescents’ preferences (in relation to the relative importance of dimensions) are fully represented in the value set. Alternatively, some stakeholders may be particularly concerned about the range of the value set due to concerns about comparability between child and adult QALYs and therefore seek to maximise it (possibly using cTTO data from the valuation of an adult instrument instead), which could be achieved relatively simply with worst state rescaling.

These normative considerations, coupled with the lack of a ‘gold standard’ anchoring approach, present challenges for analysts and we therefore make two recommendations. Firstly, research teams should engage with relevant stakeholders before commencing valuation studies to better gauge priorities relating to the normative issues, which can later inform analytical approaches. Secondly, analysts should ensure that they test the sensitivity of their results using different anchoring approaches, and clearly report the rationale for their preferred approach in study outputs (with results reported based on different anchoring approaches, where relevant and feasible).

4.4 What Future Research is Needed?

There are some areas where further methodological research would be advantageous. Firstly, a better understanding of the hybrid model’s experimental design and sample size requirements, and the impact of DCE-cTTO data proportions, would be useful. Future updates to the EQ-5D-Y-3L valuation protocol are likely to include a cTTO design that is optimised for the estimation of hybrid models, which based on past empirical work is likely to involve a greater number of health states [36, 37].

However, an arguably bigger issue relates to the source of data for anchoring. As recent work has shown [6,7,8,9,10], cTTO data in this context may be biased due to participants being reluctant to trade years of life of a child. It may therefore be advantageous to consider alternative methods of obtaining stated preference data for anchoring that avoid this trade-off or present it differently. Prior studies have considered the use of data from a VAS and more novel tasks such as the LOD approach [15, 16]. Another alternative may be to use methods that can generate value sets without additional data collection, such as DCE with duration [38, 39], or the (online) PUF approach [13, 14]. Less is known about these methods and the similarities of the data that they produce compared with cTTO data in this context.

5 Conclusion

There are numerous analytical considerations when anchoring DCE data using TTO data and, in the context of EQ-5D-Y-3L valuation (or other child/adolescent-specific instruments), many of these involve value judgements. There is no single criterion on which to judge the performance of a value set, and therefore no anchoring approach is guaranteed to outperform the others. However, where cTTO data are collected for more than one health state, anchoring on the worst health state is arguably suboptimal. Researchers should seek input from relevant stakeholders when commencing valuation studies to help guide decisions and should clearly set out their rationale for their preferred anchoring approach in study outputs.