Key Points for Decision Makers

When considering youth health, respondents from the German adult general public considered the health dimensions pain/discomfort and feeling worried/sad/unhappy as most important.

Following the international EQ-5D-Youth (EQ-5D-Y) valuation protocol, an EQ-5D-Y value set for Germany was developed, which now enables cost-utility analysis of paediatric healthcare interventions.

1 Introduction

Economic evaluations compare the costs and benefits of healthcare approaches, applications, and technologies (hereafter, ‘healthcare interventions’). To measure benefit, quality-adjusted life-years (QALYs) are often used as a standard utility measure in cost-utility analyses (CUAs) [1]. In several countries, CUAs including QALYs are used to inform decision making in healthcare [1, 2]. QALYs combine health-related quality of life (HRQoL), obtained by direct or indirect valuation approaches, with length of life [3, 4]. CUAs of healthcare interventions for children and adolescents are less standardised than those for interventions for adults [5, 6]. This is because there is less consensus around how HRQoL should be measured and valued for children and adolescents.

A review in 2019 summarised all primary studies reporting health utilities for childhood conditions and identified various approaches that had been used in practice [3]. In general, indirect valuation approaches are seen as advantageous as they use standardised generic preference-based HRQoL instruments with corresponding value sets to obtain a single utility for each health state to be used in QALY calculations [1, 4]. Several such instruments have been designed for younger populations. However, some were designed only for specific age groups (e.g. 16D for adolescents aged 12–15 years, 17D for children aged 8–11 years, and Assessment of Health Utility Measurement for adolescents aged 12–18 years) or have unclear age ranges and are relatively long (e.g. Assessment of Quality of Life 6 Dimension (AQoL-6D), which has 20 items, and the Quality of Well-Being scale (QWB), which asks about three dimensions and 58 symptoms) [4, 6]. Hence, the number of generic preference-based HRQoL instruments applicable to a broad age range in children and adolescents is limited, and there is no widely used child-specific instrument [4, 6, 7].

The generation of value sets for youth-specific HRQoL instruments is a more recent development that follows a debate on methodological and conceptual issues of valuation studies for paediatric instruments. The discussion is first and foremost about whose values should be considered: those of the adult general public, of parents, or of children and adolescents themselves [6, 7]. Studies show differences in the values given to child health by the adult general public and parents [8, 9] and between those given by adults and adolescents [10, 11]. The most suitable elicitation methods and perspectives in valuation tasks have also been discussed [4, 6, 7].

The EQ-5D-Youth (EQ-5D-Y) is a short generic instrument developed by the EuroQol Group to measure HRQoL in children and adolescents aged 8–15 years as an equivalent to the adult instrument EQ-5D-3L (three-level version of EQ-5D). It consists of five dimensions: mobility (MO), looking after myself (SC), doing usual activities (UA), having pain/discomfort (PD), and feeling worried/sad/unhappy (AD), with each dimension specifying three levels of severity: no problems/not (level 1), some problems/a bit (level 2), and a lot of problems/very (level 3) [12,13,14]. The adult instruments are widely used to assess HRQoL and calculate QALYs in economic evaluations, and, with its similar structure, the EQ-5D-Y also has the potential to be used in this way [2, 7, 15]. However, very few EQ-5D-Y value sets currently exist [12, 16, 17]. As previous studies have shown that health state values for adults and children differ, value sets for adult instruments should not be used to calculate EQ-5D-Y-based utilities. Instead, separate value sets are necessary [18, 19]. Based on results of explorative studies testing different approaches to valuing EQ-5D-Y health states, a first international valuation protocol for EQ-5D-Y was recently published [20].

In Germany, no youth-specific HRQoL instrument is available that allows for utility calculation. Therefore, the main objective of this study was to develop a German value set for EQ-5D-Y—as one of the first national value sets—according to the methods proposed by the protocol to enable the use of the EQ-5D-Y as a utility measure. In addition, we explored differences in values given to child health states by parents and non-parents.

2 Methods

2.1 Data Collection

Data collection took place between November 2019 and July 2020. As suggested by the EQ-5D-Y valuation protocol, it was split in two sub-surveys to collect (1) discrete choice experiment (DCE) data via an online survey and (2) composite time trade-off (cTTO) data via interviews. Ethical approvals for both sub-surveys were received in Germany (Ethics Committee of Bielefeld University, No. EUB 2018-172 and EUB 2019-204).

2.2 Methods for Eliciting Health State Preferences

DCEs are used to assess the relative importance of dimensions and levels, and cTTO is used to rescale/anchor the latent scale DCE values on a scale from full health (1) to dead (0) [20]. The DCE task uses pairwise comparisons. The respondent is asked to decide which out of two health states, A and B, is better (forced choice) [21, 22]. As we used neither a ‘duration’ attribute nor a comparison to the alternative ‘dead’ in our DCE, only latent scale values were produced. The cTTO identifies the number of life-years in full health at which the respondent is indifferent between a longer period of life-years with impaired health and a shorter life duration in full health. The respondent is asked to trade-off life-years. In cTTO, the conventional TTO is used to start the tasks for all health states. For health states that the respondent considers to be worse than being dead, lead-time TTO is used [23,24,25].
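As a rough illustration of how a cTTO response translates into a value, the sketch below assumes the time frames commonly used in EuroQol valuation protocols: 10 years in the impaired state for the conventional TTO, and 10 years of lead time in full health followed by 10 years in the impaired state for the lead-time TTO. These durations, the function name, and its parameters are assumptions for illustration and are not stated in this section.

```python
def ctto_value(years_full_health, worse_than_dead=False,
               state_years=10.0, lead_years=10.0):
    """Convert an indifference point (years in full health) into a cTTO value.

    Assumed time frames (not given in the text):
    - conventional TTO: `state_years` in the impaired state versus
      `years_full_health` in full health;
    - lead-time TTO: `lead_years` in full health followed by `state_years`
      in the impaired state versus `years_full_health` in full health.
    """
    max_years = lead_years if worse_than_dead else state_years
    if not 0 <= years_full_health <= max_years:
        raise ValueError("indifference point outside the task's time frame")
    if worse_than_dead:
        # Values for states considered worse than dead lie in [-1, 0].
        return (years_full_health - lead_years) / state_years
    # Values for states considered better than dead lie in [0, 1].
    return years_full_health / state_years
```

Under these assumptions, the lowest attainable value is − 1, which is consistent with the censoring point mentioned in Sect. 2.8.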

2.3 Health State Selection

The DCE design from the EQ-5D-Y valuation protocol is D-efficient and consists of 150 DCE pairs separated into ten blocks. A two-dimension overlap was used for all pairs, meaning that the health states in each pair differed in the levels of three dimensions, whereas the other two dimensions presented the same level [20]. Differences between health states were presented in bold font to reduce non-attendance. Further, level balance among blocks was ensured. In each block, the order of health state pairs was randomised, as well as the left/right presentation during the task. Each respondent completed 18 DCE tasks: 15 from the experimental design and three for quality control (QC) purposes (see Sect. 2.7).
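The two-dimension overlap property can be checked mechanically. The following is a minimal sketch, assuming health states are written as five-digit strings in the order MO, SC, UA, PD, AD; the function names and the example pair are hypothetical and are not taken from the protocol design.

```python
def differing_dimensions(state_a, state_b):
    """Count the EQ-5D-Y dimensions on which two five-digit states differ."""
    if len(state_a) != 5 or len(state_b) != 5:
        raise ValueError("health states must have exactly five digits")
    return sum(a != b for a, b in zip(state_a, state_b))


def has_two_dimension_overlap(state_a, state_b):
    """True if the pair differs on three dimensions and shares the other two."""
    return differing_dimensions(state_a, state_b) == 3


# Hypothetical pair: MO and PD are identical; SC, UA, and AD differ.
example_pair = ("12231", "13132")
```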

The cTTO design included one block of ten health states, which were valued by each respondent. The design included three mild health states (11112, 11121, 21111), two moderate ones (22223, 22232), and five severe health states (31133, 32223, 33233, 33323, 33333) [20]. The order of health states was randomised for each participant.

2.4 Framing of Discrete Choice Experiment and Composite Time Trade-Off Tasks

In both valuation tasks, participants were asked to imagine a hypothetical 10-year-old child when valuing the health states. Therefore, the wording of the EQ-5D-Y proxy version 1 was used for the health states, which means only the part of the item describing the dimension and severity level, e.g. ‘no problems walking about’ was used (rather than ‘I have no problems walking about’) [12].

2.5 Interview Process

The online DCE survey consisted of the following elements:

  1. Information sheet on the project aim and procedures

  2.

  3. Demographic questions on age, gender, and region to inform the quota sampling

  4. Self-reported EQ-5D-Y to familiarise respondents with the instrument

  5. Three questions on experience with severe illness

  6. 18 DCE tasks (15 DCE tasks and three DCE tasks for QC)

  7. Self-reported EQ-5D-5L

  8. Socio-demographic and health-related questions

The cTTO data were collected via computer-assisted personal interviews using the EuroQol portable valuation technology (EQ-PVT). An interviewer guideline was prepared explaining the interviewer’s role, how to handle the software, and instructions to be given to the respondents. Each of the four interviewers attended a day-long training session and had to conduct three test interviews. The interviews consisted of the following:

  1. Welcome and study aim (information sheet obtained prior to the interview)

  2. Written consent

  3. Self-reported EQ-5D-Y to familiarise respondents with the instrument

  4. cTTO wheelchair examples plus three cTTO practice states

  5. Ten cTTO tasks

  6. Feedback module

  7. Debriefing questions

  8. Socio-demographic questions and three questions on experience with severe illness

  9. Self-reported EQ-5D-5L

Most of the cTTO interviews were conducted face to face; however, with the advent of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2; coronavirus disease 2019 [COVID-19]) pandemic, 32 interviews were conducted online using a video conference application to finish data collection.

2.6 Sample and Recruitment

For the DCE survey, we aimed for a sample of 1000 respondents from the adult general population in Germany recruited by an online panel of a market research agency. To facilitate a representative sample in terms of gender, age groups, educational level, and region (16 federal states in Germany), quota-based sampling was applied based on German official statistics [26].

For the cTTO interviews, we aimed for 200 respondents from the adult general population [20]. A convenience sample controlled in terms of gender and age groups was recruited. Respondents came from Bielefeld and surrounding areas and were mostly recruited by study team members as well as through advertisements in local newspapers. The interviews were conducted mainly at Bielefeld University. A small number of participants came from other German regions as some interviews were conducted online. All interview respondents received a €20 voucher.

2.7 Quality Control

To identify non-engaged respondents in the DCE online survey, two QC criteria were applied. First, we included three fixed dominant pairs in which one health state logically dominated the other. This enabled us to check choices for rationality and to identify respondents who seemed to have a low level of attentiveness to, engagement with, or understanding of the tasks or made irrational choices. The dominant pairs were presented at the beginning and end of the DCE tasks and at a random position in the middle. Respondents were excluded if they gave a wrong answer to at least two of the three dominant pairs. The dominant pairs were excluded from the modelling analysis. Second, the QC procedure included a time criterion. Respondents were also excluded if they spent less than 150 seconds on all DCE tasks. We assumed that ‘speeders’, who finished the online survey too quickly, did not consider the health states in detail.
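The two DCE exclusion rules described above can be expressed as a short check. This is a sketch only: the function names and the input format (a count of wrongly answered dominant pairs and the total time in seconds) are assumptions, and logical dominance is interpreted here as being no worse on any dimension and strictly better on at least one.

```python
def dominates(state_a, state_b):
    """True if state_a logically dominates state_b.

    Lower level digits mean fewer problems, so state_a dominates when it is
    no worse on every dimension and strictly better on at least one.
    """
    levels_a = [int(ch) for ch in state_a]
    levels_b = [int(ch) for ch in state_b]
    no_worse = all(a <= b for a, b in zip(levels_a, levels_b))
    strictly_better = any(a < b for a, b in zip(levels_a, levels_b))
    return no_worse and strictly_better


def exclude_respondent(wrong_dominant_answers, seconds_on_dce_tasks):
    """Apply the two QC criteria from the text.

    A respondent is excluded after at least two wrong answers on the three
    dominant pairs, or after spending less than 150 seconds on all DCE tasks.
    """
    return wrong_dominant_answers >= 2 or seconds_on_dce_tasks < 150
```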

For the cTTO interviews, we applied the QC process established by the EuroQol Group for EQ-5D-5L valuation studies [27]. Poor interview quality is indicated by

  • a short amount of time spent in the wheelchair example to explain cTTO tasks,

  • missing explanation for the ‘worse than dead’ task (lead-time TTO),

  • less than 5 minutes spent on all cTTO tasks, or

  • obvious inconsistency in the cTTO ratings when the value of 33333 is not the lowest or at least 0.5 higher than the state with the lowest value.

If interviews had continuously poor quality, data were excluded from the analysis. Further, the interview included a feedback module by which each respondent was presented with the rank ordering implied by their cTTO valuations. Respondents were asked to review their responses and to flag any health state they felt should be reconsidered. These states could not be re-valued but were excluded from modelling [28, 29].

2.8 Data Analysis

Descriptive analyses were used to examine the sample characteristics and the responses to the cTTO. Sample characteristics with regard to experience with illness and self-reported HRQoL were compared with the characteristics of the representative adult sample of the German EQ-5D-5L valuation study [30]. The DCE data were analysed using choice models under a random utility framework with a linear, additive utility function, as in Eq. (1):

$$\begin{array}{l}{V}_{j}={\beta }_{1}MO2+{\beta }_{2}MO3+{\beta }_{3}SC2+{\beta }_{4}SC3+{\beta }_{5}UA2+{\beta }_{6}UA3+{\beta }_{7}PD2+{\beta }_{8}PD3+{\beta }_{9}AD2+{\beta }_{10}AD3. \end{array}$$

The ten independent variables are made up of two variables for each EQ-5D-Y dimension, representing the two levels beyond level 1 (‘no problems’; the reference category). The coefficients therefore indicate the decrement from level 1 to the respective level.
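The dummy coding behind Eq. (1) can be illustrated as follows. This is a minimal sketch, assuming health states are given as five-digit strings in the order MO, SC, UA, PD, AD; the function name is hypothetical.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def dummy_code(state):
    """Map a five-digit EQ-5D-Y state to the ten indicators of Eq. (1).

    Level 1 is the reference category, so a dimension at level 1 has both
    of its indicators (level 2 and level 3) set to 0.
    """
    if len(state) != 5 or any(ch not in "123" for ch in state):
        raise ValueError("state must consist of five digits from 1 to 3")
    coded = {}
    for dim, level in zip(DIMENSIONS, state):
        coded[dim + "2"] = int(level == "2")
        coded[dim + "3"] = int(level == "3")
    return coded
```

For example, `dummy_code("22233")` sets MO2, SC2, UA2, PD3, and AD3 to 1 and the remaining five indicators to 0.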

A mixed logit model specification was chosen, given the a priori expectation that there would be unobservable random preference heterogeneity in the data and that multinomial logit models cannot account for such heterogeneity [31]. In this model, each of the ten parameters was modelled as random and normally distributed using 5000 Halton draws. Coefficients from the model were transformed into relative attribute importance (RAI) scores to aid interpretation; these were obtained by dividing the utility range for each attribute by the total utility range.
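The RAI calculation can be sketched as below. The sketch assumes that the utility range of an attribute is the difference between its best and worst level (with level 1 fixed at 0 as the reference) and that the total utility range is the sum of these ranges. The function name is hypothetical, and the coefficients are invented for illustration; they are not the estimates reported in Table 4.

```python
def relative_attribute_importance(coefs):
    """Compute RAI scores from latent-scale coefficients.

    `coefs` maps each dimension to a tuple (beta_level2, beta_level3);
    level 1 is the reference category with a coefficient of 0.
    """
    ranges = {}
    for dim, (beta_l2, beta_l3) in coefs.items():
        levels = (0.0, beta_l2, beta_l3)
        ranges[dim] = max(levels) - min(levels)
    total = sum(ranges.values())
    if total == 0:
        raise ValueError("all coefficients are zero; RAI is undefined")
    return {dim: rng / total for dim, rng in ranges.items()}


# Hypothetical coefficients, NOT the estimates from the mixed logit model.
example_coefs = {
    "MO": (-0.2, -1.0),
    "SC": (-0.2, -1.5),
    "UA": (-0.5, -1.5),
    "PD": (-1.0, -3.0),
    "AD": (-1.0, -3.0),
}
example_rai = relative_attribute_importance(example_coefs)
```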

To produce the value set, the coefficients from the mixed logit model need to be anchored onto the scale of full health to dead. There are several different anchoring approaches, including rescaling based on the mean value of the worst health state (33333 rescaling), mapping the DCE data onto the cTTO data (mapping), and hybrid modelling [19, 32]. Earlier studies noted that the ratio of the cTTO and DCE data is not well balanced in the EQ-5D-Y valuation protocol, so the performance of the hybrid model may be suboptimal [17]. Of the remaining two approaches, mapping takes into account the mean values of all ten health states valued in the cTTO task, whereas 33333 rescaling relies on the mean value of a single state. Therefore, we chose mapping as the preferred anchoring method. Specifically, we mapped the DCE data onto mean cTTO values, which were adjusted for censoring at − 1 (obtained by estimating Tobit models for each state). This adjustment was deemed appropriate given that the cTTO task does not allow for utilities below − 1.

A range of specifications for the mapping model were examined, including linear models with and without constants, as well as the inclusion of a quadratic term. Based on a combination of parameter significance, adjusted R-squared values, and the alignment between cTTO values and the resulting value sets, the preferred specification was a linear function without a constant, as in Eq. (2):

$${\text{cTTO}}_{i} = { }\beta \left( {{\text{DCE}}_{i} } \right) + \varepsilon_{i} ,$$

where cTTOi is the adjusted mean cTTO utility and DCEi is the latent scale DCE utility for ith health state (1 ≤ i ≤ 10). The estimated β was used to rescale the latent scale DCE values from the mixed logit model.
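For a linear model without a constant, as in Eq. (2), the least-squares estimate of β has the closed form Σ(DCEᵢ · cTTOᵢ) / Σ(DCEᵢ²). The sketch below implements this and the subsequent rescaling step. It is an illustration only: the function names are hypothetical, the scaling of the inputs is left to the caller because the text does not spell it out, and the original analysis was run in Stata rather than Python.

```python
def fit_mapping_slope(dce_values, ctto_values):
    """Least-squares slope of a no-constant regression of cTTO on DCE values."""
    if len(dce_values) != len(ctto_values) or not dce_values:
        raise ValueError("inputs must be non-empty and of equal length")
    sum_xx = sum(x * x for x in dce_values)
    if sum_xx == 0:
        raise ValueError("DCE values are all zero; slope is undefined")
    sum_xy = sum(x * y for x, y in zip(dce_values, ctto_values))
    return sum_xy / sum_xx


def rescale_coefficients(dce_coefficients, beta):
    """Multiply each latent-scale DCE coefficient by the estimated beta."""
    return {name: beta * value for name, value in dce_coefficients.items()}
```

With ten health states, `dce_values` and `ctto_values` would each hold ten entries, one per state valued in the cTTO task.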

To compare results by parental status, we split the two samples based on responses to the question “Do you or have you ever had primary responsibility for a child (as a birth parent, foster parent, adoptive parent, or similar)?” Respondents who answered “yes” were classified as having parental experience, referred to as ‘parents’, and those answering “no” were referred to as ‘non-parents’. For the DCE responses, we compared conditional logit model results by parental status. For cTTO, we compared the adjusted mean values of the ten cTTO health states for both groups. Value sets by parental status were also estimated and compared.

All statistical analyses were performed using Stata 15.

3 Results

3.1 Sample Characteristics

In total, 1030 respondents completed the DCE survey with appropriate data quality (309 failed QC: 277 because of timing and 32 because of dominant pairs). The DCE sample is representative for the German general population aged ≥18 years with respect to gender, age groups, educational level, and region. Table 1 shows marginal proportional differences. A comparison of characteristics of included and excluded DCE respondents is presented in Table S1.1 in the electronic supplementary material (ESM). A total of 215 respondents completed the cTTO interviews. The cTTO sample underrepresents male respondents, respondents aged ≥ 70 years, and lower and middle educated respondents, whereas respondents aged 18–29 years are slightly overrepresented (see Table 1). In particular, higher educated people are overrepresented.

Table 1 Sample characteristics

As Table 2 illustrates, the cTTO and DCE samples differ in terms of respondents’ experiences with severe illness and respondents’ HRQoL. While the DCE sample contained a higher proportion of respondents who had experienced severe illness themselves than did the cTTO sample (37.7 vs. 22.3%, respectively), the proportion of respondents that had experience with severe illness in terms of other people that they had cared for was higher in the cTTO sample than in the DCE sample (30.7 vs. 14.7%, respectively). The reported problems on EQ-5D-5L and in the mean visual analogue scale (VAS) value show that the cTTO sample reported fewer health problems than the DCE sample. However, the DCE sample corresponds better with the self-reported health of the German adult general population [30].

Table 2 Respondents’ experiences with severe illness and health-related quality of life

3.2 Modelling

In the feedback module, 13.77% of cTTO responses (n = 296) were removed by respondents. The following results include all cTTO valuations after the feedback module (2150 − 296 = 1854 observations). The mean cTTO values ranged from − 0.260 for health state 33333 to 0.970 for health state 21111 (Table 3). For the adjusted cTTO data, the value for health state 33333 was − 0.350.

Table 3 Composite time trade-off results

The coefficients from the mixed logit model, the RAI scores, and the rescaled coefficients (the value set) are shown in Table 4. The results from the mapping model that were used to create the value set can be found in Table S2.1 in the ESM. The predicted values ranged from − 0.283 (for 33333) to 1 (for 11111). The preference ranking, from most to least important, of the dimensions was as follows: (1) PD, (2) AD, (3) UA, (4) SC, and (5) MO. The utility decrements for a movement from MO1 to MO2 and from SC1 to SC2 were particularly small at approximately 0.02. In contrast, the decrement for a movement from PD1 to PD2 was approximately 0.13.

Table 4 Modelling results for the German EQ-5D-Y value set

Applying the value set, EQ-5D-Y health state utilities can be estimated by subtracting the relevant decrement for each problem on each dimension from 1. For example, the predicted EQ-5D-Y index value for health state 22233 can be calculated as follows:

$$U(22233) \, = \, 1 - 0.0242 - 0.0191 - 0.0837 - 0.4190 - 0.4019 \, = \, 0.0521.$$
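The same calculation can be written as a small lookup. The sketch below contains only the five decrements that appear in the worked example for state 22233, assigned to dimensions in the order MO, SC, UA, PD, AD (an assumption based on the order of the terms). The remaining decrements are reported in Table 4 and are deliberately not guessed here, so the function raises an error for any level it has no decrement for. All names are hypothetical.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")

# Only the decrements used in the worked example; the rest are in Table 4.
KNOWN_DECREMENTS = {
    ("MO", 2): 0.0242,
    ("SC", 2): 0.0191,
    ("UA", 2): 0.0837,
    ("PD", 3): 0.4190,
    ("AD", 3): 0.4019,
}


def utility(state, decrements=KNOWN_DECREMENTS):
    """Return 1 minus the sum of decrements for all problems in `state`."""
    if len(state) != 5 or any(ch not in "123" for ch in state):
        raise ValueError("state must consist of five digits from 1 to 3")
    total = 0.0
    for dim, ch in zip(DIMENSIONS, state):
        level = int(ch)
        if level == 1:
            continue  # level 1 is the reference category: no decrement
        if (dim, level) not in decrements:
            raise KeyError(f"no decrement available for {dim}{level}")
        total += decrements[(dim, level)]
    return 1.0 - total
```

Passing the full set of ten decrements from Table 4 as `decrements` would allow any of the 243 EQ-5D-Y health states to be valued in the same way.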

The symptomatic dimensions (PD and AD) had similar RAI scores of about 30% each, whereas functional dimensions (MO, SC, and UA) had far lower RAI scores, ranging from 9.2% for MO to 15.5% for UA (Table 4). The decrements of the two symptomatic dimensions, PD and AD, were also similar, with PD having the greatest overall impact. The only dimension with linear utility decrements by level was UA; for each of the other dimensions, the utility decrement between levels 2 and 3 was larger than that between levels 1 and 2 (Fig. 1).

Fig. 1 Utility decrements of German EQ-5D-Y value set. AD feeling worried/sad/unhappy, MO mobility, PD having pain/discomfort, SC looking after myself, UA doing usual activities

3.3 Subgroup Analysis: Parental Status

The DCE and cTTO samples showed similar proportions of respondents reporting responsibility for children either at present or in the past (55–56% answering ‘yes’). Demographics differed between parents and non-parents (Table S3.1 in the ESM). For example, non-parents were typically younger, and a higher proportion had high education levels (or were still in education). The DCE results did not differ substantially between these two groups (Table S3.2 in the ESM). Non-parents had a slightly stronger preference for MO and a weaker preference for PD; however, the difference in RAI scores was only 1.4 pp (percentage points) in both cases. Figure 2 illustrates the adjusted mean utilities from the cTTO task for each health state in both subgroups. Mean utilities were always greater for parents than for non-parents. However, only three of the mean differences were statistically significant (two mild and one moderate state). When generating two separate value sets based on data from parents and non-parents, the value sets had significantly different scales (Table S3.3 and Fig. S3.1 in the ESM): the value for 33333 for non-parents was − 0.210 compared with − 0.358 for parents.

Fig. 2 Comparison of adjusted mean utilities from the composite time trade-off task, by parental status. Mean differences; ***p < 0.01; **p < 0.05; *p < 0.1

4 Discussion

The EQ-5D-Y valuation study in Germany was one of the first studies to develop an EQ-5D-Y value set. We applied the recently published valuation protocol and obtained health state preferences using a combination of DCE and cTTO. The value set was modelled using a mixed logit model, and the latent DCE coefficients were anchored using a linear mapping approach. The developed German EQ-5D-Y value set can be applied alongside the EQ-5D-Y descriptive system, which is appropriate for use in children and adolescents aged 8–15 years (self-report) and children aged 4–7 years (proxy report); in special cases, the instrument can also be used for adolescents aged 16 or 17 years [12]. Indeed, there is discussion on whether values derived considering a 10-year-old in the valuation tasks are suitable for the whole age range of the instrument. A recently published qualitative study indicated that health state preferences for a 10-year-old child might not be representative for the full EQ-5D-Y age range [33], whereas quantitative studies did not find significantly different values when different age descriptions were used [34, 35].

The results illustrate that the German adult general public considers PD as the most important dimension for children and adolescents, followed by AD. Of the functional dimensions, UA had higher decrements for levels 2 and 3 than the two other dimensions (MO and SC). Overall, people consider it important that children and adolescents have no pain/discomfort, are not worried/sad/unhappy, and can do their usual activities without any limitations, as children without health problems do.

The same ordering of the three most important dimensions was observed in the Slovenian EQ-5D-Y valuation study [16]. However, the level decrements in all dimensions are higher in the Slovenian than the German EQ-5D-Y value set, with the exception of AD, where decrements are similar. Therefore, the value range is larger in Slovenia (− 0.691 to 1) than in Germany (− 0.283 to 1). Nevertheless, this comparison is limited because Prevolnik Rupel et al. [16] used the weighted censored average value of the worst health state 33333 for anchoring, whereas the means of all ten cTTO health states were used for rescaling in Germany. According to the recently published Spanish EQ-5D-Y value set, PD and AD were the two most important dimensions, followed by MO, which differs from the results in Germany and Slovenia. The value range in Spain is relatively large (− 0.539 to 1) and therefore more comparable to the Slovenian than to the German value set [36]. In terms of comparability of the value sets, it is worth noting that the Spanish and German EQ-5D-Y value sets were modelled differently [36]. The Japanese EQ-5D-Y value set differs from all other EQ-5D-Y value sets as the value range is particularly narrow (0.28–1) and the coefficients are accordingly much smaller. In particular, the level 3 decrements are smaller than those of the German EQ-5D-Y value set. The Japanese team deviated from the protocol by including 26 health states in the cTTO exercise, which also limits comparability [17]. Furthermore, there might be cultural differences in the context of valuing child health states between countries that influence the resulting EQ-5D-Y value sets. Notably, the Japanese values for the adult EQ-5D-5L instrument were higher than those of European EQ-5D-5L value sets [17, 37].

When comparing the German EQ-5D-5L [30] and EQ-5D-Y value sets, it is notable that the latter has a smaller value range and that single decrements per level differ. However, one similarity can be observed: the dimensions with the highest decrements are PD and AD. More detailed comparison is limited because of the different numbers of severity levels between the instruments, the different wording used in the adult- and youth-specific instruments, and the different valuation methods and modelling approaches [18, 30].

With the establishment of the valuation protocol, more EQ-5D-Y value sets will be produced in the future, and the influence of using different value sets for children and adolescents and adults in CUA will need to be further explored. There are no guidelines from international agencies on using youth-specific preference-based measures, and there are concerns about how to use youth-specific measures alongside adult measures or how to combine and/or compare these utilities [6].

When comparing results by parental status, mean cTTO values were always greater for parents than for non-parents. These differences were only statistically significant for a few health states, although this may partly be explained by the high variation in values for some states (particularly severe states) and the relatively small subgroup sizes. The observed differences are in line with earlier studies [9, 38]. Matza et al. [9] also found that parents were less willing to trade within TTO tasks, so parents’ responses revealed higher values than those of non-parents. Hartman and Craig [8] explored health state values for children using a DCE with a time component, showing that parents preferred a longer lifespan instead of a longer time in healthier states compared with non-parents. If the time component is the key driver of differences between parents and non-parents, this may explain why the DCE results did not differ substantially between these two groups in our study. As noted by Powell et al. [39], future valuation studies may benefit from being representative in relation to parental status given the potential impact on preferences when valuing child health. Our results indicate that this representativeness should apply to both the DCE and the cTTO samples.

This study has some limitations. The DCE sample is nationally representative in terms of gender, age groups, education, and region but not necessarily in terms of other variables. Furthermore, there is evidence of a tendency for low-level engagement and random responses when DCEs are administered online [40]. We attempted to address this issue by including the QC criteria, but we cannot be entirely sure that the sample consists of only individuals who were fully engaged in the task. However, there is debate within the literature about whether respondents with ‘irrational’ responses should be excluded from analyses [41]. In addition, a convenience cTTO sample was recruited (rather than a nationally representative sample), and highly educated respondents were overrepresented, which might have influenced the results. The latter portion of the cTTO data collection was also affected by the COVID-19 pandemic. Most of the interviews were conducted before the pandemic outbreak, but 32 respondents were interviewed after the lockdown from March to May 2020. These respondents may have had slightly different preferences to the other respondents because of the pandemic. Additionally, the later interviews were online/video interviews, which may also have influenced values, although online interviews have been shown to be feasible, with acceptable data quality [42]. Moreover, demographics of the parent and non-parent groups differed, which might have affected health state valuations and the differences found between the two groups. However, differences were as typically expected between these two groups, and it is not possible to disentangle the effects. Finally, this subgroup analysis was not explicitly considered at the design stage, nor is it part of the EQ-5D-Y valuation protocol, so it was not factored into the experimental design.

It is worth highlighting that there is ongoing discussion on the most appropriate way to value youth health states and analyse valuation data. The EQ-5D-Y valuation protocol represents an initial set of recommendations for achieving this goal but it can (and likely will) be updated over time as further research is conducted (including EQ-5D-Y valuation studies such as this).

5 Conclusion

The German EQ-5D-Y valuation study was one of the first studies to apply the recently published EQ-5D-Y valuation protocol. It confirms that the development of EQ-5D-Y value sets using the methods set out in the protocol is feasible. The results of the EQ-5D-Y valuation study in Germany show that the adult general population considers PD and AD to be the most important EQ-5D-Y health dimensions for children and adolescents. The availability of a German EQ-5D-Y value set enables a preference-based HRQoL measurement in children and adolescents in Germany and therefore enables the instrument to be used in economic evaluations, mainly CUAs, of paediatric healthcare interventions. Furthermore, the value set may also prove useful in other contexts (e.g., clinical contexts) in which summarising HRQoL into a single summary score would be helpful.