Introduction

Cost-utility analyses (CUAs) are an essential source of information for rational decision-making in resource allocation in health care. Their primary outcome is quality-adjusted life years (QALYs), a parameter which integrates survival time and the “value” of a specific health state: the health utility. Health utilities are cardinal values that represent an individual’s preferences for specific health states, with “0” considered equivalent to death and “1” reflecting perfect health. Among the different preference-based methods used to obtain utilities, multi-attribute utility instruments (MAUIs) are popular. Well-known and frequently used MAUIs include the EQ-5D [1, 2] and the SF-6D [3]. Like most MAUIs, these are generic, i.e. they cover very general health aspects (such as mobility, pain, or self-care), which makes them applicable to a broad range of different health conditions. This breadth is valuable, but may come at the price of missing specific health issues relevant to certain conditions, such as nausea and vomiting in cancer [4, 5]. However, utility instruments using a disease-specific health state description system are relatively scarce.

Non-preference-based disease-specific quality of life (QOL) measures are widely used in clinical research. By definition, they include aspects of health relevant to a certain disease, which arguably may be more sensitive to clinical changes [5]. However, they do not allow utility scoring, and therefore cannot be used in cost-utility analysis.

The recent development of the EORTC Quality of Life Utility-Core 10 Dimensions (QLU-C10D) [6] contributes to closing this gap, as it is a MAUI based on the EORTC Quality of Life Questionnaire Core 30 (QLQ-C30) [7], the most widely used QOL profile measure in clinical oncology research [8]. It helps to overcome the lack of disease-specific utility instruments in the field of oncology by providing a health state classification system and utility algorithm using 13 key items of the parent instrument, covering 10 QOL dimensions.

QLU-C10D valuations are currently being performed for a range of countries, and tariffs have already been published for Australia [9], Canada [10], Germany [11], and the UK [12]. In the present study, we aim to determine the utility weights for health states of the Austrian, Italian, and Polish versions of the QLU-C10D. Furthermore, we descriptively compare the utilities of the three countries for selected QLU-C10D health states.

Methods

For QLU-C10D valuations, a standardised survey has been developed. Its core element is a discrete choice experiment (DCE) for the valuation of health states; it also comprises feedback questions on the participants’ experience of the DCE, self-report instruments on QOL (EORTC QLQ-C30, EQ-5D) and distress (Kessler K-10), and questions on basic socio-demographic and clinical information [13]. Before this approach was adopted as the method of choice for QLU-C10D valuations, its feasibility and reliability were established using quantitative and qualitative methods. The wording of the DCE tasks and different layouts were pre-tested in general population respondents; the tasks were perceived as difficult but were considered manageable [14]. Utility weights resulting from the DCE were shown to be unbiased by the ordering of attributes in the DCE [15] and stable within respondents over time [16].

The QLU-C10D descriptive system [6] consists of 13 of the 30 items of the parent instrument EORTC QLQ-C30 [17], covering the 10 QOL domains physical functioning, role functioning, social functioning, emotional functioning, pain, fatigue, sleep disturbances, appetite loss, nausea, and bowel problems. Each domain can take on 4 levels, from the best level “not at all” (coded 1) to the worst level “very much” (coded 4) (see Table 1 for the entire health state classification system). For example, a health state with “very much” problems in physical and role functioning and “no problems” on the other QLU-C10D domains would be coded 4411111111. The combination of domains and levels can therefore describe a total of 4^10 = 1,048,576 unique health states. These are the health states for which preferences need to be obtained in the valuation using the DCE.

Table 1 Health state classification system of the QLU-C10D
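
The coding scheme described above can be illustrated with a minimal Python sketch. The domain order follows the list given in the text; the function names `encode_state` and `decode_state` are hypothetical helpers introduced only for illustration and are not part of any official scoring software.

```python
# Domain order as listed in the text; level 1 = "not at all", 4 = "very much".
DOMAINS = (
    "physical functioning",
    "role functioning",
    "social functioning",
    "emotional functioning",
    "pain",
    "fatigue",
    "sleep disturbances",
    "appetite loss",
    "nausea",
    "bowel problems",
)
LEVELS = (1, 2, 3, 4)


def encode_state(levels_by_domain):
    """Build the 10-digit code; domains not mentioned default to level 1."""
    unknown = set(levels_by_domain) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    digits = []
    for domain in DOMAINS:
        level = levels_by_domain.get(domain, 1)
        if level not in LEVELS:
            raise ValueError(f"level for {domain!r} must be 1-4, got {level!r}")
        digits.append(str(level))
    return "".join(digits)


def decode_state(code):
    """Map a 10-digit code back to a {domain: level} dictionary."""
    if len(code) != len(DOMAINS) or any(ch not in "1234" for ch in code):
        raise ValueError(f"not a valid health state code: {code!r}")
    return {domain: int(ch) for domain, ch in zip(DOMAINS, code)}


# Number of distinct health states: 4 levels on each of 10 domains.
N_STATES = len(LEVELS) ** len(DOMAINS)

example = encode_state({"physical functioning": 4, "role functioning": 4})
print(example)   # 4411111111
print(N_STATES)  # 1048576
```

Representing a state as a mapping from domain to level, rather than as a bare digit string, makes the domain order explicit and guards against misplaced digits.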

The DCE asks respondents to choose between pairs of (hypothetical) health states which are described by 11 attributes: the 10 domains of the QLU-C10D (in a randomised order across participants to control for any potential dimension ordering effect) and a survival time of 1, 2, 5, or 10 years. These durations were selected to be plausible to most respondents, with enough spread to ensure discrimination between them. Each respondent is presented with 16 binary choice sets randomly selected from a pool of 960 sets. To minimise the cognitive burden, only five attributes differ between the health states in each choice set. An example choice set as presented to respondents is shown in Fig. 1. For details on the DCE design, please refer to King et al. (2018) [9].

Fig. 1 English example choice set for discrete choice experiment valuation task
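
As a rough illustration of the randomisation described above, the following Python sketch draws 16 of the 960 choice sets for one respondent and shuffles the order in which the 10 domains are displayed. Simple random sampling without replacement is an assumption made here; the actual allocation procedure is described in King et al. (2018) [9] and may differ. The function name `draw_survey` and the seed are hypothetical.

```python
import random

N_CHOICE_SETS = 960        # size of the choice-set pool stated in the text
SETS_PER_RESPONDENT = 16   # choice sets shown to each respondent

DOMAINS = [
    "physical functioning", "role functioning", "social functioning",
    "emotional functioning", "pain", "fatigue", "sleep disturbances",
    "appetite loss", "nausea", "bowel problems",
]


def draw_survey(rng):
    """Return (choice-set ids, domain display order) for one respondent.

    Assumption: choice sets are drawn uniformly without replacement.
    """
    set_ids = rng.sample(range(1, N_CHOICE_SETS + 1), SETS_PER_RESPONDENT)
    # The domain order is randomised across participants.
    domain_order = rng.sample(DOMAINS, k=len(DOMAINS))
    return set_ids, domain_order


rng = random.Random(2016)  # fixed seed only to make the sketch reproducible
set_ids, domain_order = draw_survey(rng)
print(len(set_ids), "choice sets; first domain shown:", domain_order[0])
```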

As with previous EORTC QLU-C10D valuations in other countries, recruitment and survey administration were conducted online by SurveyEngine (www.surveyengine.com), a survey company specialising in choice experiments. The survey was sent out as a weblink for the respondents to complete at their leisure. The potential respondents approached were members of an online panel of persons willing to complete surveys for a small payment. SurveyEngine and its panel providers comply with the International Code on Market, Opinion and Social Research and Data Analytics (www.esomar.org).

In each country, we aimed to recruit 1000 respondents from the general population, aged between 18 and 80. Quota sampling by age and sex was applied to ensure that the samples were representative with respect to these variables. The representativeness of educational level, marital status, and chronic disease (yes/no) was checked a posteriori by comparison with national census data. Data collection for all countries was performed in the years 2016–2017.

For the present study, validated translations of all required questionnaires were available. In addition, the entire Austrian survey was already available in German (see [11]). The attribute descriptions of the QLU-C10D health state description system were already available in all languages, as they were taken from the parent instrument QLQ-C30. The respective questions are in the past tense and were changed into statements in the present tense (“Did you feel tired?” to “You feel tired.”). This rewording and the translation of the remaining survey text (section headings and brief section summaries, DCE task descriptions, thank-you notes) were contracted to translators. Forward and backward translations were performed with the involvement of in-country persons, and pilot testing was conducted in convenience samples of 3–5 persons.

It has to be noted that for Austria, a revised German response format of the QLQ-C30 was used which has also been used for QLU-C10D valuations in Germany [11]. The reason for revision was that the original German wording for the category “quite a bit” (“mäßig”) is suspected to express a lower severity than the English version [18] and a revised German wording for this response level (“ziemlich”) is currently being investigated within the EORTC QLG [19]. The new translation appears to be a closer approximation of the severity level expressed by “quite a bit” (personal communication with study PI) but results are not yet published. The health states for QLU-C10D valuations for Austria are already based on the adapted German version, and consequently the utility weights presented here for Austria are valid for this version, which is abbreviated as “QLU-C10D Austria V2”.

Utility scores were descriptively compared across countries for a selection of QLU-C10D health states covering a continuum between best (i.e. 1111111111) and worst (i.e. 4444444444).

Statistical methods

Representativeness and feedback

Comparisons of socio-demographic and clinical characteristics with national census data [20–25] were performed by Chi-square tests. Feedback questions were analysed by descriptive statistical methods.

Utility estimation

Country-specific utility weights for the QLU-C10D were determined by conditional logistic regression using the method proposed by Bansback et al. (2012) [26]. The basic model for the utility of option j (scenario A or B) in choice set s for respondent i is given by

$$U_{isj} = \alpha TIME_{isj} + \beta X^{\prime}_{isj} TIME_{isj} + \varepsilon_{isj}$$

where $TIME_{isj}$ is the survival time presented in option j and $X^{\prime}_{isj}$ is a set of dummy variables related to the levels of the QOL dimensions in the corresponding health state. The errors $\varepsilon_{isj}$ were assumed to be independent and identically Gumbel distributed. The parameters α (scalar) and β (vector) were estimated by conditional logistic regression. To allow for within-subject correlations across different choice sets, a random subject-level term was included in the model using generalised estimating equation (GEE) models with a first-order autoregressive covariance structure and a logit link function; this procedure gave almost identical results to the conditional logistic regression approach with a clustered sandwich estimator that was used in a previous study and implemented in Stata for the Australian QLU-C10D valuation data [9]. For QOL domains in which the coefficients did not show a monotonically increasing pattern with increasing severity level, non-monotonic levels were combined; this is a common approach [3, 9]. All adjustments were made at once based on the raw coefficients; we then checked the adjusted results for remaining non-monotonicities and found none. GEE model coefficients were then converted into utility decrements, calculated as the ratio of the health state parameters β to the time coefficient α, to reflect the trade-off between health-related QOL and length of life [26].
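
The link between this model and the utility decrements can be made explicit with a short derivation. The symbol $V_{isj}$ for the systematic part of the utility and the symbol $d_k$ for the decrement of a single dimension level k are introduced here for illustration only; the sign with which the decrements are reported depends on the dummy coding and is not fixed by this sketch. Under the assumption of independent and identically Gumbel distributed errors, the probability of choosing scenario A over scenario B takes the standard logit form:

$$V_{isj} = \alpha TIME_{isj} + \beta X^{\prime}_{isj} TIME_{isj}$$

$$P(\text{A chosen in set } s \text{ by respondent } i) = \frac{\exp(V_{isA})}{\exp(V_{isA}) + \exp(V_{isB})}$$

Factoring out the time term shows why the ratio of coefficients is the quantity of interest:

$$V_{isj} = \alpha TIME_{isj} \left(1 + \frac{\beta}{\alpha} X^{\prime}_{isj}\right), \qquad d_k = \frac{\beta_k}{\alpha}$$

If all dummy variables are zero for the best health state, the term in brackets equals 1, so that each year in full health has a value of 1 and each impaired level shifts this value by its ratio $\beta_k / \alpha$.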

Statistical analyses were run using SPSS v24 and Stata v13.

Power considerations

Sample size determination was based on the confidence interval (CI) for the estimated utility decrements. Building on the findings of King et al. (2018) [9] and allowing for the possibility of a slightly larger spread due to a more heterogeneous response pattern (factor 1.2), the half-length d of the 95% CIs for the utility decrements for samples of size N = 1,000 was estimated to be < 0.05 ([u – d, u + d] with d ≤ 0.05). This sample size is towards the higher end of the spectrum of sample sizes used so far in DCEs [27].

Results

Sample characteristics

Complete cases and dropouts

An N of 1000 was reached in each country within a period of 2 months. Figure 2 shows the flow diagram of respondents who entered the survey and the number of dropouts in each section.

Fig. 2 Flow diagram of number of respondents and dropouts per completed survey section and country

Socio-demographic and clinical variables, representativeness, and feedback

An overview of the distribution of socio-demographic and clinical variables in the valuation samples is given in Table 2. The proportion of respondents with a high educational level was significantly larger in the valuation samples than in the census data (p < 0.01, χ2 > 6.7). In all countries, the majority of respondents regarded the presentation of the DCE as clear or very clear (70% Austria, 73% Italy, 58% Poland). Indifferent ratings were more frequent among Polish respondents (29% vs 17% in Austria and 18% in Italy), and the percentages of those considering the presentation unclear were similar across countries, between 9 and 13%. Up to half of the respondents (50% Austria, 36% Italy, 36% Poland) considered it difficult to choose between the health states, and 26–31% found it easy. Details of the feedback are provided in Figure A1 (Online Resource 1).

Table 2 Distribution of socio-demographic and clinical characteristics and comparison with national statistics

Utility estimates for QLU-C10D Austria V2, Italy, and Poland

Decrements were largely monotonic within each QOL dimension, i.e. a higher impairment level was associated with a higher utility decrement. Any movement away from the response category “not at all” was associated with negative utility except for social functioning in Austria and Poland and emotional functioning in Austria and Italy. Non-monotonicity was observed in three domains: lack of appetite (all countries), fatigue (Poland, Italy), and sleep disturbances (Italy) but none was statistically significant. For final utility scoring the values have been monotonicity-adjusted and are provided in Table 3 for all countries. Table A2 (Online Resource 2) provides the unadjusted model raw scores.

Table 3 QLU-C10D utility weights (decrements)

The largest decrements in Austria were found for physical functioning, followed by pain and role functioning. Among the cancer-specific symptoms, nausea received the highest decrement, followed by bowel problems. The utility of the worst possible health state is −0.111.

In Italy, the largest decrements were likewise found for physical functioning, then pain and role functioning, very closely followed by emotional functioning. The largest cancer-specific decrements were found for nausea and fatigue. The utility of the worst possible health state is 0.025.

In Poland, the highest decrements were again found for physical functioning, then role functioning, followed by pain; the highest cancer-specific decrements were found for nausea and bowel problems. The utility of the worst possible health state is 0.048.

To get an overall impression of differences between countries at the utility level, we compared index scores across a selection of unique QLU-C10D health states including best (i.e. 1111111111) and worst (i.e. 4444444444) health, some states with mild and moderate impairments (1112211111, 3321111112, 2221122311, 3132123123), and some with severe impairments (3332221144, 4444433211). Figure 3 shows that utilities vary between countries for moderately and highly impaired health states and that, in this part of the continuum, Austrian utilities seem systematically lower.

Fig. 3 QLU-C10D utilities across countries for different health states

QLU-C10D utility calculation for Austria, Italy, and Poland

For the calculation of QLU-C10D utility scores, responses to the respective QLQ-C30 items are converted into QLU-C10D levels (see Table 1) and assigned the monotonicity-adjusted decrements presented in Table 3 and Fig. 4. The decrements for the observed level on each domain are summed and subtracted from 1 to obtain the final utility score. For instance, a health state with very much problems with Role Functioning (level 4), moderate problems with Social Functioning (level 3), a little Fatigue (level 2), and no problems on other dimensions (level 1) would be coded 1431112111 and result in a utility score of 1 − (0.138 + 0.072 + 0.028) = 0.762 in Austria, a utility score of 1 − (0.119 + 0.041 + 0.013) = 0.853 in Italy, and a utility score of 1 − (0.196 + 0.008 + 0.012) = 0.784 in Poland.

Fig. 4 Utility decrements for the QLU-C10D versions Austria V2 (response category “ziemlich” instead of “mäßig”), Italy, and Poland

Tables A3–A5 (Online Resources 3–5) provide SPSS syntax to implement the scoring.
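
The additive scoring rule can also be sketched in a few lines of Python. This is an illustration only, not a substitute for the SPSS syntax: the dictionary holds just the Austrian and Polish decrements quoted in the worked example (the full value sets are in Table 3), and the names `DECREMENTS` and `qlu_c10d_utility` are hypothetical.

```python
# Decrements quoted in the worked example, keyed by (domain, level).
# Partial tables only; the complete value sets are given in Table 3.
DECREMENTS = {
    "Austria": {
        ("role functioning", 4): 0.138,
        ("social functioning", 3): 0.072,
        ("fatigue", 2): 0.028,
    },
    "Poland": {
        ("role functioning", 4): 0.196,
        ("social functioning", 3): 0.008,
        ("fatigue", 2): 0.012,
    },
}


def qlu_c10d_utility(country, levels_by_domain):
    """Return 1 minus the sum of decrements for all domains above level 1."""
    table = DECREMENTS[country]
    total = 0.0
    for domain, level in levels_by_domain.items():
        if level == 1:
            continue  # level 1 ("not at all") carries no decrement
        # Raises KeyError if the partial table lacks this domain and level.
        total += table[(domain, level)]
    return 1.0 - total


state = {"role functioning": 4, "social functioning": 3, "fatigue": 2}
for country in ("Austria", "Poland"):
    print(country, round(qlu_c10d_utility(country, state), 3))
# Austria 0.762
# Poland 0.784
```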

Discussion

The major advantage of the EORTC QLU-C10D is that it is based on the EORTC QLQ-C30, and therefore, cancer-specific utilities can be determined using data previously or prospectively collected with the QLQ-C30 in addition to its traditional QOL profile scoring.

The current study provides value sets for Austria, Italy, and Poland. In all three countries, the respective guidelines consider QALYs to be one of the preferred outcome measures in economic evaluations [28]. Normally, generic instruments are used to obtain utilities for QALY calculation as a result of conceptual considerations; however, disease-specific measures can be more sensitive to clinical differences in the respective diseases [5]. Whether this assumption holds true for the QLU-C10D requires clinical validity evaluation. An indication of the potential relevance of cancer-specific symptoms is provided by the decrements found for nausea, bowel problems, and fatigue, which are consistent with the Australian QLU-C10D valuation [9] as well as those in Germany [11], the UK [12], and Canada [10].

Overall, we found that the impact of some QOL dimensions on QLU-C10D utilities appeared to differ across countries. This was especially true for emotional functioning, for which decrements were clearly higher in Italy than in Austria and Poland. Austrian utilities tended to be systematically lower in moderately and severely impaired health states.

Possible explanations of the observed differences between countries include different health care systems or culture-specific attitudes towards health, which could affect the willingness to trade off lifetime for QOL. A possible methodological cause is slight differences in meaning introduced by translation into different languages. Although the EORTC follows a rigorous translation procedure, there is some tangential evidence from differential item functioning (DIF) analyses conducted by Scott et al. [18, 29, 30], who found that some QLQ-C30 items functioned slightly differently in a range of countries compared to the original English version. Region-related differences in how items function have also been found in other utility instruments, such as the EQ-5D [31, 32] and the SF-36 [33]. If utility differences do arise from differences in meaning as a result of translation, then they may very well disappear when country-specific weights are applied to the respective national data, which would be subject to the same translation effect. In fact, the combined evidence of the investigations by Scott et al. suggests an important role of the lack of translation equivalence, but cultural DIF cannot be excluded either. Further investigation of these issues is warranted, especially of the relative contribution of translation versus real differences in culture-specific attitudes towards health. Although these are complex topics to research, they are the key to truly understanding the inter-country differences that we have also observed in the valuations presented here. As a standardised and evaluated QLU-C10D valuation methodology is being used, we will in future be able to study utility differences between countries in a more sophisticated way.

A potential limitation of our study is that levels of education in the samples were slightly higher than in the respective general populations. The same was observed in the Australian valuation study [9] and is typical for online panels [34]. However, this would only be a problem if health valuation differed by education level. There is evidence that education level does not affect health valuation in different time trade-off tasks [35, 36]; however, this has not yet been investigated for DCEs. We intend to explore this with our data, but that is beyond the scope of the current paper.

The results from the debriefing questions at the end of the valuation surveys indicated that clarity was not an issue for the vast majority of respondents. Task difficulty ratings do not raise severe concerns but require more caution in interpretation. If DCE tasks are too difficult or too easy, they might work against the required trade-off. The literature on the perceived difficulty of making decisions in health DCEs is somewhat scarce. The numbers we found compare well to the results from the QLU-C10D feasibility study [14] and are well within the range of what has been reported and considered acceptable in the literature so far. Mulhern et al. (2016) directly compared a DCE with a time trade-off (TTO) for the EQ-5D-5L; 57% of respondents considered the DCE tasks difficult to answer and 63% considered the TTO tasks difficult to answer [37]. In a study by Norman et al. (2013), a much lower proportion, 11% of respondents, rated the tasks of a DCE for the valuation of the EQ-5D-5L as either difficult or very difficult [38]. Other examples of DCEs in the context of health are Skedgel et al. (2013), who investigated societal preferences for the allocation of health care resources and found that 65% rated the presented DCE questions as somewhat or extremely difficult to answer [39], and Green and Gerard (2009), likewise a societal DCE, in which as many as 68% considered it fairly difficult or very difficult to complete the DCE tasks [40].

We conclude that the QLU-C10D enables the incorporation of QOL data collected via the QLQ-C30 into economic evaluation in Europe. This is especially important when QOL is a significant outcome of an investigated health intervention. Based on our results, we advise the use of country-specific QLU-C10D value sets for the evaluation of treatment effects whenever possible. In line with a suggestion recently made for the EQ-5D [41], we advocate further investigation and discussion of the compatibility of translations and value sets, especially when used in multinational clinical studies. Future research will show how QLU-C10D cancer-specific utility values compare to generic ones in terms of sensitivity and responsiveness to clinical differences, and whether the choice of instrument will impact cost-utility ratios.