Key Points for Decision Makers

Small differences are observed between the Quality-of-Life Utility Measure–Core 10 dimensions (QLU-C10D) and the EQ-5D-3L in metastatic melanoma.

The differences observed between instruments do not translate into a difference in cost-effectiveness once the quality-of-life estimates are incorporated into a cost-utility analysis (CUA) model.

Utilities drawn from the EQ-5D-3L and QLU-C10D tools may be different, but the choice of one over the other may make little difference to CUAs.

1 Introduction

A common area of discussion, particularly during the reimbursement of new technologies, is whether utilities derived from a condition-specific quality-of-life (QoL) instrument deliver cost-effectiveness results similar to those derived from a generic instrument, such as the EQ-5D [1]. Some researchers consider the EQ-5D insensitive to changes in health status in cancer patients [2] and advocate the use of condition-specific measures instead because they capture the disutility associated with treatment-related adverse events. Brazier et al. [3] found lower ceiling effects for condition-specific preference-based measures compared to the EQ-5D. However, other studies have found that condition-specific measures like the Functional Assessment of Cancer Therapy (FACT) underestimate benefit in terms of quality-adjusted life years (QALYs) gained compared to the EQ-5D in patients with advanced cancer [4], while some have found that the EQ-5D and a condition-specific measure (the EORTC-8D) are equally sensitive to disease characteristics among cancer patients [1].

The EQ-5D is a common tool for capturing QoL utilities in clinical trials [5] and is accepted by health technology assessment (HTA) agencies around the world, including the Australian Pharmaceutical Benefits Advisory Committee (PBAC) [6] and the National Institute for Health and Care Excellence (NICE) in the UK [7]. Proponents of the EQ-5D argue that non-generic instruments reduce the comparability between technology assessments across different indications and, therefore, as advocated by NICE, have a preference for using the EQ-5D to estimate QALYs [8]. According to the NICE technical support documents, condition-specific instruments should only be used when the EQ-5D is not available or not appropriate [9, 10]. Other agencies, such as the PBAC, have taken a less prescriptive approach to which utility instrument should be used [6].

As expected, based on this guidance, many HTAs are performed using the EQ-5D even if it has not been demonstrated to be appropriate for the specific condition of interest. This is the case for metastatic melanoma, which was the first condition for which the new generation of immunotherapies was granted reimbursement by the PBAC and NICE [11].

The QLQ-C30 is one of the most widely used condition-specific questionnaires in cancer studies [12]. King et al. [13] developed a cancer-specific multi-attribute utility index (MAUI) based on the QLQ-C30, called the Quality-of-Life Utility Measure–Core 10 dimensions (QLU-C10D). The QLU-C10D may be more sensitive than generic MAUIs such as the EQ-5D because it contains symptoms and adverse events commonly experienced by cancer patients. The first set of utility weights was published in 2018 for Australian cancer patients [14].

Using QoL data from the CheckMate-066 trial, this study aimed to compare the generic EQ-5D-3L instrument to the condition-specific QLU-C10D by applying both sets of values in a published cost-utility analysis (CUA) evaluating immunotherapy for the treatment of metastatic melanoma.

2 Methods

2.1 Cost-Utility Analysis of Nivolumab Versus Ipilimumab

2.1.1 Treatments

The treatment of metastatic melanoma has evolved over the last decade, from chemotherapies, such as dacarbazine and fotemustine, to immunotherapies such as ipilimumab and nivolumab. Ipilimumab, approved in 2011, was the first immunotherapy approved to treat melanoma [15]. It is a monoclonal antibody that binds to cytotoxic T lymphocyte-associated protein 4 (CTLA-4) and shifts the immune system balance towards T cell activation, thus increasing the number of activated T cells that can migrate to attack the tumour [16, 17]. Nivolumab is a human immunoglobulin G4 that acts as an immunomodulating agent by blocking the interaction between programmed cell death protein 1 (PD-1) and its ligands. This results in the activation of T cells and cell-mediated immune responses against tumour cells or pathogens [18].

2.1.2 Clinical Trial Data

The Food and Drug Administration (FDA) phase III clinical registration trial (CheckMate-066) randomly assigned 418 treatment-naïve patients with metastatic melanoma without a BRAF mutation to receive nivolumab or dacarbazine chemotherapy. Nivolumab improved 1-year overall survival (OS), with a hazard ratio of 0.42 (95% confidence interval [CI] 0.25–0.73) compared to chemotherapy. Median progression-free survival (PFS) was 5.1 months (95% CI 3.5–10.8) for nivolumab versus 2.2 months (95% CI 2.1–2.4) for chemotherapy [19]. QoL in CheckMate-066 was measured using the QLQ-C30 and EQ-5D-3L every 6 weeks on treatment and at two follow-up visits [20]. CheckMate-066 predated the EQ-5D-5L instrument, which is why the EQ-5D-3L was used. There were fewer adverse events for nivolumab than for chemotherapy.

2.1.3 Description of the Published Cost-Utility Analysis

A published CUA [21] comparing nivolumab to ipilimumab in an Australian setting was used as the basis of the present study. As no head-to-head evidence was available at the time of the analysis, an indirect comparison of nivolumab versus ipilimumab using data from CheckMate-066 (CA209066; nivolumab vs dacarbazine) [19] and trial MDX010-020 (ipilimumab vs gp100) [22] was undertaken to estimate the efficacy of nivolumab compared to ipilimumab. Efficacy, toxicity, and QoL (i.e. the EQ-5D-3L) were modelled over a 10-year period using a three-state Markov model with progression-free disease, progressive disease, and death as health states. PFS and OS were extrapolated from CheckMate-066 using lognormal distributions. Utility was estimated using the whole trial population regardless of treatment, and a discount rate of 5% per annum was applied to utilities and costs. A probabilistic sensitivity analysis (PSA) was performed, assigning probability distributions to key model parameters using a Monte-Carlo simulation with 10,000 iterations. Compared to ipilimumab, nivolumab yielded an additional 1.30 QALYs at an approximate incremental cost of US$39,000. The incremental cost-effectiveness ratio (ICER) was US$30,475 per QALY gained.
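To make the structure of such a model concrete, the following is a minimal sketch of a three-state Markov cohort model that accumulates discounted QALYs per health state. It is not the published model: the function name `discounted_qalys`, the monthly cycle length, the constant transition probabilities, and the utility values are all illustrative assumptions, whereas the published analysis extrapolated PFS and OS with lognormal distributions. Only the three health states, the 10-year horizon, and the 5% annual discount rate are taken from the description above.

```python
def discounted_qalys(transition, utilities, cycles_per_year=12, years=10,
                     annual_discount=0.05):
    """Accumulate discounted QALYs per state for a cohort that starts in state 0.

    Utilities are applied at the start of each cycle; no half-cycle
    correction is used in this simplified sketch.
    """
    n = len(utilities)
    state = [1.0] + [0.0] * (n - 1)  # whole cohort starts progression-free
    totals = [0.0] * n
    for cycle in range(years * cycles_per_year):
        t_years = cycle / cycles_per_year
        weight = 1.0 / (1.0 + annual_discount) ** t_years
        for i in range(n):
            totals[i] += state[i] * utilities[i] * weight / cycles_per_year
        # Move the cohort one cycle forward (row = from state, column = to state).
        state = [sum(state[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    return totals


# Hypothetical monthly transition probabilities, NOT taken from the trial.
P = [
    [0.90, 0.08, 0.02],  # progression-free
    [0.00, 0.93, 0.07],  # progressive disease
    [0.00, 0.00, 1.00],  # dead (absorbing state)
]
U = [0.80, 0.70, 0.0]  # hypothetical health state utilities

pf, pd, dead = discounted_qalys(P, U)
print(f"progression-free: {pf:.3f}, progressive: {pd:.3f}, total: {pf + pd:.3f}")
```

Swapping the utility vector `U` between instruments while holding the transition probabilities fixed mirrors the comparison made in this study, where only the utility inputs differ between scenarios.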

2.2 Comparison of EQ-5D-3L TTO and EQ-5D-3L DCE Versus QLU-C10D

QoL in CheckMate-066 was measured using the QLQ-C30 and EQ-5D-3L every 6 weeks on treatment and at two follow-up visits [20]. QLU-C10D utility data were calculated by applying Australian weights derived from a discrete choice experiment (DCE) published by King et al. [14] to the relevant parts of the QLQ-C30 from the individual patient data of CheckMate-066. Similarly, Australian weights were applied to the EQ-5D-3L using weights from two different types of valuation study: a time trade-off (TTO) study by Viney et al. [23] and a DCE study also by Viney et al. [24]. Both the TTO and DCE weights were included as these are routinely used in Australia.
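The scoring step can be illustrated with a small sketch that assumes an additive form, in which full health is 1 and each dimension subtracts a decrement that depends on the response level. The dimension names, the four-level layout, and every weight below are hypothetical placeholders chosen for illustration; they are not the published Australian QLU-C10D or EQ-5D-3L value sets.

```python
# Hypothetical decrements per response level (level 1 = no problems).
# These are placeholders, NOT the published weights.
HYPOTHETICAL_DECREMENTS = {
    "physical_functioning": [0.0, 0.05, 0.10, 0.20],
    "pain": [0.0, 0.03, 0.08, 0.15],
    "fatigue": [0.0, 0.01, 0.02, 0.04],
}


def score_utility(responses, decrements):
    """Return 1 minus the summed decrements for one questionnaire response.

    `responses` maps each dimension name to a 1-based response level.
    """
    utility = 1.0
    for dimension, level in responses.items():
        utility -= decrements[dimension][level - 1]
    return utility


patient = {"physical_functioning": 2, "pain": 3, "fatigue": 4}
print(round(score_utility(patient, HYPOTHETICAL_DECREMENTS), 3))
```

Under this form, a dimension with small decrements contributes little to the utility even when patients report severe problems on it, which is why the choice of country-specific weights matters.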

Whether a patient was progression free or had progressive disease was calculated at baseline and at corresponding time points throughout the trial. Health state utilities were examined for the study sample (pooling treatment arms) as well as treatment-specific values.

The long-term QALY gain was modelled for dacarbazine and nivolumab using the state-transition Markov model published by Bohensky et al. [21], described in the previous section. The differences in EQ-5D-3L TTO, EQ-5D-3L DCE, and the QLU-C10D were examined by looking at the total QALYs accumulated for each health state (progression-free and progressive disease). Furthermore, both total and total discounted values are reported. Standard CUA outputs such as ICERs, plots of the cost-effectiveness plane, and the cost-effectiveness acceptability curves were produced to study differences/similarities.

2.3 Statistical Analysis

Descriptive statistics for each utility measure (i.e. EQ-5D-3L TTO, EQ-5D-3L DCE, and the QLU-C10D) such as means and standard deviations (SDs) were used to compare distributions at baseline. Differences between health states (i.e. progression-free and progressive disease) were examined using a paired t test for differences between two samples of continuous data. The intent of the t test was not to draw any firm conclusion with respect to differences, but rather it was an exploratory exercise to give an indication of the direction of the difference. Analyses were performed in SAS v9.4 and R on a Windows platform.
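As an illustration of the exploratory test described above, the sketch below computes the mean paired difference and the paired t statistic using only the Python standard library. The function name `paired_t` and the example utilities are hypothetical, and the p value is left out because it requires the t distribution, which the standard library does not provide. The study itself used SAS and R.

```python
import math
import statistics


def paired_t(x, y):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the paired differences
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return mean_d, t_stat, n - 1


# Hypothetical utilities for the same four patients in two health states.
progression_free = [0.80, 0.70, 0.90, 0.60]
progressive = [0.70, 0.65, 0.80, 0.50]

mean_d, t_stat, df = paired_t(progression_free, progressive)
print(f"mean difference = {mean_d:.4f}, t = {t_stat:.2f}, df = {df}")
```

The sign of the mean difference gives the direction of the difference, which is the quantity of interest in an exploratory comparison of this kind.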

2.3.1 Concordance

The association between the EQ-5D-3L and QLU-C10D was quantified by assessing the Pearson correlation coefficient and Lin’s concordance correlation coefficient (CCC) between the two instruments [25].

The correlation was considered weak if the absolute value of the Pearson correlation coefficient was < 0.4, moderate if the absolute value of the Pearson correlation coefficient was 0.4–0.7, and strong if the absolute value of the Pearson correlation coefficient was > 0.7.

Assessment of Lin’s CCC was based on the recommendations of McBride [26]. If the lower one-sided 95% CI limit for Lin’s CCC is:

  • > 0.99, then concordance is considered almost perfect

  • Between 0.95 and 0.99, then there is substantial concordance

  • Between 0.90 and 0.95, then the concordance is moderate

  • < 0.90, then there is poor concordance.

Additionally, scatter plots and quantile-quantile (QQ) plots of the two instruments were used to analyse any potential differences.
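The two coefficients differ in what they reward: the Pearson coefficient measures linear association only, whereas Lin’s CCC also penalises a shift in means or a difference in spread between the instruments. The sketch below shows the standard point estimates and the classification rules described in this section. The function names are hypothetical, the paired values are made up, and the confidence limit for the CCC is not computed here, so the labelling function expects the lower limit as an input.

```python
import math


def pearson_and_ccc(x, y):
    """Return (Pearson r, Lin's CCC) using population (1/n) moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    r = sxy / math.sqrt(sxx * syy)
    ccc = 2 * sxy / (sxx + syy + (mx - my) ** 2)
    return r, ccc


def pearson_strength(r):
    """Classify |r| as weak (< 0.4), moderate (0.4-0.7), or strong (> 0.7)."""
    magnitude = abs(r)
    if magnitude < 0.4:
        return "weak"
    if magnitude <= 0.7:
        return "moderate"
    return "strong"


def mcbride_label(lower_limit):
    """Classify concordance from the lower 95% confidence limit of Lin's CCC."""
    if lower_limit > 0.99:
        return "almost perfect"
    if lower_limit >= 0.95:
        return "substantial"
    if lower_limit >= 0.90:
        return "moderate"
    return "poor"


# Hypothetical paired utilities: y is x shifted by a constant 0.1.
x = [0.2, 0.4, 0.6, 0.8, 1.0]
y = [v - 0.1 for v in x]
r, ccc = pearson_and_ccc(x, y)
print(f"r = {r:.3f}, CCC = {ccc:.3f}")
```

In the shifted example the Pearson coefficient stays at 1 while the CCC falls below 1, which illustrates how two instruments can be strongly correlated yet not interchangeable.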

3 Results

3.1 Clinical Trial CheckMate-066 Baseline Values

The mean baseline utility values as measured by the QLU-C10D (mean 0.744, SD 0.219) were not statistically different (p > 0.05) when compared to EQ-5D-3L TTO (mean 0.735, SD 0.239) and EQ-5D-3L DCE (mean 0.742, SD 0.280).

3.2 Concordance

The Pearson correlation was estimated to be 0.75 (p value < 0.0001, alternative hypothesis: true correlation is not equal to 0) and Lin’s CCC was estimated to be 0.74 (95% CI 0.69–0.79) at baseline.

For change from baseline, a Pearson correlation of 0.43 was observed (p value < 0.0001, alternative hypothesis: true correlation is not equal to 0) and Lin’s CCC was 0.40 (95% CI 0.30–0.48).

Scatter plots and QQ-plots are presented in Fig. 1.

Fig. 1 Scatter plots and quantile-quantile plots of EQ-5D-3L vs QLU-C10D at baseline and change from baseline. QLU-C10D Quality-of-Life Utility Measure–Core 10 dimensions

3.3 Clinical Trial CheckMate-066 by Health State

The EQ-5D-3L TTO and DCE values were higher than the QLU-C10D values for both the progression-free and progressive states, with differences ranging from 0.027 for dacarbazine in the progression-free state to 0.075 for the progressive state in the combined cohort (Table 1).

Table 1 EQ-5D-3L vs QLU-C10D by health state

The largest differences were observed for the progressive state when combining the two treatment arms. Furthermore, the minimum utilities measured for the EQ-5D-3L were below the preference of ‘dead’ (i.e. state considered worse than death), ranging from −0.507 to −0.073 compared to the QLU-C10D, where the minimum is above zero (0.122 to 0.137).

3.4 Modelling the QALY Gain Over 10 years: Dacarbazine Versus Nivolumab

QALY gains modelled over a 10-year time horizon are presented in Table 2. Compared to the QLU-C10D, the EQ-5D-3L measures generated higher QALY gains for both the progression-free state (e.g. for nivolumab, 1.571 with EQ-5D-3L DCE vs 1.489 with QLU-C10D) and the progressive disease state (e.g. for nivolumab, 1.361 with EQ-5D-3L DCE vs 1.249 with QLU-C10D). This resulted in higher total (2.975 vs 2.738) and total discounted QALY gains (2.525 vs 2.324).

Table 2 Modelled quality-adjusted life years over 10 years

The model produced the largest differences between the EQ-5D-3L and QLU-C10D when applying the combined utility measure.

3.5 CUA Ipilimumab Versus Nivolumab

The QALY gain decreased from 1.30 when using the combined utility value for EQ-5D-3L TTO to 1.21 when applying the QLU-C10D (see Table 3). This resulted in a 7.5% [(US$32,748 − US$30,475)/US$30,475] increase in the ICER.

Table 3 Results from cost-effectiveness analysis of ipilimumab vs nivolumab

A smaller increase of 4.3% [(US$27,638 − US$26,491)/US$26,491] was observed for the ICER when treatment-specific estimates were used.
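The two percentages follow directly from the ICERs quoted in this section, and the short check below reproduces them. The helper name `pct_change` is introduced here for illustration only. Both results are positive, so in each case the ICER is higher with the QLU-C10D than with the EQ-5D-3L.

```python
def pct_change(new, base):
    """Relative change from `base` to `new`, in percent."""
    return 100.0 * (new - base) / base


# ICERs in US$ per QALY gained, as quoted in the text.
combined = pct_change(32_748, 30_475)            # combined utility values
treatment_specific = pct_change(27_638, 26_491)  # treatment-specific utility values

print(f"{combined:.1f}% {treatment_specific:.1f}%")  # prints "7.5% 4.3%"
```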

The scatter plots of the PSA (Fig. 2) showed more dispersion when using the QLU-C10D than for the simulation with the EQ-5D-3L. Moreover, higher QALY values for the EQ-5D-3L were observed.

Fig. 2 Cost-effectiveness plane. DCE discrete choice experiment, QLU-C10D Quality-of-Life Utility Measure–Core 10 dimensions, TTO time trade-off

The cost-effectiveness acceptability curves (Fig. 3) differed only marginally between the two measures.

Fig. 3 Cost-effectiveness acceptability curves. DCE discrete choice experiment, QLU-C10D Quality-of-Life Utility Measure–Core 10 dimensions, TTO time trade-off
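A cost-effectiveness acceptability curve is commonly derived from PSA output as the share of simulations with a positive incremental net monetary benefit at each willingness-to-pay threshold. The sketch below illustrates that calculation under stated assumptions: the function name `ceac`, the normal distributions, their standard deviations, and the threshold grid are all hypothetical, and only the centring values (1.30 QALYs and US$39,000) and the 10,000 iterations echo figures quoted in this paper. It does not reproduce the distributions used in the published model.

```python
import random


def ceac(delta_costs, delta_qalys, thresholds):
    """Share of PSA draws with positive incremental net monetary benefit."""
    n = len(delta_costs)
    curve = []
    for wtp in thresholds:
        favourable = sum(
            1 for dc, dq in zip(delta_costs, delta_qalys) if wtp * dq - dc > 0
        )
        curve.append(favourable / n)
    return curve


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical PSA draws; the spreads are assumptions, not model outputs.
d_qalys = [rng.gauss(1.30, 0.30) for _ in range(10_000)]
d_costs = [rng.gauss(39_000, 8_000) for _ in range(10_000)]

thresholds = [0, 25_000, 50_000, 75_000, 100_000]  # US$ per QALY
for wtp, prob in zip(thresholds, ceac(d_costs, d_qalys, thresholds)):
    print(f"WTP {wtp:>7}: P(cost-effective) = {prob:.3f}")
```

Because the curve depends on the joint distribution of incremental costs and QALYs, two utility instruments can give different point estimates yet very similar acceptability curves, as reported here.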

4 Discussion

The aim of the present study was to compare the generic EQ-5D-3L utility measure with the QLU-C10D in metastatic melanoma using a practical, real-world CUA.

The EQ-5D has been criticised for being insensitive to changes in health status of cancer patients due to the limited number of dimensions and levels [2]. However, the responsiveness of the EQ-5D is dependent on condition [27], and to our knowledge, no assessment has been made in metastatic melanoma.

In the present study, the generic EQ-5D-3L valued mean progression-free and progressive health states 5–10% higher than the condition-specific QLU-C10D, with comparable SDs. Furthermore, the EQ-5D-3L was consistently associated with a wider range of utility values (−0.507 to 1) compared to the QLU-C10D (−0.079 to 1).

The lower values and shorter range for the QLU-C10D resulted in 4–8% higher ICERs, thereby indicating that the QLU-C10D might value survival less when compared to the EQ-5D-3L. However, this did not result in a change in the conclusion regarding cost-effectiveness, as there was little difference between the acceptability curves for the two scenarios.

The concordance analysis of the baseline values showed that the two instruments were strongly correlated, with a Pearson correlation of 0.75. However, the lower 95% CI limit for Lin’s CCC was 0.69, indicating poor concordance between the EQ-5D-3L and QLU-C10D. The scatter plot and QQ plot suggest that the instruments were concordant for utilities from approximately 0.2 to 1, as quantiles of these values fell around the unity line. Quantiles for utilities below 0.2 were consistently lower for the EQ-5D-3L than for the QLU-C10D. Change from baseline showed a moderate correlation of 0.43 between the two instruments. Concordance was again poor, with the lower 95% CI limit for Lin’s CCC estimated to be 0.30. The scatter plot and QQ plot suggest concordance for changes between −0.2 and 0.2. For quantiles above 0.2, the EQ-5D-3L had higher values, and for quantiles below −0.2, the EQ-5D-3L had lower values.

Comparing QLQ-C30 data from the trial with the utility weights of the Australian QLU-C10D explores this issue further. The main QLQ-C30 data for this clinical trial were reported by Long et al. [20]. The QLQ-C30 single items with the largest influence on QoL were fatigue and insomnia. These two items had the lowest disutility weights among the Australian QLU-C10D weights according to King et al. [14]. Moreover, the items that had the highest weights for the QLU-C10D were nausea and pain. These two items were not important for melanoma patients in the study. This suggests that the Australian QLU-C10D weights are not well suited to valuing QoL for patients from the clinical trial of interest. A comparison with the Canadian [28] and UK [29] utility weights for the QLU-C10D was done to ascertain whether this is a general problem with the QLU-C10D. This did not appear to be the case, as the Canadian and UK weights had more than 50% greater disutilities for sleep than the Australian weights. Fatigue, the other item with high influence on QoL in the trial, was also valued higher for the Canadian QLU-C10D. Thus, the issue appears to lie with the Australian QLU-C10D weights rather than with the QLU-C10D in general.

Another potential issue is that the QLQ-C30, upon which the QLU-C10D was developed, is not able to capture the impact of treatment-specific adverse reactions [30]. QLQ-C30 items are tailored to capture issues related to chemotherapies such as nausea/vomiting, constipation, appetite loss, and diarrhoea. Newer cancer treatments like immunotherapies have different adverse event profiles [31]. Common immune-related adverse events include colitis, pneumonitis, hypothyroidism, and inflammatory arthritis [32], which are not explicitly captured by the QLQ-C30.

The QLU-C10D would seem to be a more sensitive instrument than the EQ-5D-3L. For example, the physical dimension in the QLU-C10D is represented by a four-level item for walking, whereas mobility for the EQ-5D-3L only has three levels. As discussed, the weights for the QLU-C10D do not seem to put emphasis on the items important for the patients in this study. The EQ-5D-3L on the other hand has broader questions that capture additional aspects compared to the QLU-C10D.

It has been argued that late-stage cancer patients may be particularly burdened by participating in clinical research and that it is the responsibility of the clinical researcher to ensure that they are not subjected to more tests than needed [33]. As such, it would be desirable to reduce the burden of filling out QoL questionnaires. However, our study suggests that the QLU-C10D and EQ-5D-3L cannot be considered substitutes for one another, with Lin’s CCC showing low concordance between the two instruments. We therefore concur with recent recommendations by Faury et al. [12] that several QoL instruments might be needed to adequately cover the domains needed for immunotherapy.

There are several limitations to this study. Firstly, we did not have access to the full data set of the trial and were therefore not able to examine the two instruments in detail. For example, access to toxicity data would have enabled us to assess whether the QLU-C10D and the EQ-5D-3L capture disutilities connected with immune-related adverse events. Secondly, the conclusions from this study are limited to the Australian melanoma population as we only had access to an Australian decision model. Furthermore, the trial population comprised patients from countries other than Australia, and it is unclear whether the outcomes are directly translatable to the Australian melanoma population. Finally, the clinical data are from 2013 and might not reflect current practice, so future research is required for further validation.

5 Conclusion

To our knowledge, this is the first study to compare the EQ-5D-3L with the QLU-C10D using a CUA. This study demonstrates that there is no reason to consider the condition-specific QLU-C10D when using Australian weights for CUA in immunotherapy-treated metastatic melanoma as the existing generic EQ-5D-3L instrument adequately captures QoL impacts.