Key Points for Decision Makers

Non-randomized evidence can be used to assess comparative effectiveness and inform cost-effectiveness analyses in the absence of randomized evidence. However, reviewers and decision makers should pay special attention to the adjustment methods used for accounting for differences between patient populations at baseline as they may have considerable impact on effectiveness and cost-effectiveness results.

Treatment recommendations can be changed in time, based on additional clinical and cost-effectiveness evidence and changes in the clinical landscape, as well as in prices.

Sometimes, the end-of-life criteria might not hold for all of the comparators. In such a situation, the decision maker might base the recommendation on the end-of-life threshold for the comparisons that satisfy the end-of-life criteria. For the other comparisons, the decision maker might instead apply the standard threshold to the ICERs calculated (e.g. ‘costs per QALY gained’ or ‘costs saved per QALY lost’).

1 Introduction

The National Institute for Health and Care Excellence (NICE) is an independent organization providing national guidance on promoting good health and preventing and treating ill health [1]. The single technology appraisal (STA) process is designed to provide recommendations on a single product, device or other technology with a single indication. The process covers new technologies and enables NICE to produce guidance shortly after the technology is introduced into the UK. The NICE Appraisal Committee (AC) obtains relevant evidence from several sources: the company submission (CS), a report from the appointed independent Evidence Review Group (ERG) and advice from consultees (i.e. patients, experts and other stakeholders). The CS includes a written report and a decision analytic model that describes the clinical and cost effectiveness of the technology under investigation. The ERG, an external organization independent of NICE, reviews the CS and produces a report that summarises and critiques the submitted evidence. After consideration of all the relevant evidence, the AC formulates preliminary guidance, in the form of the Appraisal Consultation Document (ACD), as to whether or not to recommend the intervention. The stakeholders are invited to comment on this ACD and the submitted evidence. A subsequent ACD may be produced or a Final Appraisal Determination (FAD) issued. Once published, NICE technology guidance provides a legal obligation for National Health Service (NHS) providers to reimburse technologies that have been approved. This paper presents a summary of the ERG report and the development of NICE guidance based on the findings of the AC for the STA of pomalidomide (POM) in combination with low-dose dexamethasone (DEX), for treating relapsed and refractory multiple myeloma (RRMM) after at least two regimens including lenalidomide (LEN) and bortezomib (BOR). Full details of all relevant appraisal documents can be found on the NICE website [2].

2 The Decision Problem

The underlying indication of this appraisal is RRMM. Multiple myeloma is a rare, incurable malignant haematological disease arising from the monoclonal expansion of plasma cells in the bone marrow [3]. It represents approximately 1% of all incident cancers globally and results in more than 43,000 deaths annually worldwide. Multiple myeloma is primarily a disease of the elderly, and the median age at diagnosis ranges from 69 to 74 years [4,5,6]. Patients suffer from a range of debilitating symptoms, including bone pain and damage including fractures, mobility problems, anaemia and general ill health [7,8,9].

The course of the disease is characterised by cycles of remission and relapse [10]. With increasing lines of therapy, there is a decreasing duration of response and, ultimately, development of refractory disease in combination with greater symptomatic burden [10,11,12]. The main aims of therapy are to control disease, maximise health-related quality of life (HRQoL) and prolong survival [10,11,12,13]. Despite the availability of MM agents such as thalidomide (THAL), BOR and LEN, prognosis remains bleak, especially for patients who relapse or are refractory to these agents, with a median survival of 3–9 months [14,15,16,17,18].

The clinical pathway of care for multiple myeloma patients differs between transplant-eligible and transplant-ineligible patients, especially in the choice of the first-line therapy. For transplant-eligible patients, NICE guidance TA311 recommends BOR + DEX or BOR + THAL + DEX induction therapy followed by autologous stem cell transplant (ASCT) [19], whereas for transplant-ineligible patients, THAL in combination with an alkylating agent (e.g. melphalan) and a corticosteroid (e.g. prednisone) is recommended. However, for THAL-contraindicated patients, THAL may be replaced with BOR (NICE guidance TA228 [20]).

In the second line, BOR monotherapy is recommended (TA129 [21]) as a treatment option for people who are at their first relapse, with a patient access scheme (PAS). Note that even though LEN is not recommended by NICE, patients might have received LEN in the first two lines of therapy as part of clinical trials, as well as through previous funding from the Cancer Drugs Fund (CDF). From the third line onwards, LEN + DEX has been recommended with the condition of a maximum of 26 cycles (TA171 [22]), while panobinostat (PANO) + BOR + DEX has been recommended as an option for treating “adult patients with relapsed and/or refractory multiple myeloma who have received ≥ 2 prior regimens including BOR and an immunomodulatory agent, when the company provides PANO with the discount agreed in the patient access scheme” (TA380) [23]. Besides the recommended treatments above, conventional chemotherapy (CC) and bendamustine (BEN) combinations (e.g. BEN in combination with THAL and DEX [BTD]) are commonly used in UK clinical practice, with the latter being funded via the CDF [24].

POM (Imnovid®) is an oral immunomodulatory agent for RRMM. POM in combination with DEX has a UK marketing authorisation for the following indication: “pomalidomide in combination with low dose dexamethasone for the treatment of adult patients with relapsed and refractory multiple myeloma who have received at least two prior treatment regimens, including both lenalidomide and bortezomib, and have demonstrated disease progression on the last therapy” [25].

Following a previous appraisal in 2015, NICE did not recommend POM in combination with low-dose DEX (POM + LoDEX) within its marketing authorisation (TA338). The agent was re-evaluated in the current submission following a new PAS price for the technology and additional clinical and cost-effectiveness evidence. The remit of this appraisal was specified by NICE’s final scope [26], which was to assess the clinical and cost effectiveness of POM + LoDEX within its licensed indication. The proposed positioning of POM + LoDEX is from third-line treatment onwards. For patients who have had two prior therapies, the only comparator listed in the scope was PANO + BOR + DEX, whereas for patients who have had three or more prior therapies the comparators were CC, BTD and PANO + BOR + DEX.

3 The Independent Evidence Review Group (ERG) Report

Kleijnen Systematic Reviews Ltd (KSR), in collaboration with Erasmus University Rotterdam, acted as the ERG, and reviewed the evidence on the clinical and cost effectiveness of POM + LoDEX for the treatment of adult patients with RRMM who have received at least two prior treatment regimens, including both LEN and BOR, and have demonstrated disease progression on the last therapy as submitted by the company (Celgene).

The review embodied three aims:

  • to assess whether the CS conformed to the methodological guidelines issued by NICE [1];

  • to assess whether the company’s interpretation and analysis of the evidence were appropriate;

  • to indicate the presence of other sources of evidence or alternative interpretations of the evidence that could inform NICE guidance.

The ERG critically reviewed the evidence in the CS, in the response to clarification questions, and evidence provided after publication of the ACD. Furthermore, it conducted additional searches, explored the impact of assumptions on the incremental cost-effectiveness ratio (ICER), revised the economic model and explored additional scenario analyses.

3.1 Summary of the Clinical Evidence

The company conducted a systematic review to inform the submission. The main direct evidence was taken from the MM-003 randomised controlled trial, which was the only trial that compared POM + LoDEX with any of the comparators listed in the final scope (i.e. HiDEX; used as a proxy for CC). Individual patient data (IPD) from patients receiving POM + LoDEX in the MM-002, MM-003 and MM-010 studies, and from the MUK-One [27], Gooding et al. [15] and Tarant et al. [16] studies of BEN formed the source of the comparison between POM + LoDEX and BTD. For the comparison of POM + LoDEX versus PANO + BOR + DEX, IPD from the POM + LoDEX arms of the MM-003 [28], MM-002 [29] and MM-010 [30] studies, as well as the aggregate data from the single-arm PANORAMA-2 study, were used. An overview of the trials used can be seen in Table 1.

Table 1 Eligible studies included in the quantitative analysis

3.1.1 Direct Evidence

The MM-003 randomised controlled trial was the only study that directly compared POM + LoDEX with any of the comparators listed in the final scope [26]. This trial compared POM + LoDEX (POM 4 mg/day on days 1–21 and DEX 40 mg each week in a 4-week cycle) with high-dose dexamethasone (HiDEX; DEX 40 mg on days 1–4, 9–12 and 17–20 of a 4-week cycle). The company considered HiDEX as a proxy for CC. MM-003 included 455 participants and was an open-label, multinational trial including participants recruited in 93 study sites, 68 of which are located in Europe. The number of centres located in the UK and number of patients recruited in the UK was unclear.

Using the latest data cut-off of MM-003 (1 September 2013, investigator assessment) at 15.4 months follow-up, there was an increase in median survival with pomalidomide. Overall survival (OS) was significantly better for patients treated with POM + LoDEX compared with those receiving HiDEX [13.1 months vs. 8.1 months; hazard ratio (HR) 0.72, 95% confidence interval (CI) 0.56–0.92]. POM + LoDEX significantly extended progression-free survival (PFS) compared with HiDEX (4 vs. 1.9 months; HR 0.50, 95% CI 0.41–0.62).

Since 56% of patients in the HiDEX arm received subsequent therapy with POM in the MM-003 trial, the company provided crossover-adjusted OS results, which are given in Table 2.

Table 2 Intention-to-treat and crossover-adjusted OS results from MM-003 trial

The company found that 247 of 300 patients (82.3%) in the POM + LoDEX group had at least one adverse event (AE) considered by the investigator to be related to POM. Furthermore, 190 patients (63.3%) had grade 3–4 treatment-emergent AEs (TEAEs) considered to be related to POM. However, the company stated that “with dose modifications and supportive care the safety profile was predictable, manageable and generally well tolerated”. Events occurring more frequently in the POM + LoDEX group than in the HiDEX group included neutropenia (51.3 vs. 20.0%) and febrile neutropenia (9.3 vs. 0%). The main cause of treatment discontinuation was progressive disease, and discontinuations related to AEs were uncommon. There were more dose interruptions in the POM + LoDEX group than in the HiDEX group (67 vs. 30%).

3.1.2 Indirect Evidence

No studies directly compared POM + LoDEX with either BEN or PANO + BOR + DEX. Furthermore, no studies could provide a common comparator to support an indirect comparison or mixed treatment comparison (MTC). As a consequence, the company selected individual treatment arms from the available studies and performed separate analyses comparing POM + LoDEX with each of the comparators independently.

3.1.2.1 Indirect Comparison with Bendamustine

Comparison of IPD from patients receiving POM + LoDEX in the MM-002 [29], MM-003 [28] and MM-010 [30] studies with IPD from the MUK-One [27], Gooding et al. [15] and Tarant et al. [16] studies of BTD was achieved using Cox proportional hazards regression models to adjust for factors thought to be prognostic of OS and PFS (based on mean covariate values according to the baseline of the pooled trials). The prognostic factors to be included in the regression model in the CS were determined according to a selection procedure based on data availability, information derived from the systematic literature review of prognostic factors in RRMM and clinical expert consultation. Following the selection procedure, the CS reported the following covariates in the final analysis given in Table 3.

Table 3 Covariates used in the BTD vs. POM + LoDEX indirect comparison

The MM-002 trial [29] alone was selected for use for POM + LoDEX within the base-case analysis due to the lower level of LEN refractoriness exhibited within this trial (78%) compared with the remainder of the POM + LoDEX data (95%). This lower level of refractoriness was considered more comparable with the BTD data (18–25% across sources). As this covariate was identified by clinicians as the most important prognostic factor and is difficult to adjust for with the current datasets (given that the overlap between datasets is low), it was considered more important to select the more comparable dataset for analysis than to retain the maximum number of patients for analysis in the POM + LoDEX arm.

The company conducted three sensitivity analyses. In the first sensitivity analysis, the company incorporated data from all available POM + LoDEX trials (MM-002 [29], MM-003 [28] and MM-010 [30]), while in the second sensitivity analysis, the company incorporated International Staging System (ISS) stage as an additional covariate, using the dataset restricted to patients whose ISS stage was available (i.e. the MM-002 [29] and Gooding et al. [15] studies were excluded). The last sensitivity analysis was conducted on the same dataset used in sensitivity analysis 2, but this time ISS stage was not included as a covariate. The unadjusted and covariate-adjusted results from the base-case analysis and sensitivity analyses are given in Table 4.

Table 4 Summary results (OS, PFS) from the POM + LoDEX vs. BTD comparison

From the results of the base-case and sensitivity analyses, the company concluded that POM + LoDEX improves OS and PFS compared with BTD, and the inclusion of POM + LoDEX data from the MM-003 [28] and MM-010 [30] trials did not substantially alter the results of the PFS, whereas it reduced the OS benefit of POM + LoDEX compared with the base-case. Furthermore, the company suggested that, based on the results of sensitivity analyses 2 and 3, even though ISS stage is a very important predictive factor of an increased hazard of death and progression, it had little impact on the HR of PFS and OS.

3.1.2.2 Indirect Comparison with PANO + BOR + DEX

For the comparison of POM + LoDEX versus PANO + BOR + DEX, IPD from the POM + LoDEX arms of the MM-003 [28], MM-002 [29] and MM-010 [30] studies, as well as the aggregate data from the single-arm PANORAMA-2 study [31], were used. For this comparison, not all patients in the POM + LoDEX trials, but a subgroup (n = 886), who were refractory to BOR but not primary refractory, was included to align with the PANORAMA-2 population. Since no IPD were available from the PANORAMA-2 study publication [31], covariate adjustment methods used in the comparison with BTD were not possible, and the company used the matching-adjusted indirect comparison (MAIC) method to adjust for the differences in patient characteristics between studies. The MAIC method reweights the patient-level data for POM + LoDEX to reflect a population of similar baseline characteristics as the PANO + BOR + DEX population. The results from the unadjusted and MAIC-adjusted analysis are given in Table 5.
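The reweighting step can be illustrated with a minimal sketch. It assumes the method-of-moments formulation commonly used for MAIC, in which each patient receives a weight exp(x·β) and β is chosen so that the weighted IPD covariate means equal the published aggregate means. The covariates, patient rows and target proportions below are invented for illustration and are not taken from the CS or from PANORAMA-2; the company's actual implementation may differ.

```python
import math

def maic_weights(ipd_rows, target_means, learning_rate=1.0, iterations=5000):
    """Method-of-moments MAIC weights (sketch; covariates assumed scaled to [0, 1])."""
    n_cov = len(target_means)
    # Centre each patient's covariates on the aggregate (target) means.
    centred = [[row[j] - target_means[j] for j in range(n_cov)] for row in ipd_rows]
    beta = [0.0] * n_cov
    for _ in range(iterations):
        weights = [math.exp(sum(b * c for b, c in zip(beta, row))) for row in centred]
        total = sum(weights)
        # Gradient of log(sum of weights) is the weighted mean of the centred
        # covariates; it is zero exactly when the weighted means hit the target.
        gradient = [sum(w * row[j] for w, row in zip(weights, centred)) / total
                    for j in range(n_cov)]
        beta = [b - learning_rate * g for b, g in zip(beta, gradient)]
    return [math.exp(sum(b * c for b, c in zip(beta, row))) for row in centred]

def weighted_means(ipd_rows, weights):
    """Weighted covariate means of the IPD."""
    total = sum(weights)
    n_cov = len(ipd_rows[0])
    return [sum(w * row[j] for w, row in zip(weights, ipd_rows)) / total
            for j in range(n_cov)]

def effective_sample_size(weights):
    """Approximate number of independent patients retained after weighting."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Hypothetical IPD: (aged over 65, ISS stage III), each coded 0/1.
ipd = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0), (1, 0), (0, 0)]
target = [0.6, 0.5]  # invented aggregate proportions for the comparator study
weights = maic_weights(ipd, target)
print(weighted_means(ipd, weights))
print(effective_sample_size(weights))
```

The effective sample size is worth reporting alongside any MAIC result: a large drop relative to the original sample indicates poor overlap between the populations.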

Table 5 Summary results (OS, PFS) from the POM + LoDEX vs. PANO + BOR + DEX comparison

As can be seen, the application of the MAIC method resulted in a 1-month increase in median OS for patients receiving POM + LoDEX compared with the unweighted analysis, and, in both cases, the median OS was shorter than the median OS of patients receiving PANO + BOR + DEX. The application of MAIC seems to have little effect on the median PFS of patients treated with POM + LoDEX compared with the unweighted analysis. These results suggest that POM + LoDEX reduced the risk of progression but increased the risk of death compared with PANO + BOR + DEX; however, the company noted that the differences were not statistically significant and the evidence for PANO + BOR + DEX was based on only 55 patients compared with 886 patients receiving POM + LoDEX.

3.1.3 End of Life

The company stated that POM + LoDEX is considered to meet the NICE end-of-life criteria in comparison with BEN and CC. They stated that “The estimated survival benefit compared to BEN and conventional chemotherapy is > 5 months in all comparisons (covariate adjusted and unadjusted, crossover adjusted and unadjusted). Modelled mean survival increase is 7–8 months”.

They further stated that in relation to PANO “Evidence for end of life is less compelling in the comparison to PANO + BOR + DEX as no improvement was demonstrated in median outcomes for OS; difficulties in comparing to PANO + BOR + DEX are, however, considerable given the limited evidence available and lack of patient level data to correct for differences in patient population”.

3.2 Critique of the Clinical Evidence

The systematic reviews conducted by the company (for both the clinical studies and the prognostic covariate selection procedure) were deemed appropriate to the scope of the submission. Although the ERG identified a number of problems in relation to searching for studies of clinical effectiveness, it was satisfied that the evidence presented in the submission was the best available in this limited area. The ERG agreed that a meta-analysis could not have been conducted as only one trial was deemed directly relevant to the decision problem (MM-003).

Although MM-003, the main trial forming the direct evidence for pomalidomide, was a reasonably large, well-conducted, multicentre trial, the main comparator (HiDEX) is no longer optimal in current practice; therefore, the comparator can only be viewed as a proxy for CC. Additionally, over 50% of patients in the trial were aged 65 years or under, so the trial may reflect a younger population than that typically seen in practice. The ERG noted an underrepresentation of non-White participants: under 1% were of Asian origin and 1.5% were of Black or African American origin. The trial was in a heavily treated population who had received a median of five therapies (range 2–17). Results were presented for only 17 patients receiving two prior therapies, thus the trial is not representative of POM + LoDEX as a third-line therapy. Within these constraints, POM + LoDEX appears to extend OS and PFS in comparison with HiDEX in a heavily treated population who are refractory to BOR and LEN. The AE profile appears to be manageable with appropriate dose reductions and interruptions. However, the slightly higher incidence of serious AEs (grade 3 and 4) attributed to POM + LoDEX was drawn to the attention of the committee, along with the more frequent occurrence of neutropenia.

No studies directly compared POM + LoDEX with either BTD or PANO + BOR + DEX. In addition, the available studies did not include a common comparator that would permit an indirect treatment comparison or MTC. As a result, the company presented evidence based on indirect comparisons. The ERG noted that the covariate adjusted results were very similar to the unadjusted results in terms of both PFS and OS for the base-case and the sensitivity analysis of POM + LoDEX compared with BTD, indicating that the differences between studies in the selected covariates (patient characteristics) have relatively little impact on the outcomes observed. The selection of different datasets for POM + LoDEX alters the results for OS. Results suggested that the survival benefit of POM + LoDEX was less for patients in the MM-003 and MM-010 studies than for patients in the MM-002 study of POM. The ERG is doubtful about the exclusion of the MM-003 and MM-010 trials in the base-case analysis of the BTD versus POM + LoDEX comparison on the basis of higher LEN refractoriness (approximately 20% in BTD trials, 78% in MM-002 and 95% in MM-003 and MM-010).

For the comparison with PANO + BOR + DEX, as in the comparison with BTD, the matching adjustment did not substantially alter the results. This implies that the differences between studies in the selected covariates have relatively little impact on the outcomes observed. The ERG noted that the number of patients receiving PANO + BOR + DEX was small (n = 55) and there were some differences between POM studies and PANORAMA-2 in terms of the number of prior lines of therapy.

The ERG recognised that the lack of all patient-level data might have excluded many of the standard alternatives/methodologies used in the indirect comparison of non-randomised data. However, the ERG considered that some of the choices (e.g. adjustment methods as well as datasets included) were rather arbitrary and the company could have followed a more systematic approach (e.g. following the recommendations in NICE DSU TSD 17 and TSD 18 [34]). Furthermore, there can be unmeasured confounding factors, which could have been quantitatively assessed.

In terms of the end-of-life criteria, the ERG agrees that the patient group, being at least at the third line of treatment for RRMM, has a short life expectancy, normally < 24 months. Hence, the first criterion for end of life has been met. As regards the second criterion, the ERG agrees that POM + LoDEX appears to meet the end-of-life criterion of increasing survival in relation to BTD and DEX; however, the evidence suggests that POM + LoDEX does not meet this criterion compared with PANO + BOR + DEX. It is noted though that the evidence for PANO + BOR + DEX is based on a small number of patients (n = 55) and the analysis was limited by the lack of studies comparing these treatments.

3.3 Summary of the Cost-Effectiveness Evidence

The company performed an update of the systematic review of cost-effectiveness studies and evidence on healthcare resource use and HRQoL. Furthermore, the key issues raised by the committee from the previous appraisal (TA338) were extracted and are summarised in Table 6.

Table 6 List of issues raised by the committee in TA338 and the company’s description on how these issues are addressed in the current company submission

An electronic model was developed in Microsoft® Excel 2010 (Microsoft Corporation, Redmond, WA, USA) using a semi-Markov partitioned survival structure. The objective of the model was to estimate the cost effectiveness of POM + LoDEX when compared with BTD, PANO + BOR + DEX and CC for the treatment of patients with RRMM who were previously treated with LEN and BOR and whose disease progressed during the last therapy.

The model structure presented four health states: a pre-progressive state split into ‘on-treatment’ and ‘off-treatment’, a post-progression state (progressive disease), and death. The cycle length of the model was 1 week and the time horizon was 15 years (i.e. lifetime, since virtually every patient in the model died within 15 years). The model considered an NHS and personal social services perspective, and costs and utilities were discounted using a yearly rate of 3.5%.

The transition probabilities between health states were estimated from the parametric survival functions fitted to OS, PFS and time to treatment discontinuation (TTD) data from the relevant data sources. The data sources and methodological approaches for covariate adjustment for each comparison are summarized in Table 7.
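A minimal sketch of how a partitioned survival structure of this kind can derive state occupancy and discounted outcomes from the OS, PFS and TTD curves is given below. It assumes exponential curves for simplicity (the CS examined seven distributions), uses the MM-003 median OS and PFS for POM + LoDEX as inputs, and uses an invented TTD median; capping TTD at PFS and omitting a half-cycle correction are also assumptions of this sketch rather than documented features of the company model.

```python
import math

WEEKS_PER_YEAR = 52
HORIZON_YEARS = 15
ANNUAL_DISCOUNT = 0.035

def exp_survival(median_months):
    """Exponential survival function of time in weeks, given a median in months."""
    rate_per_week = math.log(2) / (median_months * WEEKS_PER_YEAR / 12)
    return lambda week: math.exp(-rate_per_week * week)

def state_occupancy(os_curve, pfs_curve, ttd_curve, week):
    """Share of the cohort in each of the four health states at a given week."""
    alive = os_curve(week)
    progression_free = min(pfs_curve(week), alive)      # PFS cannot exceed OS
    on_treatment = min(ttd_curve(week), progression_free)  # assumed cap at PFS
    return {
        "on_treatment": on_treatment,
        "off_treatment": progression_free - on_treatment,
        "progressed": alive - progression_free,
        "dead": 1.0 - alive,
    }

def discount_factor(week):
    """Discount factor for a weekly cycle at a 3.5% yearly rate."""
    return (1 + ANNUAL_DISCOUNT) ** (-week / WEEKS_PER_YEAR)

def discounted_life_years(os_curve):
    """Discounted life-years over the horizon (no half-cycle correction)."""
    n_cycles = WEEKS_PER_YEAR * HORIZON_YEARS
    return sum(os_curve(w) * discount_factor(w) / WEEKS_PER_YEAR
               for w in range(n_cycles))

os_curve = exp_survival(13.1)   # median OS, MM-003 POM + LoDEX arm
pfs_curve = exp_survival(4.0)   # median PFS, MM-003 POM + LoDEX arm
ttd_curve = exp_survival(3.0)   # hypothetical TTD median, for illustration only
print(state_occupancy(os_curve, pfs_curve, ttd_curve, week=26))
print(discounted_life_years(os_curve))
```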

Table 7 Data sources and methodological approaches for covariate/crossover adjustments for each comparison

Seven parametric distributions (exponential, log-normal, log-logistic, Gompertz, gamma, extreme value and Weibull) were examined for each clinical outcome (OS, PFS and TTD). The fit of each covariate-adjusted parametric model to the Kaplan–Meier survival data was explored using visual inspection, log-cumulative hazard and Q–Q plots, Akaike/Bayesian information criterion (AIC/BIC) goodness-of-fit statistics and clinical plausibility. In Table 8, the survival analysis methods and the curves chosen for the base-case are summarized.

Table 8 Survival analysis methods and curves chosen in the base-case for each comparison

TEAEs of grade 3/4 were included in the economic model based on a 2% occurrence threshold in the MM-003 trial dataset. For BTD and PANO + BOR + DEX, in the base-case, the AE rates of POM + LoDEX were multiplied by risk ratios for treatment discontinuation due to a TEAE (under the comparator vs. under POM + LoDEX). In a scenario analysis, instead of risk ratios, safety scores provided by an advisory board were used.

Utilities for each health state were found using a regression model based on EQ-5D data from the MM-003 trial. While many covariates were assumed to be the same between treatments, treatment-specific utilities were obtained by using treatment-specific values for the following covariates: disease progression, best overall response (applied after week 12, maintained lifelong), hospitalisations and AEs. Furthermore, a utility decrement of 0.025 (based on two previous STAs on lung cancer treatments [36, 37]) was applied for patients receiving subcutaneous or intravenous treatments in the base-case.

The drug acquisition costs of BTD, PANO + BOR + DEX and CC treatments were based on list prices. For CC, cyclophosphamide in combination with THAL + DEX was assumed to reflect the UK practice. A PAS price discount agreement was available for POM + LoDEX. In addition, for pomalidomide, cost savings due to less drug use from dose interruptions longer than 28 days were incorporated. A questionnaire completed by six clinical specialists was used to collect data on monitoring, concomitant medication and AE costs. End-of-life cost estimates from a UK study were used [38]. Costs associated with intravenous and subcutaneous administration visits were based on the TA311 appraisal for BOR in first-line MM therapy [19]. In the base-case, no subsequent therapies were included following treatment discontinuation, but the impact of this assumption was explored in a scenario analysis based on Haematological Malignancy Research Network registry data [4, 39]. All costs used in the model calculations were based on their 2015 (original or inflation-corrected) values.

The company did not provide a full incremental analysis including all comparators. Instead ICERs of POM + LoDEX versus each comparator were presented: £39,665 per quality-adjusted life-year (QALY) gained (vs. BTD), £141,793 savings per QALY lost—southwest quadrant (vs. PANO + BOR + DEX), and £44,811 per QALY gained (vs. CC).
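Because the interpretation of an ICER depends on the quadrant of the cost-effectiveness plane, a small sketch may help. It assumes the conventional decision rule: in the northeast quadrant a technology is cost effective when its cost per QALY gained is below the threshold, whereas in the southwest quadrant it is cost effective when the savings per QALY lost exceed the threshold. The incremental costs and QALYs used are invented and are not the model outputs.

```python
def icer_with_quadrant(delta_cost, delta_qaly):
    """Return (ratio or None, quadrant label) for an intervention vs. a comparator."""
    if delta_qaly > 0 and delta_cost > 0:
        return delta_cost / delta_qaly, "north-east: cost per QALY gained"
    if delta_qaly < 0 and delta_cost < 0:
        return delta_cost / delta_qaly, "south-west: savings per QALY lost"
    if delta_qaly >= 0 and delta_cost <= 0:
        return None, "dominant: no more costly and no less effective"
    return None, "dominated: more costly without being more effective"

def is_cost_effective(delta_cost, delta_qaly, threshold):
    """Apply the quadrant-specific decision rule at a willingness-to-pay threshold."""
    ratio, label = icer_with_quadrant(delta_cost, delta_qaly)
    if label.startswith("north-east"):
        return ratio <= threshold
    if label.startswith("south-west"):
        return ratio >= threshold  # savings must compensate for each QALY lost
    return label.startswith("dominant")

# Invented increments: cheaper and slightly less effective than the comparator.
print(icer_with_quadrant(-28000, -0.2))
print(is_cost_effective(-28000, -0.2, threshold=50000))
```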

The probabilistic analysis, which included parametric uncertainty as well as structural uncertainty around the choice of fitted curve distribution, generated similar mean ICERs and indicated the following probabilities of cost effectiveness of POM + LoDEX for each comparison at a willingness-to-pay threshold of £50,000 per QALY gained: 92.8% versus BTD, 100% versus PANO + BOR + DEX (at list price for PANO), and 56.9% versus CC.
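A probability of cost effectiveness of this kind is typically the share of probabilistic draws with a positive incremental net monetary benefit (threshold × incremental QALYs − incremental costs), which handles all quadrants of the cost-effectiveness plane with one rule. The sketch below assumes that definition; the four draws are invented and far fewer than a real probabilistic analysis would use.

```python
def probability_cost_effective(draws, threshold):
    """Share of (delta_cost, delta_qaly) draws with positive net monetary benefit."""
    favourable = sum(
        1 for delta_cost, delta_qaly in draws
        if threshold * delta_qaly - delta_cost > 0
    )
    return favourable / len(draws)

# Invented draws of (incremental cost in GBP, incremental QALYs).
draws = [(10000, 0.3), (20000, 0.3), (-5000, -0.2), (-20000, -0.2)]
print(probability_cost_effective(draws, threshold=50000))
```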

The one-way sensitivity analyses showed that the parameters with the greatest impact on the outcomes were the coefficients used within the utility regression analysis for the comparison against BTD and CC, and the MAIC HRs (OS and PFS) for the comparison with PANO + BOR + DEX. The model is relatively insensitive to the majority of other parameters.

The scenario analyses demonstrated that for the comparison with BTD, time horizon, choice of dataset (e.g. MM-002, MM-003 and MM-010) and assumptions surrounding subsequent therapy, administration and AE costs had considerable impact on the ICER. For the PANO + BOR + DEX comparison, time horizon, OS/PFS curve distribution choices and the covariate adjustment method were the most influential on the ICER. Finally, for the comparison with CC, choice of the OS/PFS curve distribution and the method for the crossover adjustments had the biggest impact on the ICER.

3.4 Critique of the Cost-Effectiveness Evidence

The searches and economic model were considered by the ERG to meet the NICE reference case; however, in the final scope, NICE requested that at third line only PANO + BOR + DEX, and at later lines all comparators (BTD, PANO + BOR + DEX and CC), should have been considered. This line-based comparator stratification could not be achieved due to the lack of data.

In the CS base-case, a different dataset was used for each of the three comparisons, which implied that each comparison was based on a slightly different population. The ERG considered that an assessment based on a fully incremental analysis (i.e. an analysis that involves the calculation of incremental QALY gains and costs along all treatment options ranked by ascending cost), using the same dataset (MM-002, MM-003, MM-010 and all BTD trials) for the POM + LoDEX PFS/OS estimates and applying the MAIC-based PFS/OS HRs for the PANO + BOR + DEX comparison and PFS HR and crossover-adjusted OS HR from the MM-003 trial for the CC comparison, might have been more consistent.
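The mechanics of a fully incremental analysis can be sketched as follows: options are ranked by ascending cost, strongly dominated options (more costly and no more effective) are removed, extendedly dominated options (those whose ICER exceeds that of the next more effective option) are removed, and ICERs are then calculated between neighbours on the resulting frontier. The costs and QALYs below are invented purely to exercise the algorithm and are not the ERG's results.

```python
def efficiency_frontier(options):
    """options: {name: (cost, qalys)}. Returns [(name, ICER vs. previous or None)]."""
    # Rank by ascending cost; on equal cost, put the more effective option first.
    ranked = sorted(options.items(), key=lambda kv: (kv[1][0], -kv[1][1]))

    # Strong dominance: drop any option no more effective than a cheaper one.
    frontier = []
    for name, (cost, qalys) in ranked:
        if frontier and qalys <= frontier[-1][2]:
            continue
        frontier.append((name, cost, qalys))

    def icer(low, high):
        return (high[1] - low[1]) / (high[2] - low[2])

    # Extended dominance: ICERs must increase along the frontier.
    removed = True
    while removed:
        removed = False
        for i in range(1, len(frontier) - 1):
            if icer(frontier[i - 1], frontier[i]) > icer(frontier[i], frontier[i + 1]):
                del frontier[i]
                removed = True
                break

    result = [(frontier[0][0], None)]
    for low, high in zip(frontier, frontier[1:]):
        result.append((high[0], icer(low, high)))
    return result

# Invented inputs (cost in GBP, QALYs) for illustration only.
options = {
    "CC": (20000, 0.8),
    "BTD": (35000, 0.9),
    "POM + LoDEX": (45000, 1.4),
    "PANO + BOR + DEX": (80000, 1.6),
}
for name, value in efficiency_frontier(options):
    print(name, value)
```

With these invented inputs, the second-cheapest option is removed by extended dominance, which is the kind of result a set of pairwise ICERs cannot reveal.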

In general, the process for extrapolating survival curves was in line with the NICE DSU TSD 14 guidance [40]. However, it was not clear how the visual fit was assessed (i.e. it seemed as if the covariates of the parametric survival curves were adjusted according to the baseline of the pooled trials, whereas treatment-specific baselines might have been more appropriate for visual fit). Furthermore, the ERG noted that different baselines were assumed in the corrected group prognosis (CGP) method (baseline from the pooled trial datasets) and in the mean covariates method (baseline representing UK clinical practice).

Similarly, the ERG had some concerns related to the implementation of AEs. The approach followed would mirror the frequency order of the AEs of POM + LoDEX (MM-003 trial) for each of the comparators, in the same magnitude. The ERG considered this assumption not to be plausible because each drug has a different mechanism of action and a different safety profile, and it is unlikely that the AE frequency order would be mirrored for other comparators in the same magnitude.
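The ERG's point can be seen in a short sketch: when every AE rate of the reference treatment is multiplied by one and the same risk ratio, the comparator inherits the reference treatment's AE ranking by construction. The AE names, rates and risk ratio below are hypothetical and do not come from the model.

```python
def scale_ae_rates(reference_rates, risk_ratio):
    """Derive comparator AE rates by applying a single risk ratio to every AE."""
    return {ae: min(1.0, rate * risk_ratio) for ae, rate in reference_rates.items()}

def ranking(rates):
    """AE names ordered from most to least frequent."""
    return sorted(rates, key=rates.get, reverse=True)

# Hypothetical grade 3/4 AE rates for the reference treatment.
reference = {"neutropenia": 0.40, "anaemia": 0.25, "thrombocytopenia": 0.20}
comparator = scale_ae_rates(reference, risk_ratio=1.5)  # hypothetical ratio

print(ranking(reference))
print(ranking(comparator))  # same order, whatever the drug's true safety profile
```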

The ERG did not encounter major issues with the approach used to include quality of life in the model. Nevertheless, the data and the assumptions about the covariates that are included in the utility regression model (e.g. best overall response rate, hospitalisation and AEs) might lead to bias in utility estimates (e.g. assuming the same hospitalisation rate from the HiDEX arm of the MM-003 trial for all comparators, inconsistencies within the categorisation of best overall response rate among different comparators, and the uncertainty around the disutility associated with intravenous/subcutaneous treatments).

The current submission re-estimated some of the resource use inputs compared with TA338 (e.g. monitoring costs were based on an extensive questionnaire completed by six clinical experts). The ERG considered that the input parameters derived from the resource use questionnaire should be considered with care due to the low number of experts (n = 6) who completed the questionnaire, as well as the length and level of detail of the questionnaire, which might have provided a challenge to fill in all fields with the required attention. Furthermore, the model allowed for a decrease in drug acquisition costs based on treatment interruptions lasting longer than 28 days only for POM and panobinostat, whereas the dose interruptions of BOR (within PANO + BOR + DEX), BTD and CC were not taken into account at all, creating a potential inconsistency. Finally, regarding the costs of subsequent treatment, the ERG does not agree with the base-case choice of excluding these costs. Since the effects of the subsequent treatments were implicitly incorporated into the OS results, it would be rational to also include the costs required to achieve those effects. The CS provided two cost estimates for the subsequent treatments to be used in scenario analyses. These estimates differed greatly, and the information on them was insufficient for the ERG to judge which estimate was better to use.

3.5 Additional Exploratory Analyses Conducted by the ERG

In the original CS model, the ERG identified some programming errors. After these errors were corrected, the base-case analyses of the company were repeated with the ERG-corrected model. The ICER results were approximately £45,000 per QALY gained for POM + LoDEX versus BTD, £143,000 savings per QALY lost for POM + LoDEX versus PANO + BOR + DEX (southwest quadrant), and £49,000 per QALY gained for POM + LoDEX versus CC.

The ERG then conducted a series of exploratory full incremental analyses in which the effectiveness of the BTD and POM + LoDEX treatments was based on the pooled dataset of MM-002, MM-003, MM-010 and all other BTD trials. In the first analysis, the CGP method was used for the covariate adjustment, and POM + LoDEX and BTD OS and PFS curves were generated. For PANO + BOR + DEX, the HRs obtained from the MAIC were applied on top of the POM + LoDEX curves; similarly, for CC, the HRs from the intention-to-treat analysis of the OS and PFS data from the MM-003 trial were applied on top of the POM + LoDEX curves. The second full incremental analysis had the same assumptions as the first, except that the OS HR from the MM-003 trial applied on top of the POM + LoDEX curve was corrected for treatment switching using a two-stage method. Finally, in the third full incremental analysis, the mean covariate adjustment method was used instead of the CGP method to generate the POM + LoDEX and BTD curves (using a baseline reflecting real-world data from UK centres rather than the baseline of the pooled trials). The results of these full incremental analyses (with list prices for panobinostat) are provided in Table 9.
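As an illustration of what applying an HR "on top of" a reference curve can mean, the minimal Python sketch below assumes proportional hazards, under which the comparator survival function is the reference survival function raised to the power of the HR. The function name, the time points and all numerical values are hypothetical and are not taken from the company model, whose exact implementation is not described here.

```python
def apply_hazard_ratio(reference_survival, hazard_ratio):
    """Derive a comparator survival curve from a reference curve.

    Assumes proportional hazards, so S_comparator(t) = S_reference(t) ** HR.
    """
    if hazard_ratio <= 0:
        raise ValueError("hazard ratio must be positive")
    return [s ** hazard_ratio for s in reference_survival]


# Hypothetical reference OS probabilities at months 0, 6, 12, 18 and 24
reference_os = [1.00, 0.80, 0.60, 0.45, 0.30]

# Illustrative HR below 1: the comparator has a lower hazard of death
comparator_os = apply_hazard_ratio(reference_os, 0.75)
```

With an HR below 1, the derived curve lies on or above the reference curve at every time point; an HR above 1 would place it on or below.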

Table 9 Results from the full incremental exploratory analyses

In all of these exploratory analyses, CC, POM + LoDEX and PANO + BOR + DEX were on the cost-effectiveness frontier. POM + LoDEX seemed to be cost effective in comparison with PANO + BOR + DEX; however, the cost effectiveness of POM + LoDEX versus CC depended on the underlying assumptions of the analyses.

Furthermore, in several scenario analyses, the ERG explored the uncertainty surrounding the assumptions on dose interruptions, subsequent treatment costs, drug wastage costs, utility regression model inputs, AE rates and other utility estimates. The ICERs were broadly similar across these analyses. Finally, the ERG implemented the confidential PAS price for PANO + BOR + DEX, which decreased the ICER for the PANO + BOR + DEX versus POM + LoDEX comparison (still in the southwest quadrant), although it remained far above acceptable thresholds.

3.6 Conclusions of the ERG Report

Based on the MM-003 trial, POM appeared to extend OS and PFS in comparison with HiDEX in a heavily treated population who were refractory to BOR and LEN. The AE profile appeared to be manageable with appropriate dose reductions and interruptions. The ERG drew attention to the AEs that occurred more frequently in the POM arm, notably neutropenia.

For the comparison with BEN, the company used covariate adjustment methods to adjust for the differences between studies in patient characteristics. The covariate-adjusted results were very similar to the unadjusted results in terms of both PFS and OS, indicating that the differences between studies in the selected covariates (patient characteristics) have relatively little impact on the outcomes observed. However, the selection of different datasets for POM + LoDEX seemed to alter the results for OS and PFS.

For the comparison with PANO + BOR + DEX, the POM + LoDEX arms of a subpopulation from the MM-002, MM-003 and MM-010 studies were compared with the PANO + BOR + DEX arm of the PANORAMA-2 study using the MAIC method, which resulted in a 1-month increase in median OS for patients receiving POM + LoDEX compared with the unweighted analysis. In both cases, the median OS was shorter than that of patients receiving PANO + BOR + DEX. The hazard of death seemed to be reduced by a similar amount for patients receiving PANO + BOR + DEX compared with POM + LoDEX in both analyses (the HR was approximately 0.75 in both the MAIC-weighted and unweighted analyses).
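To make the weighting step behind a MAIC more concrete, the following minimal Python sketch uses a common method-of-moments formulation: each patient with individual-level data receives a weight exp(alpha × centred covariate), with alpha chosen so that the weighted covariate mean matches the mean reported for the comparator study. The single covariate, the patient values, the target mean and the function names are all hypothetical, and the sketch is not a reconstruction of the company's analysis, which matched on several characteristics.

```python
import math


def maic_weights(covariate, target_mean, iterations=50):
    """Method-of-moments MAIC weights for a single covariate.

    Minimises sum(exp(alpha * (x_i - target_mean))) with Newton's method;
    at the minimum the weighted covariate mean equals target_mean.
    Assumes target_mean lies strictly inside the observed covariate range.
    """
    centred = [x - target_mean for x in covariate]
    alpha = 0.0
    for _ in range(iterations):
        terms = [math.exp(alpha * c) for c in centred]
        gradient = sum(c * t for c, t in zip(centred, terms))
        curvature = sum(c * c * t for c, t in zip(centred, terms))
        alpha -= gradient / curvature
    return [math.exp(alpha * c) for c in centred]


def effective_sample_size(weights):
    """Approximate effective sample size after weighting."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical ages of patients with individual-level data
ages = [50.0, 60.0, 70.0, 80.0]
# Hypothetical mean age reported by the aggregate-data comparator study
weights = maic_weights(ages, target_mean=62.0)

weighted_mean_age = sum(w * a for w, a in zip(weights, ages)) / sum(weights)
ess = effective_sample_size(weights)
```

The effective sample size is usually reported alongside the weighted results, because a large drop relative to the original sample size signals that a few patients dominate the comparison.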

The economic model described in the CS was considered by the ERG to meet the NICE reference case to a reasonable extent and was mostly in line with the decision problem specified in the scope. The submitted model included some errors and the ERG corrected the model and conducted some exploratory analyses based on the pooled MM-002, MM-003, MM-010 and all BTD trials dataset. The full incremental exploratory analysis revealed that CC, POM + LoDEX and PANO + BOR + DEX were on the cost-effectiveness frontier and BTD was either dominated by CC or extendedly dominated by POM + LoDEX.
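The logic of a full incremental analysis can be sketched in a few lines of Python: strategies are ordered by cost, dominated options (more costly and no more effective) are removed, and extendedly dominated options (those whose ICER exceeds the ICER of the next, more effective option) are then removed until ICERs increase along the frontier. The costs and QALYs below are hypothetical values chosen only to reproduce the qualitative pattern described above; they are not the figures from Table 9.

```python
def efficiency_frontier(strategies):
    """Return [(name, ICER versus previous frontier option)].

    strategies: list of (name, total_cost, total_qalys) tuples.
    The cheapest frontier option has an ICER of None.
    """
    ordered = sorted(strategies, key=lambda s: (s[1], -s[2]))

    # Step 1: drop dominated options (no more effective than a cheaper option)
    frontier = []
    for strategy in ordered:
        if frontier and strategy[2] <= frontier[-1][2]:
            continue
        frontier.append(strategy)

    # Step 2: drop extendedly dominated options until ICERs are increasing
    changed = True
    while changed:
        changed = False
        for i in range(1, len(frontier) - 1):
            prev, cur, nxt = frontier[i - 1], frontier[i], frontier[i + 1]
            icer_cur = (cur[1] - prev[1]) / (cur[2] - prev[2])
            icer_next = (nxt[1] - cur[1]) / (nxt[2] - cur[2])
            if icer_cur > icer_next:
                del frontier[i]
                changed = True
                break

    result = [(frontier[0][0], None)]
    for prev, cur in zip(frontier, frontier[1:]):
        result.append((cur[0], (cur[1] - prev[1]) / (cur[2] - prev[2])))
    return result


# Hypothetical total costs (GBP) and QALYs per strategy
strategies = [
    ("CC", 20000.0, 0.8),
    ("BTD", 35000.0, 1.0),
    ("POM + LoDEX", 50000.0, 1.5),
    ("PANO + BOR + DEX", 120000.0, 1.9),
]

frontier = efficiency_frontier(strategies)
frontier_names = [name for name, _ in frontier]
# frontier_names is ["CC", "POM + LoDEX", "PANO + BOR + DEX"];
# BTD is removed by extended dominance in this hypothetical example
```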

When POM + LoDEX was compared with CC and BTD, it could be considered as satisfying the end-of-life criteria, but when compared with PANO + BOR + DEX, the end-of-life criteria did not hold. The pairwise comparisons on the corrected company base-case revealed that the ICERs of POM + LoDEX versus CC and BTD were both below the £50,000 threshold. The ICER of POM + LoDEX versus PANO + BOR + DEX showed that POM + LoDEX was less effective but less costly, and the savings per QALY lost for POM + LoDEX suggested it was cost effective, without applying the end-of-life criteria threshold. Various scenario analyses revealed that the ICER was relatively robust against changes in input values and assumptions.

3.6.1 Key Methodological Issues

Due to the lack of randomized evidence, the company used different data sources and methodological approaches to assess the comparative effectiveness of POM + LoDEX versus BTD and PANO + BOR + DEX. Even though the chosen methods were deemed to be appropriate, the ERG noted that there was no formal method selection process. For instance, no justification was given by the company as to why MAIC was chosen instead of other alternatives such as simulated treatment comparison. Furthermore, the ERG was concerned that there was no assessment of possible unobserved confounding, which might have added uncertainty to the clinical and cost-effectiveness estimates. Finally, the ERG identified some inconsistencies in the implementation of these methods; for instance, the CGP and mean covariate methods assumed different baseline characteristics, and unrelated data from the BTD trials were used indirectly in the POM + LoDEX versus PANO + BOR + DEX comparison. Therefore, the ERG was of the opinion that a more systematic approach could have been followed in choosing and applying these methods for the analysis of non-randomized observational data (e.g. in line with the recommendations in NICE DSU TSD 17 and TSD 18 [34]).

The lack of a full incremental analysis including all comparators in the CS was another concern. Instead of a full incremental analysis, the company provided individual comparisons, each conducted on a different dataset, which implied a slightly different population for each comparison. The ERG was aware that a full incremental analysis would break the randomization link between POM + LoDEX and CC, but considered that the additional insights gained from full incremental analyses should not be dismissed.

4 National Institute for Health and Care Excellence Guidance

The committee concluded that the appropriate positioning of POM + LoDEX, in line with clinical practice and the evidence base, was after third or subsequent relapses, and the relevant comparators were PANO + BOR + DEX, BTD and CC. The committee acknowledged that the indirect comparisons were associated with considerable uncertainty, but recognised that the company had presented all the appropriate evidence available for its decision making.

The committee agreed that the model structure was appropriate. The ICERs varied depending on the clinical datasets included and on the crossover adjustment and covariate adjustment methods used. The committee understood the full incremental analyses provided by the ERG, but concluded that it would base its decisions on the company’s base-case ICERs, corrected by the ERG for errors, so as not to break the randomisation in the comparison between POM + LoDEX and CC.

The most plausible ICERs for POM + LoDEX versus CC and BTD were below £50,000 per QALY gained, and the committee concluded that POM meets the end-of-life criteria compared with these two comparators. The end-of-life criterion for an additional 3 months survival gain was not met for the comparison with PANO + BOR + DEX, and, after incorporating all PAS prices, the ICERs reflected ‘savings per QALY lost’, i.e. POM + LoDEX was less effective but less costly. The committee concluded that the savings per QALY lost for POM + LoDEX compared with PANO + BOR + DEX were high enough for it to be considered a cost-effective use of NHS resources without applying the end-of-life criteria.

4.1 Final Guidance

POM + LoDEX was recommended as an option for treating multiple myeloma in adults at third or subsequent relapse, i.e. after three previous treatments, including both LEN and BOR, only when the company provided POM with the discount agreed in the PAS.

5 Conclusions

In the absence of randomized evidence, alternative data sources (including non-randomized evidence) can be used to assess comparative effectiveness and to inform cost-effectiveness analyses. However, adjustment methods are needed to account for the differences between patient populations at baseline. This STA showed that the selection of the method and its implementation can have a considerable impact on the ICER; therefore, it is important to base all of these decisions on a systematic approach.

This STA also showed that a full incremental analysis based on a common dataset for all comparisons might provide additional insights (e.g. the treatments that were on the cost-effectiveness frontier) compared with the individual comparisons based on different datasets, which led to different cost and QALY outcomes for the same treatment (POM + LoDEX) in each comparison. However, the full incremental analysis in this STA required the breakdown of the randomisation between POM + LoDEX and CC, and the decision makers might prefer their decision to be based on direct head-to-head evidence rather than indirect evidence.

Another interesting outcome of this STA was the fact that an intervention might satisfy the end-of-life criteria for some of the comparators, whereas for others, it might not since it leads to a lower total QALY outcome. In such a situation, the decision maker can still recommend the treatment if the ‘costs per QALY gained’ were below the end-of-life threshold for the comparisons with less effective comparators, and if the ‘costs saved per QALY lost’ were higher than the standard threshold for the comparisons with more effective comparators.
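The decision rule described above can be written out as a short Python sketch. When the intervention gains QALYs at extra cost, it is cost effective if the cost per QALY gained is at or below the threshold; when it loses QALYs but saves costs (southwest quadrant), it is cost effective if the savings per QALY lost are at or above the threshold. The function name and the incremental costs and QALYs are hypothetical; the £50,000 value corresponds to the end-of-life threshold mentioned in this paper, and £30,000 is assumed here as the standard threshold for illustration only.

```python
def is_cost_effective(delta_cost, delta_qalys, threshold):
    """Quadrant-aware cost-effectiveness check for an intervention.

    delta_cost and delta_qalys are intervention minus comparator.
    """
    if delta_qalys > 0:
        if delta_cost <= 0:
            return True  # more effective and not more costly: dominant
        return delta_cost / delta_qalys <= threshold  # cost per QALY gained
    if delta_qalys < 0:
        if delta_cost >= 0:
            return False  # less effective and not cheaper: dominated
        # Southwest quadrant: savings per QALY lost must exceed the threshold
        return (-delta_cost) / (-delta_qalys) >= threshold
    return delta_cost <= 0  # equal effectiveness: cheaper or equal cost wins


END_OF_LIFE_THRESHOLD = 50000.0  # GBP per QALY, as referred to in this paper
STANDARD_THRESHOLD = 30000.0     # GBP per QALY, assumed for illustration

# Hypothetical comparison with a less effective comparator:
# 22,500 extra cost for 0.5 extra QALYs, i.e. 45,000 per QALY gained
northeast = is_cost_effective(22500.0, 0.5, END_OF_LIFE_THRESHOLD)

# Hypothetical comparison with a more effective comparator:
# 14,300 saved for 0.1 QALYs lost, i.e. about 143,000 saved per QALY lost
southwest = is_cost_effective(-14300.0, -0.1, STANDARD_THRESHOLD)
```

Note that the inequality reverses in the southwest quadrant: a higher ratio is more favourable for the intervention there, because it means more is saved for each QALY forgone.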

Finally, this STA showed that a negative treatment recommendation from a previous appraisal can be changed in the presence of additional evidence, changes in the clinical landscape, and in the costs and drug prices.