The ERG reviewed the clinical effectiveness and cost-effectiveness evidence of R2 for this indication. As part of the STA process, the ERG and NICE had the opportunity to ask for clarification on specific issues in the CS, in response to which the company provided additional information [3]. Based on this information, the ERG produced an ERG base case by modifying the health economic model submitted by the company, and assessed the impact of alternative assumptions and parameter values on the model results. Sections 3.1–3.6 summarise the evidence presented in the CS, as well as the review of the ERG.
Clinical Effectiveness Evidence Submitted by the Company
The CS included six studies that were deemed relevant. Four of these studies evaluated R2, of which one was a randomised controlled trial (RCT) of R2 versus R-mono (the AUGMENT trial) [4]; the remaining three did not include relevant comparators according to the NICE scope. A fifth study, by van Oers et al., evaluated rituximab combined with cyclophosphamide, doxorubicin, vincristine, and prednisolone (R-CHOP) versus CHOP [5]. A sixth study, the GADOLIN trial, evaluated O-Benda versus bendamustine monotherapy [6]. These last two studies were included for unanchored indirect comparisons. The trial by van Oers et al. (2006) [5] was used to compare R2 versus R-CHOP, although it only included rituximab-naïve patients and was therefore not representative of the UK patient population. The GADOLIN study was used for an indirect comparison of R2 with O-Benda.
The AUGMENT trial was a randomised, double-blind, multicentre, controlled trial comparing R2 versus R-mono in non-rituximab refractory patients with FL grade 1, 2, or 3A or MZL. The study was conducted at 96 sites in 17 countries outside the UK. Intravenous (IV) rituximab 375 mg/m² was given weekly in cycle 1 (days 1, 8, 15, and 22) and on day 1 of each 28-day cycle in cycles 2–5. Patients in the R2 arm also received lenalidomide once daily on days 1–21 of each 28-day cycle for up to 12 cycles. Dose modification rules allowed lenalidomide to be reduced to as low as 2.5 mg. Treatment continued until progression or unacceptable toxicity.
Baseline demographics for the population in the AUGMENT trial were similar between arms. Overall, 261 patients (73%) had Ann Arbor stage III–IV disease; 123 patients (34%) had a Follicular Lymphoma International Prognostic Index (FLIPI) score ≥ 3; and 183 patients (51%) had high tumour burden per Group d’Etude des Lymphomes Folliculaires (GELF) criteria.
Results from the AUGMENT trial favoured R2 over R-mono in terms of progression-free survival (PFS), with a longer median PFS (results were confidential). However, there was no evidence of a difference in overall survival (OS), with a hazard ratio of 0.61 (95% confidence interval 0.33–1.13) for patients treated with R2 compared with R-mono. At the time of the analysis, the OS data were immature, with 16 deaths on R2 and 26 deaths on R-mono. The overall response rate (ORR) was significantly greater for R2 than for R-mono (78% vs. 53%; p < 0.0001), as was the complete response (CR) rate (34% vs. 18%; p = 0.001). In terms of health-related quality of life, no clinically meaningful change from baseline in the Global Health Status/Quality of Life (GHS/QoL) domain of the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire-Core30 (QLQ-C30) was observed at any post-baseline assessment visit. Between-group differences in mean changes were small and not clinically meaningful at all assessment visits.
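As a rough check on the OS result, the standard error of the log hazard ratio can be back-calculated from the reported 95% confidence interval, assuming the interval was constructed symmetrically on the log scale (a Wald-type interval). This is an approximation for illustration only, not the p-value reported for AUGMENT, and the function name is hypothetical.

```python
import math

def approx_p_from_hr_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate two-sided p-value for a hazard ratio from its 95% CI.

    Assumes the CI is log(HR) +/- z_crit * SE on the log-hazard scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Reported AUGMENT OS result: HR 0.61 (95% CI 0.33-1.13)
se, z, p = approx_p_from_hr_ci(0.61, 0.33, 1.13)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, approximate p = {p:.2f}")
```

Because the interval spans 1, the approximate two-sided p-value lands above 0.05, consistent with the statement that there was no evidence of an OS difference.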
Treatment-emergent adverse events (TEAEs) during AUGMENT for the total population (FL and MZL) were reported in 174 patients (99%) in the R2 arm and 173 patients (96%) in the R-mono arm. More patients in the R2 arm (69%) experienced a grade 3 or 4 TEAE compared with those in the R-mono arm (32%), and two patients in each treatment arm reported a grade 5 TEAE. Additionally, a greater proportion of patients reported serious adverse events in the R2 arm (26%) compared with those in the R-mono arm (14%).
The company performed three unanchored indirect comparisons, two using data from published evidence and one using data from the Haematological Malignancy Research Network (HMRN) [7]. The HMRN is a population-based cohort that includes all patients newly diagnosed with a haematological malignancy between 2004 and 2016 in the area covered by the Yorkshire and Humber & Yorkshire Cancer Networks.
The unanchored indirect comparisons were as follows:
- R2 versus R-CHOP for non-rituximab refractory patients, using van Oers et al. (2006) [5], which compared R-CHOP with CHOP (only the R-CHOP arm was used in the analyses).
- R2 versus O-Benda for rituximab refractory patients, based on comparator data from a study by Sehn et al. (2016) [6], which compared O-Benda with bendamustine monotherapy (only the O-Benda arm was used in the analyses).
- R2 versus pooled R-CHOP and rituximab combined with cyclophosphamide, vincristine, and prednisolone (R-CVP) for non-rituximab refractory patients, using data from HMRN.
The ERG did not use the two unanchored indirect comparisons based on published evidence in its deliberations, because the study by van Oers et al. is not representative of UK patients and O-Benda is not a relevant comparator according to the NICE scope.
Results from the remaining matching-adjusted indirect comparison (MAIC) (R2 versus pooled data for R-CHOP/R-CVP for non-rituximab refractory patients using data from HMRN) show a significant improvement in OS and time to next anti-lymphoma treatment (TTNLT) for R2 compared with R-CHOP/R-CVP, but no evidence of a difference in PFS. All results were confidential.
Critique of Clinical Effectiveness Evidence and Interpretation
The CS and response to clarification provided sufficient details for the ERG to appraise the literature searches conducted as part of the systematic review to identify clinical effectiveness studies. A good range of databases and resources were searched.
The CS included one relevant study for the comparison of R2 versus R-mono: the AUGMENT trial [4]. None of the patients in this trial were rituximab refractory. In addition, the company performed an unanchored indirect comparison of R2 versus R-CHOP and R-CVP, using data for R2 from the AUGMENT trial and pooled data for R-CHOP/R-CVP from the HMRN database.
The results of the MAIC should be treated with a high degree of caution because potentially important covariates were excluded from the matching models, sample sizes were small, R-CHOP and R-CVP were assumed to be equivalent in the HMRN data, and the PFS definitions and length of follow-up differed between the two data sources. As no relevant comparative trial data were available, the analysis used an unanchored MAIC involving two single treatment arms from different studies. Such an analysis assumes that all effect modifiers and prognostic factors are accounted for in the model, which is difficult to achieve in practice because not all studies measure all relevant variables.
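To make the concern about small sample sizes concrete, the sketch below estimates unanchored MAIC weights with the method-of-moments approach commonly used for MAIC: individual patient covariates are centred on the aggregate target means, and weights exp(x·β) are chosen so that the weighted covariate means match those targets. The covariates (age and a FLIPI ≥ 3 indicator), the eight patients, and the target means are all hypothetical; this is not the company's matching model. The effective sample size (ESS) shows how much information remains after weighting.

```python
import math

def solve_linear(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def maic_weights(ipd, target_means, tol=1e-8, max_iter=100):
    """Method-of-moments MAIC weights w_i = exp(xc_i . beta), where xc_i are patient
    covariates centred on the aggregate target means and beta minimises
    Q(beta) = sum_i exp(xc_i . beta). At the minimum, weighted means equal the targets."""
    xc = [[x - t for x, t in zip(row, target_means)] for row in ipd]
    k = len(target_means)

    def weights(beta):
        return [math.exp(sum(x * b for x, b in zip(row, beta))) for row in xc]

    beta = [0.0] * k
    for _ in range(max_iter):
        w = weights(beta)
        grad = [sum(wi * row[j] for wi, row in zip(w, xc)) for j in range(k)]
        if max(abs(g) for g in grad) < tol:
            break
        hess = [[sum(wi * row[i] * row[j] for wi, row in zip(w, xc)) for j in range(k)]
                for i in range(k)]
        step = solve_linear(hess, grad)
        # Backtracking line search: only accept Newton steps that do not increase Q
        t, q0 = 1.0, sum(w)
        while sum(weights([b - t * s for b, s in zip(beta, step)])) > q0 and t > 1e-8:
            t /= 2
        beta = [b - t * s for b, s in zip(beta, step)]
    w = weights(beta)
    ess = sum(w) ** 2 / sum(wi ** 2 for wi in w)
    return w, ess

# Hypothetical single-arm IPD: (age in years, FLIPI >= 3 indicator)
ipd = [(62, 1), (55, 0), (70, 1), (48, 0), (66, 0), (59, 1), (73, 1), (51, 0)]
# Hypothetical aggregate means reported for the comparator population
target = (65.0, 0.6)

w, ess = maic_weights(ipd, target)
weighted_age = sum(wi * r[0] for wi, r in zip(w, ipd)) / sum(w)
print(f"weighted mean age = {weighted_age:.2f}, ESS = {ess:.1f} of {len(ipd)}")
```

Only the covariates included in `target` are balanced; any effect modifier or prognostic factor left out of the model, or not measured in one of the sources, remains unadjusted, which is the central limitation of an unanchored comparison.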
Cost-Effectiveness Evidence Submitted by the Company
The company conducted searches for cost-effectiveness, health-related quality of life, and healthcare resource use evidence. Although four economic evaluations from a UK perspective were identified, none included R2, and the company therefore chose to base their submission on a de novo cohort-level partitioned survival model (PSM) with three health states: progression-free (PF), post-progression (PP), and death (Fig. 1). The company argued that a PSM was more appropriate than a state transition model (STM) because of a lack of data on post-progression survival (PPS).
The analysis took an NHS and Personal Social Services (PSS) perspective. The model had a time horizon of 40 years with a cycle length of 28 days, and a half-cycle correction was applied. All costs and quality-adjusted life years (QALYs) were discounted at a rate of 3.5% per year.
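The discounting and half-cycle correction described above can be sketched as follows. The sketch assumes 365.25 days per year, discounts each cycle at its midpoint, and implements the half-cycle correction as the average of state occupancy at the start and end of each cycle; the company's exact conventions are not reported here, and the exponential survival curve is purely hypothetical.

```python
import math

CYCLE_DAYS = 28
DAYS_PER_YEAR = 365.25          # assumption
ANNUAL_DISCOUNT_RATE = 0.035    # 3.5% per year for costs and QALYs
HORIZON_YEARS = 40

CYCLE_YEARS = CYCLE_DAYS / DAYS_PER_YEAR
N_CYCLES = round(HORIZON_YEARS / CYCLE_YEARS)

def discount_factor(cycle):
    """Discount factor for 0-based cycle `cycle`, evaluated at the cycle midpoint."""
    return (1 + ANNUAL_DISCOUNT_RATE) ** (-(cycle + 0.5) * CYCLE_YEARS)

def half_cycle_corrected(occupancy):
    """Average occupancy at the start and end of each cycle (trapezoidal rule)."""
    return [(a + b) / 2 for a, b in zip(occupancy[:-1], occupancy[1:])]

# Hypothetical alive-state occupancy at cycle boundaries: exponential OS, 8-year median
rate = math.log(2) / 8
alive = [math.exp(-rate * k * CYCLE_YEARS) for k in range(N_CYCLES + 1)]

corrected = half_cycle_corrected(alive)
life_years = sum(corrected) * CYCLE_YEARS
discounted_life_years = sum(p * discount_factor(k) for k, p in enumerate(corrected)) * CYCLE_YEARS
print(f"{N_CYCLES} cycles: {life_years:.2f} LYs undiscounted, {discounted_life_years:.2f} discounted")
```

The same per-cycle discount factors would be applied to costs and to utility-weighted occupancy when accumulating QALYs.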
The patient population considered in the model was in line with the proposed licence: adult patients with previously treated FL or MZL. Due to the similar prognosis of FL and MZL patients, and the difficulty in sourcing MZL-specific data, FL and MZL populations were pooled throughout the economic analysis. After the final marketing authorisation, which did not include MZL, the company provided an addendum containing evidence on only the FL population.
Lenalidomide is administered orally and rituximab by IV infusion. The comparators in the economic model were rituximab in combination with chemotherapy (i.e. R-CHOP or R-CVP) and O-Benda. The ERG did not include O-Benda in its review, as NICE explicitly stated that it was not considered a relevant comparator for rituximab-refractory disease.
The main sources of evidence on treatment effectiveness for the intervention and comparators were the AUGMENT study [4] for R2 and HMRN data [7] for R-CHOP and R-CVP.
Based on HMRN data and clinical opinion, the efficacy (OS and PFS) of R-CHOP and R-CVP was assumed to be similar, and HMRN data for R-CHOP and R-CVP were therefore pooled. In the economic model, this meant that the comparisons of R2 versus R-CHOP and of R2 versus R-CVP had identical effectiveness outcomes (QALYs) and differed only in costs.
Parametric survival curves were fitted to the matched patient-level data from AUGMENT and HMRN and then used to extrapolate survival beyond study follow-up. Survival analysis was performed for OS, PFS, TTNLT, and time on treatment (ToT). PFS and ToT data were used to determine the number of patients in the PF (on- and off-treatment) health state. The proportion of patients moving to the PP (on- and off-treatment) health state was based on PFS, TTNLT, and OS data. The curves were adjusted for treatment waning, which in the company’s base case was assumed to occur at 5 years, consistent with previous NICE submissions in the same disease area (TA472 [8] and TA137 [9]). After this time point, the comparator hazard of progressing or dying was applied to the R2 arm.
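One common way to implement the treatment-waning assumption described above is to follow the R2 curve up to the waning time and thereafter apply the comparator's conditional survival (equivalently, its hazard) to patients still at risk in the R2 arm. The sketch below assumes this construction and uses hypothetical Weibull parameters; it is not taken from the company's model.

```python
import math

def weibull_survival(shape, scale):
    """Return S(t) = exp(-(t/scale)^shape) for t in years."""
    return lambda t: math.exp(-((t / scale) ** shape))

def with_waning(s_intervention, s_comparator, waning_time=5.0):
    """Survival curve that follows the intervention until `waning_time`, after which
    the comparator hazard is applied to those still at risk in the intervention arm."""
    def s(t):
        if t <= waning_time:
            return s_intervention(t)
        return s_intervention(waning_time) * s_comparator(t) / s_comparator(waning_time)
    return s

# Hypothetical PFS curves (not fitted to AUGMENT or HMRN data)
s_r2 = weibull_survival(shape=1.1, scale=6.0)
s_comp = weibull_survival(shape=1.1, scale=3.5)
s_r2_waned = with_waning(s_r2, s_comp, waning_time=5.0)

for t in (2, 5, 10):
    print(t, round(s_r2(t), 3), round(s_r2_waned(t), 3), round(s_comp(t), 3))
```

The waned curve is continuous at 5 years and, after that point, declines at the comparator's rate, so the R2 benefit accrued up to 5 years is retained but not extended.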
Utility values for the PF and PP health states, on and off treatment, were estimated with a mixed effects model using EQ-5D-3L data collected in AUGMENT. Because the disease characteristics used to derive utility values from the mixed effects model were population dependent, the utility values for R2 versus R-CHOP/R-CVP and for R2 versus R-mono differed by population (see Table 1). The utility values from the mixed effects model were used to inform the health states in the model for all treatments. Utility values from the study by Wild et al. [10], which were substantially lower, particularly for patients in the PP state, were tested in a scenario analysis. Utility decrements for grade 3 and 4 adverse events were applied in the model for the expected duration of each adverse event, based on the literature and previous appraisals.
Table 1 Health state utility values used in the economic FL-only model
The cost categories included in the model were costs associated with treatment (drug acquisition and drug administration costs, both including subsequent therapies, and costs of treatment-related adverse events), disease monitoring costs, and end-of-life care costs. All costs were based on, or inflated to, the 2018 price level. Unit prices were based on the NHS reference costs [11], the Personal Social Services Research Unit (PSSRU) [12], the Monthly Index of Medical Specialities (MIMS) [13], and the electronic Market Information Tool (eMIT) [14]. Dosing data for lenalidomide were taken directly from AUGMENT, and cost calculations were adjusted for dose reductions and missed treatment cycles. The same method was applied to calculate rituximab costs for the R2 arm. Drug administration costs were based on NHS reference cost tariffs, pharmacy costs for preparing the infusion, and NHS transport costs [11]. The cost of a full blood count was added per visit in each lenalidomide treatment cycle to monitor the dose-limiting toxicities of neutropenia and thrombocytopenia. Disease monitoring costs were estimated separately for each health state, based on previous FL submissions [15, 16]. Costs of autologous stem-cell transplant (ASCT) were assigned to 11.8% of patients receiving R-CHOP. For R-CVP and R2, ASCT was assumed not to occur in clinical practice, so no ASCT costs were applied in these arms. The incidence of each grade 3–4 adverse event occurring in ≥ 2% of patients was combined with its unit cost to obtain a one-off upfront cost for each treatment arm in the model. Terminal care was also applied as a one-off cost when a patient died.
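The one-off adverse-event cost described above amounts, on one reading, to an incidence-weighted sum of unit costs over the grade 3–4 events that pass the ≥ 2% threshold; the expected ASCT cost for R-CHOP is likewise the 11.8% share times a unit cost. The sketch assumes this reading; event names, incidences, and unit costs are hypothetical placeholders, not values from AUGMENT, HMRN, or NHS reference costs.

```python
def one_off_ae_cost(incidence, unit_cost, threshold=0.02):
    """Incidence-weighted one-off cost over grade 3-4 AEs occurring in >= threshold of patients."""
    return sum(p * unit_cost[ae] for ae, p in incidence.items() if p >= threshold)

# Hypothetical grade 3-4 AE incidences for one treatment arm
incidence = {"ae_a": 0.50, "ae_b": 0.07, "ae_c": 0.01}
# Hypothetical unit costs (GBP per event)
unit_cost = {"ae_a": 100.0, "ae_b": 200.0, "ae_c": 1000.0}

cost = one_off_ae_cost(incidence, unit_cost)  # ae_c falls below the 2% threshold
print(f"one-off AE cost per patient: £{cost:.2f}")

# Expected ASCT cost per R-CHOP patient: 11.8% receiving ASCT x a hypothetical unit cost
ASCT_UNIT_COST = 30000.0  # hypothetical
expected_asct_cost = 0.118 * ASCT_UNIT_COST
```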
Lastly, subsequent treatments were applied in the model as an average one-off cost to patients entering the PP (on-treatment) health state, based on AUGMENT data for R2 and the HMRN database for R-CHOP and R-CVP.
In the company’s base-case analysis for the FL-only population, total life years and QALYs gained, as well as total costs, were higher in the R2 arm than in the R-CHOP and R-CVP arms. Incremental QALYs were mainly driven by QALY gains in the PP (off-treatment) health state. Incremental costs mainly resulted from higher drug acquisition costs. All cost and QALY results were confidential. The deterministic (probabilistic, based on 1000 iterations) incremental cost-effectiveness ratio (ICER) was £15,909 (£27,768) per QALY gained for R2 versus R-CHOP and £23,746 (£41,602) per QALY gained for R2 versus R-CVP. For R2 versus R-CHOP, the ICER was most sensitive to the cost of ASCT, the total subsequent treatment costs for R-CHOP, and the proportion of patients who receive ASCT. For R2 versus R-CVP, the ICER was most sensitive to the total subsequent treatment costs for R-CVP (including ASCT costs), administration costs, and resource use costs. The considerable difference between the deterministic and probabilistic ICERs was attributed to increased uncertainty in the R2 OS extrapolations in the FL-only population compared with the initial FL + MZL population.
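For reference, the ICER is the incremental cost divided by the incremental QALYs. A probabilistic ICER is conventionally computed as the ratio of the mean incremental cost to the mean incremental QALYs across PSA iterations, rather than the mean of per-iteration ratios, which becomes unstable when incremental QALYs approach zero. The sketch assumes that convention (the exact calculation in the CS is not described here) and uses hypothetical PSA draws, since the actual cost and QALY results were confidential.

```python
import statistics

def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio (cost per QALY gained)."""
    return delta_cost / delta_qaly

def probabilistic_icer(draws):
    """Ratio of mean incremental cost to mean incremental QALYs over PSA draws."""
    mean_dc = statistics.fmean(dc for dc, _ in draws)
    mean_dq = statistics.fmean(dq for _, dq in draws)
    return icer(mean_dc, mean_dq)

# Hypothetical PSA draws of (incremental cost in GBP, incremental QALYs)
draws = [(20000.0, 1.0), (24000.0, 0.5), (16000.0, 1.5)]

ratio_of_means = probabilistic_icer(draws)
mean_of_ratios = statistics.fmean(icer(dc, dq) for dc, dq in draws)
print(f"ratio of means: £{ratio_of_means:,.0f}/QALY; mean of ratios: £{mean_of_ratios:,.0f}/QALY")
```

Both summaries describe the same three draws, yet they differ markedly, which is why the summary statistic used for a probabilistic ICER matters.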
Similarly, for R2 versus R-mono, the company’s base-case analysis (provided after the clarification phase at the ERG’s request) resulted in higher total life years, QALYs, and costs for R2. Incremental QALYs were mainly driven by QALY gains in the PF health state. The cost difference was mainly caused by higher drug acquisition costs. The deterministic ICER was £20,274 per QALY gained, and the probabilistic ICER was £23,412 per QALY gained. The deterministic sensitivity analysis showed that the ICER was most sensitive to the total subsequent treatment costs for R2 and R-mono and the frequency of haematologist visits in the PP state.
Critique of Cost-Effectiveness Evidence and Interpretation
Searches were clear, transparent, and reproducible and were unlikely to have missed any relevant studies. The ERG agreed with a de novo approach to modelling the cost-effectiveness of R2. The CS was largely in line with the NICE reference case but deviated from the scope regarding the comparators modelled. More specifically, R-mono was excluded even though direct evidence existed for R2 versus R-mono, and in the refractory population O-Benda was the sole comparator, although NICE had explicitly stated it was not a relevant comparator for this appraisal. Most crucially, the ERG had concerns about the appropriateness of the PSM approach and its claimed superiority over an STM, and would have liked to see both approaches properly explored, particularly in light of the limitations of PSMs highlighted in NICE Technical Support Document (TSD) 19 [17]. PSMs have the advantage that they are easy to estimate from trial time-to-event data and, because such data are used to summarise treatment effectiveness, they are also easy to explain. However, they have the major disadvantage that each time-to-event function used to calculate the probability of remaining in a health state (PF or PP) is estimated independently of the other. Not only can this introduce bias, because the functions are likely to be correlated, but it often leads to implausible scenarios, such as the probability of remaining in the PF state exceeding the probability of remaining alive. Indeed, in this model, the curves were adjusted to ensure that long-term PFS estimates would not be higher than TTNLT or OS, and avoiding implausible curve crossing seemed to be the main argument for the selection of survival functions. Although the ERG requested that the company provide an STM during the clarification phase, the company did not provide it until late in the process, and it only included R2 and R-mono, which hampered the ERG’s assessment of the implications of using a PSM approach.
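The partitioned survival structure and the constraint discussed above can be sketched as follows: state membership is read directly off independently fitted PFS and OS curves, and PFS is capped at OS so that the PF state never holds more patients than are alive. The Weibull parameters are hypothetical and were chosen so that the uncapped curves cross late in the time horizon; the company's actual adjustment (which also involved TTNLT) is not reproduced.

```python
import math

def weibull_survival(shape, scale):
    """Return S(t) = exp(-(t/scale)^shape) for t in years."""
    return lambda t: math.exp(-((t / scale) ** shape))

def partition(t, s_pfs, s_os):
    """PSM state membership at time t, with PFS capped at OS to avoid PF > alive."""
    os_t = s_os(t)
    pf_t = min(s_pfs(t), os_t)
    return {"PF": pf_t, "PP": os_t - pf_t, "Dead": 1.0 - os_t}

# Hypothetical, independently fitted curves: long PFS tail, steeper late OS decline
s_pfs = weibull_survival(shape=0.6, scale=4.0)
s_os = weibull_survival(shape=1.5, scale=10.0)

for t in (1, 5, 20):
    crossed = s_pfs(t) > s_os(t)
    states = partition(t, s_pfs, s_os)
    print(t, {k: round(v, 3) for k, v in states.items()}, "uncapped PFS > OS" if crossed else "")
```

The cap removes the logically impossible state at 20 years, but it does not address the underlying issue that the two curves were estimated independently of each other.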
The ERG was concerned about the company pooling MZL and FL populations in the model, assuming they were comparable. The ICER for the company’s FL-only scenario was substantially higher for the R-CHOP and R-CVP comparisons. This raises serious doubts about the validity of this assumption, and the ERG considered this to be a relevant source of uncertainty. In the re-submitted model following the final marketing authorisation that was granted for the FL population only, this was no longer an issue.
A main concern of the ERG was the trustworthiness of the R2 efficacy estimate from the indirect comparison, which appeared to be inflated relative to the direct comparison data from AUGMENT. This was suggested by QALYs for R2 being substantially lower in the R2 versus R-mono (direct) comparison than in the R2 versus R-CHOP/R-CVP (indirect) comparison. The efficacy of R2 was therefore sensitive to the method used and may have been biased. Although the ERG did not have the data needed to quantify this uncertainty, using efficacy estimates from the MAIC may have shifted the ICER substantially in favour of R2.
The ERG had concerns about the way survival curves were selected and validated. For the FL-only analyses presented in the company addendum, OS as predicted by the parametric survival curves was very different from the OS curves presented in the original submission (which included both FL and MZL populations). No clinical validation of these new OS curves was performed. The ERG considered this process to deviate from the TSD 14 recommendations on survival analysis [18]. The choice of OS curve probably introduced substantial uncertainty into the analyses.
The ERG considered the utility values to be potentially overestimated, as they were higher than or comparable to those of the general population. With utilities remaining high throughout the model, adjustments to the survival curves had little impact on the ICER, because a high PP utility relative to the PF utility meant there was hardly any quality-of-life penalty on progression.
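The point about a high PP utility can be made with simple arithmetic. With life years split between PF and PP, QALYs are u_PF × LY_PF + u_PP × LY_PP, so extending PFS by Δ years at fixed OS gains only (u_PF − u_PP) × Δ QALYs. The sketch uses hypothetical, undiscounted life years and utilities, apart from the PP value of 0.45 from Pereira et al. tested by the ERG; the optional cap stands in for capping at general population utility, using a single hypothetical value rather than age-specific norms.

```python
def qalys(ly_pf, ly_os, u_pf, u_pp, u_cap=None):
    """Undiscounted QALYs from life years in PF and overall, with an optional utility cap."""
    if u_cap is not None:
        u_pf, u_pp = min(u_pf, u_cap), min(u_pp, u_cap)
    return u_pf * ly_pf + u_pp * (ly_os - ly_pf)

def gain_from_longer_pfs(u_pf, u_pp, u_cap=None, ly_os=10.0, ly_pf_short=4.0, ly_pf_long=6.0):
    """QALY gain from 2 extra PF years at fixed OS (all life years hypothetical)."""
    return (qalys(ly_pf_long, ly_os, u_pf, u_pp, u_cap)
            - qalys(ly_pf_short, ly_os, u_pf, u_pp, u_cap))

high_pp = gain_from_longer_pfs(0.85, 0.82)            # high PP utility (hypothetical)
low_pp = gain_from_longer_pfs(0.85, 0.45)             # PP utility of 0.45
capped = gain_from_longer_pfs(0.85, 0.82, u_cap=0.80) # both capped at a hypothetical 0.80
print(round(high_pp, 3), round(low_pp, 3), round(capped, 3))
```

With PF and PP utilities close together, two extra years free of progression are worth very little, which is why survival-curve adjustments barely moved the ICER.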
The ERG considered the costs of subsequent treatment for R-CHOP and R-CVP likely to be overestimated, as they were based on a mixed R-chemo population from HMRN, even though data specific to R-CHOP and R-CVP were also available from this source. This was adjusted for in the ERG base case. The ERG was also concerned that the PP on-treatment phase included only a one-off cost for subsequent treatments, which may not reflect the long-term situation in this health state. As patients in the R2 arm remain in this health state longer on average, applying costs as a one-off possibly favoured R2.
Additional Work Undertaken by the ERG
Based on all considerations highlighted in the ERG critique, the ERG defined a new base case for the FL-only population, in which various adjustments were made to the company’s base case. These included correcting an operational error in the implementation of the “van Oers” scenario for R-CHOP efficacy, using subsequent treatment rates for R-CHOP and R-CVP from the pooled R-CHOP/R-CVP population instead of from a larger mixed R-chemo population, and capping utilities at the general population level. Furthermore, the ERG applied all six candidate distributions to extrapolate OS in both arms, given the divergent results of the different OS curves and the substantial uncertainty surrounding parametric survival model selection. In addition, exclusively for the R2 versus R-CHOP and R-CVP comparisons, the log-logistic distribution was used to estimate PFS in the R2 arm and the Weibull distribution to estimate PFS in the R-CHOP/R-CVP arm. In this analysis, TTNLT was estimated with a log-logistic distribution in both arms. The probabilistic ERG base-case ICER for R2 versus R-CHOP ranged from £16,874 to £44,888 per QALY gained (based on 1000 iterations). For R2 versus R-CVP, the ICER ranged from £23,135 to £59,810 per QALY gained, and for R2 versus R-mono, it ranged from £18,779 to £27,156 per QALY gained.
Furthermore, the ERG explored alternative PFS distributions and treatment waning effects, an alternative source for adverse events in R-CHOP and R-CVP, applying the same subsequent treatment costs for R2 as for R-CHOP/R-CVP, lower utilities, and an alternative source for R-CHOP efficacy. Applying the PP utility value from Pereira et al. [19] (0.45) was the most influential scenario explored by the ERG (ICER R2 vs. R-CHOP £33,626 per QALY gained; ICER R2 vs. R-CVP £47,281 per QALY gained).
Conclusions of the ERG Report
The clinical evidence relied on an MAIC, the results of which should be treated with a high degree of caution because potentially important covariates were excluded from the matching models, sample sizes were small, R-CHOP and R-CVP were assumed to be equivalent in the HMRN data, and the PFS definitions and length of follow-up differed between the two data sources. As no relevant comparative trial data were available, the analysis also used an unanchored MAIC involving two single treatment arms from different studies. This analysis assumes that all effect modifiers and prognostic factors are accounted for in the model, which is difficult to achieve in practice as, in this case, one or both studies did not measure a specific variable.
Even though the ERG base-case ICER for R2 versus R-CHOP was below £20,000 per QALY gained, the uncertainty around the cost-effectiveness of R2 was substantial, mainly because of the possible bias introduced by the indirect treatment comparison, which could not be accounted for in the ERG analyses. In addition, specific to the FL-only population analyses presented in the company addendum [20], the uncertainty around the OS estimates and the lack of clinical validation of these estimates warrant even more caution in the interpretation of results. The ICER for R2 versus R-CVP was higher and subject to the same uncertainty.