Key Points for Decision Makers

All three model structures provided a reasonable fit to the Kaplan–Meier data.

The long-term overall survival predicted by each structure resulted in non-trivial differences in the mean estimated survival benefit.

The evidence from this study highlights the need to validate assumptions regarding survival and the application of treatment effects against what is known about the characteristics of the disease.

1 Introduction

Decision models in healthcare are typically structured around a series of health states descriptive of the disease and characterized by associated costs and health-related quality of life (HRQoL). The expected costs and HRQoL of alternative treatments can be compared by estimating the amount of time spent in each health state over the timeframe of the analysis. In state-transition models (STMs; such as Markov or semi-Markov models [SMMs]), state occupancy is estimated by applying a set of transitions that define the probability that an individual currently in a health state (i) will move to another health state (j) in the following time period. A simple three-state model commonly applied in oncology modeling—progression-free (PF), progressed disease (PD), and death—has three sets of transitions: from PF to PD, from PF to death, and from PD to death (Fig. 1). Transition probabilities are conditional on the starting health state and can be time- and/or treatment-dependent. In a partitioned survival model (PSM), health state occupancy is estimated directly from the area under the relevant survival curve. In the three-health state example, overall survival (OS) is partitioned into those alive and PF (i.e., progression-free survival [PFS]) and those alive and with PD. The proportion alive at time t is given by the area under the OS curve, and the proportion alive and PF is given by the area under the PFS curve. The proportion alive and with PD is given by the difference between the OS and PFS curves. In this framework there are two implicit transitions: from PF to PD or death, and from PD to death. An inherent assumption of these model structures is that the disease progression is irreversible: patients can move from the PF to the PD health state but they cannot return to the PF health state.

Fig. 1 Three health state structures used by the partitioned survival model, post-progression survival Markov, and post-progression survival–progression-free survival Markov. TP transition probability

In a recent review of the technology appraisals in oncology conducted by the National Institute for Health and Care Excellence (NICE) between May 2013 and February 2016, 23 of 30 manufacturers’ submissions presented a PSM rather than a Markov or semi-Markov approach [1]. Regulatory approval for new oncology interventions is increasingly being granted on the basis of evidence from early-phase, single-arm trials with limited follow-up [2], and, against this background, the PSM may be seen as a relatively straightforward and intuitive approach because state occupancy can be estimated directly from trial-based estimates of OS and PFS. Individual patient data are usually available to estimate survival functions for the target intervention, but not for comparators outside of the clinical trial. An advantage of the PSM approach is that OS and PFS curves can be constructed from Kaplan–Meier curves or from a synthesis of published evidence. However, there are some perceived limitations to this approach [3].

In a within-trial analysis, the two approaches are expected to produce similar results, and both should provide a good fit to the trial data. However, the two approaches employ intrinsically different assumptions that may affect extrapolations of survival. For example, the PSM assumes independence between survival functions: in the three-state example, OS and PFS curves would be extrapolated from the trial data independently. However, depending on the disease, it is possible that the probability of PD is a function of the length of time spent in the PF state, and mortality risk after progression is a function of the time in PD as well as the time spent in PF. Extrapolating PFS and OS independently could also result in the analysis yielding implausible results. For example, the OS and PFS curves could cross, or the extrapolations could produce unrealistic estimated hazards (e.g., the risk of an event in the OS curve being lower than the risk of an event in the PFS curve). In an STM, survival functions are explicitly linked, and extrapolations can be informed by an interaction between underlying disease processes. The extent to which these structural differences affect the ability of either approach to inform decisions about cost effectiveness is uncertain.
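The risk of crossing curves described above can be checked mechanically once both extrapolations are available. The following is a minimal sketch, assuming Weibull extrapolations with purely hypothetical parameters (they are not fitted to CheckMate 025); the function names are illustrative only.

```python
import math

def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def first_crossing(os_curve, pfs_curve, n_cycles):
    """Return the first cycle at which PFS exceeds OS, or None if the curves never cross."""
    for cycle in range(1, n_cycles + 1):
        if pfs_curve(cycle) > os_curve(cycle):
            return cycle
    return None

# Hypothetical parameters (time in weeks); not fitted to any trial data.
os_curve = lambda t: weibull_survival(t, shape=1.3, scale=100.0)
pfs_curve = lambda t: weibull_survival(t, shape=0.6, scale=60.0)

# Check weekly cycles over a 20-year horizon (52 weeks per year assumed).
crossing = first_crossing(os_curve, pfs_curve, n_cycles=20 * 52)
print("First cycle with PFS > OS:", crossing)
```

A non-empty result flags an extrapolation in which more patients would be progression-free than alive, which is the kind of implausible outcome that independent extrapolation can produce.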

In this analysis we have used individual patient data from a randomized clinical trial of patients with advanced, metastatic renal cell carcinoma (RCC; CheckMate 025) [4] to compare survival outcomes predicted by a PSM and two different STMs. In the PSM approach, parametric survival curves were fitted directly to PFS and OS trial data and extrapolated to a lifetime horizon. The proportion of patients in each health state (PF, PD, and death) at time t was estimated from the areas under the PFS and OS curves. The first STM is described as a post-progression survival (PPS) Markov. In this approach, parametric curves were fitted to PFS, time to progression (TTP), and PPS data from the trial. The proportion of patients in each health state was estimated from a set of time-dependent transition probabilities. The transition probability from PF to PD was estimated from the TTP curve, and is dependent on the time spent in the PF state. The transition from PF to death is based on differences between the PFS and TTP curves. Transitions from PD to death were estimated from the PPS curve. Transitions are dependent on the time spent in PD, but are not dependent on time spent in the PF state. This assumes that the mortality risk after progression does not depend on the time at which the disease progressed. The second STM is described as a PPS–PFS Markov. The difference here is that the transition from PD to death is modeled using tunnel states, which allow the mortality risk to depend on the length of time the patient has been in the PF state. Different PPS curves are generated conditional on the length of PFS. This structure attempts to model the PPS more realistically by using the relationship between a prognostic variable at progression (TTP) and survival. The Markov structures discussed in this study require access to TTP and PPS data, which are not usually published but are available through patient data from a clinical trial. 
All of the models were implemented in Microsoft Excel® (Microsoft Corp., Redmond, WA, USA).

2 Methods

2.1 Data Sources

Patient-level data were available from the CheckMate 025 trial (ClinicalTrials.gov identifier NCT01668784) with 14 and 38 months of minimum follow-up. CheckMate 025 is the pivotal randomized controlled trial for nivolumab compared with everolimus in patients with advanced or metastatic RCC [4]. The dataset with 14 months’ minimum follow-up was used to derive clinical inputs for each model structure in the base case, as these are the data that were available at the time of health technology assessment (HTA) submissions for nivolumab in RCC. The primary endpoint was OS. The median OS was 25.0 months (95% confidence interval [CI] 21.8–not estimable) for patients treated with nivolumab compared with 19.6 months (95% CI 17.6–23.1) for patients treated with everolimus (hazard ratio 0.73; 95% CI 0.57–0.93), indicative of a clinically and statistically significant improvement in OS achieved in patients treated with nivolumab. The OS curves from this trial are shown in Fig. 2a. The median PFS as defined by RECIST (Response Evaluation Criteria in Solid Tumors) 1.1 assessment was 4.6 months (95% CI 3.7–5.4) with nivolumab and 4.4 months (95% CI 3.7–5.5) with everolimus (hazard ratio 0.88; 95% CI 0.75–1.03) [4]. The PFS curves from this trial are shown in Fig. 2b [4].

Fig. 2 Overall survival (a) and progression-free survival (b) curves from the CheckMate 025 trial. CI confidence interval, NE not estimable

CheckMate 025 was an open-label trial. A protocol amendment after data read-out allowed patients in the everolimus arm to cross over to the nivolumab arm. For the 14-month data cut, 55.4% of patients in the nivolumab arm received subsequent therapy (2.4% of patients received immunotherapy, 53.9% received another approved agent, 3.2% received another investigational agent). In the everolimus arm, 63% of patients received subsequent therapy (5.1% received immunotherapy, 60.8% received other approved agents, 5.4% received other investigational agents including 1% who crossed over to receive nivolumab). Survival was not adjusted for subsequent treatment received and more patients in the everolimus arm went on to receive subsequent treatment. Therefore, the relative benefit of nivolumab may have been under-estimated. In CheckMate 025, median PFS was essentially the same for both drugs, which at first glance may suggest that differences in OS only occurred during the post-progression stage and that the difference in OS may be due to subsequent treatment. However, some patients starting immunotherapy have demonstrated an initial increase in tumor lesion size, followed by tumor regression. This could be due to the mechanism of action of immunotherapy, which causes T cells to infiltrate the tumor site. This initial pseudo-progression could be classified inaccurately as disease progression based on RECIST 1.1 assessment [5]. CheckMate 025 allowed treatment beyond progression if there was investigator-assessed clinical benefit and tolerability. Analysis of patients treated beyond first progression in the CheckMate 025 study demonstrated that of 316 patients who progressed in the nivolumab arm, 153 patients were treated ≥ 4 weeks after first progression and 18 patients were treated < 4 weeks beyond progression. 
Approximately half of patients who were treated with nivolumab beyond first progression had a reduction in tumor burden post-progression and 14% had a ≥ 30% reduction in tumor burden post-progression. Of the 320 patients who progressed in the everolimus arm, 65 were treated ≥ 4 weeks beyond progression and 111 were treated < 4 weeks beyond progression. Approximately one-quarter of patients treated with everolimus beyond first progression had a reduction in tumor burden post-progression and none had a ≥ 30% reduction in tumor burden post-progression. Treatment beyond progression may have contributed to the OS benefit observed with nivolumab, suggesting that RECIST-defined PFS may not be a good surrogate of OS, especially with novel checkpoint inhibitors such as nivolumab [6].

After this analysis was conducted, a data cut with 38 months’ follow-up became available. These data were used to assess the predictive accuracy of each model structure. A sensitivity analysis using the dataset with 38 months of follow-up was also conducted.

Longer-term data were also available from the CheckMate 003 trial (ClinicalTrials.gov identifier NCT00730639), which was a phase I study of nivolumab in patients with a range of malignancies, including RCC [7].

2.2 Model Structure

Three economic models were developed in Microsoft Excel®. All have the same structure with three health states (PF, PD, and death [Fig. 1]), weekly cycle length, and lifetime (20-year) horizon.

The PSM uses parametric regressions fitted to the Kaplan–Meier curves for OS and PFS from the CheckMate 025 trial with predictions of survival extrapolated over the time horizon of the analysis. The proportion of patients in each health state at time t was estimated from the areas under the PFS and OS curves.
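As an illustration of how a PSM partitions the cohort, the sketch below reads state occupancy at each weekly cycle from the extrapolated OS and PFS curves. The Weibull form and its parameters are hypothetical assumptions for the example, not the distributions fitted in this study.

```python
import math

def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def psm_trace(os_curve, pfs_curve, n_cycles):
    """Return a list of (PF, PD, dead) proportions, one tuple per cycle."""
    trace = []
    for cycle in range(n_cycles + 1):
        s_os, s_pfs = os_curve(cycle), pfs_curve(cycle)
        # PF = PFS(t); PD = OS(t) - PFS(t); dead = 1 - OS(t)
        trace.append((s_pfs, s_os - s_pfs, 1.0 - s_os))
    return trace

# Hypothetical parameters (time in weeks); not fitted to CheckMate 025.
os_curve = lambda t: weibull_survival(t, shape=1.1, scale=130.0)
pfs_curve = lambda t: weibull_survival(t, shape=0.9, scale=35.0)

trace = psm_trace(os_curve, pfs_curve, n_cycles=20 * 52)
pf, pd, dead = trace[52]
print(f"Week 52: PF={pf:.3f}, PD={pd:.3f}, dead={dead:.3f}")
```

No transition probabilities appear anywhere in this calculation, which is why the PSM needs only the OS and PFS curves.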

The PPS Markov and PPS–PFS Markov models use additional information on TTP and PPS from the trial to derive time-dependent transition probabilities (TPs). Three transitions are possible: from PF to PD (TP1), from PF to death (TP2), and from PD to death (TP3). The PPS Markov and PPS–PFS Markov use the same method for calculating TP1 and TP2. The method used to derive TP3 is different, as described later in this section.

TTP data were derived from the clinical trial by censoring PFS on death. Parametric curves were then fitted to the TTP data and extrapolated over a lifetime horizon in order to calculate TP1. Because the shape of the PFS curve reflects both death and PD, differences between PFS and TTP were used to calculate TP2. Changes in the shape of the PFS and TTP curves over time reflect the changing hazards, meaning that TP1 and TP2 are both dependent on the time the patient has spent in the PF health state. The transition probabilities from one cycle to the next are calculated as follows, with TP1 and TP2 shown as examples:

$$ {\text{TP}}_{1} \left( {t,t - u} \right) = 1 - \frac{{S^{\text{TTP}} \left( t \right)}}{{S^{\text{TTP}} \left( {t - u} \right)}} $$
$$ {\text{TP}}_{2} \left( {t,t - u} \right) = 1 - {\text{TP}}_{1} \left( {t,t - u} \right) - \frac{{S^{\text{PFS}} \left( t \right)}}{{S^{\text{PFS}} \left( {t - u} \right)}} $$
where, in general, for u > 0,
$$ \begin{aligned} {\text{TP}}\left( {t,t - u} \right) & = \Pr \left( {T \le t\left| {T > t - u} \right.} \right) \\ & = 1 - \Pr \left( {T > t\left| {T > t - u} \right.} \right) \\ & = 1 - \frac{{\Pr \left( {T > t} \right)}}{{\Pr \left( {T > t - u} \right)}} = 1 - \frac{{S\left( t \right)}}{{S\left( {t - u} \right)}} \\ \end{aligned} $$

where S^TTP(t) is the predicted survival at time t from the TTP parametric curve; S^PFS(t) is the predicted survival at time t from the PFS parametric curve; TP(t, t − u) is the transition probability at time t for cycle length u; T is the time to the event; and Pr denotes probability.
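The two transition probabilities can be sketched in a few lines of code. In this illustration the TTP curve is a Weibull with hypothetical parameters, and the PFS curve is built by multiplying TTP by a constant-hazard term for death before progression; both choices are assumptions made so that the example is self-contained. Flooring TP2 at zero is also an assumption of the sketch, intended to guard against independently fitted curves that would otherwise imply a negative probability.

```python
import math

def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def conditional_event_prob(surv, t, u=1.0):
    """General form TP(t, t - u) = 1 - S(t) / S(t - u)."""
    return 1.0 - surv(t) / surv(t - u)

def tp1(s_ttp, t, u=1.0):
    """PF -> PD: conditional probability of progression from the TTP curve."""
    return conditional_event_prob(s_ttp, t, u)

def tp2(s_ttp, s_pfs, t, u=1.0):
    """PF -> death: TP2 = 1 - TP1 - S_PFS(t) / S_PFS(t - u), floored at zero (assumption)."""
    return max(0.0, 1.0 - tp1(s_ttp, t, u) - s_pfs(t) / s_pfs(t - u))

# Hypothetical curves (time in weeks); not fitted to CheckMate 025.
s_ttp = lambda t: weibull_survival(t, shape=0.9, scale=40.0)
s_pfs = lambda t: s_ttp(t) * math.exp(-0.002 * t)

for week in (1, 26, 104):
    print(week, round(tp1(s_ttp, week), 4), round(tp2(s_ttp, s_pfs, week), 4))
```

By construction, TP1 and TP2 together equal the total probability of leaving the PF state in the cycle, 1 − S^PFS(t)/S^PFS(t − u).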

PPS was extrapolated by fitting a parametric curve to the PPS Kaplan–Meier curve from the CheckMate 025 trial (Fig. 3). This curve was then used to calculate TP3 for the PPS Markov model. Changes in the hazard of the PPS curve over time mean that TP3 is a function of the time spent in the PPS health state, and hence tunnel states are used to track patients entering the PD health state. A limitation associated with the use of a PPS curve is that data from the patients who are censored before progression or death do not contribute to the estimation of PPS transition probabilities. Data from these patients may still inform the PSM as they are still included in the likelihood function. A further limitation of this method is the application of a single Kaplan–Meier curve for PPS, which assumes the same survival trajectory for all patients entering the PD state irrespective of the time spent in the PF state. TP3 is therefore independent of the time spent in PF, and this assumption is problematic if there is a correlation between PFS and PPS. A positive relationship between PFS and PPS in individual metastatic RCC patients on first-line treatment has previously been shown [8].
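A cohort trace for the PPS Markov can be sketched as follows, with a list standing in for the tunnel states that record how many cycles each group of patients has spent in PD. All curves and parameters are hypothetical, and the zero floor on TP2 is an assumption of the sketch rather than a documented feature of the model.

```python
import math

def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def cond_prob(surv, t, u=1):
    """Conditional event probability 1 - S(t) / S(t - u)."""
    return 1.0 - surv(t) / surv(t - u)

def pps_markov_trace(s_ttp, s_pfs, s_pps, n_cycles):
    """Return (PF, PD, dead) proportions per cycle for the PPS Markov."""
    pf, dead = 1.0, 0.0
    tunnel = []  # tunnel[d] = proportion that has spent d cycles in PD
    trace = [(pf, 0.0, dead)]
    for t in range(1, n_cycles + 1):
        # TP3 depends only on time since progression, not on time spent in PF.
        pd_deaths, survivors = 0.0, []
        for d, p in enumerate(tunnel):
            p_die = cond_prob(s_pps, d + 1)
            pd_deaths += p * p_die
            survivors.append(p * (1.0 - p_die))
        p_prog = cond_prob(s_ttp, t)                      # TP1
        p_death = max(0.0, cond_prob(s_pfs, t) - p_prog)  # TP2 (floor is an assumption)
        new_pd, pf_deaths = pf * p_prog, pf * p_death
        pf -= new_pd + pf_deaths
        dead += pd_deaths + pf_deaths
        tunnel = [new_pd] + survivors  # new entrants start with 0 cycles in PD
        trace.append((pf, sum(tunnel), dead))
    return trace

# Hypothetical curves (time in weeks); not fitted to CheckMate 025.
s_ttp = lambda t: weibull_survival(t, shape=0.9, scale=40.0)
s_pfs = lambda t: s_ttp(t) * math.exp(-0.002 * t)
s_pps = lambda t: weibull_survival(t, shape=1.0, scale=80.0)

trace = pps_markov_trace(s_ttp, s_pfs, s_pps, n_cycles=104)
print("Proportion alive at week 104:", round(trace[-1][0] + trace[-1][1], 3))
```

Overall survival is not an input here; it emerges as the sum of the PF and PD columns, which is why the Markov estimate of OS can diverge from a directly fitted OS curve.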

Fig. 3 Post-progression survival curves from the CheckMate 025 trial (14 months’ follow-up) (a), including for patients who progressed in < 8 weeks (b), between 8 and 24 weeks (c), and after 24 weeks (d)

In order to account for possible differences in survival between progressed patients based on time spent in the PF state, three different Kaplan–Meier PPS curves were generated from the CheckMate 025 trial data for the PPS–PFS Markov model (Fig. 3). Patients were divided into three subgroups (tertiles) dependent on the time at which they progressed (the resulting PFS intervals are based on the selected percentiles). PPS curves were generated for patients who progressed within 8 weeks of randomization (< 33.3rd percentile, n = 224), patients who progressed between 8 and 24 weeks of randomization (33.3rd–66.6th percentile, n = 195), and patients who progressed beyond 24 weeks of randomization (> 66.6th percentile, n = 198). The initial 8-week timepoint was chosen to correspond with the first tumor assessment point in CheckMate 025, and the partitioning of data into the aforementioned periods gave roughly equal sample sizes in each group. The discrepancy in patient numbers in each tertile arose because progression was not a continuous measure, but rather was measured at discrete 4-week intervals in CheckMate 025. Each PPS curve was extrapolated by fitting a parametric curve to the Kaplan–Meier data. These curves were then used to derive TP3 for the PPS–PFS Markov. On entering the PD health state, tunnel states are used to track patients, and the model then assigns different transition probabilities based on PPS estimates conditional on TTP. In the PPS–PFS Markov, transition probabilities are dependent on the time spent in the PF state and the time after PD.
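The conditional TP3 can be illustrated by selecting a PPS curve according to the week of progression. The sketch below assumes that the group boundaries are inclusive at 8 and 24 weeks and uses hypothetical Weibull parameters in which earlier progressors have shorter post-progression survival; neither assumption is taken from the fitted curves.

```python
import math

def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

# One hypothetical PPS curve per progression-time group (time in weeks since progression).
PPS_CURVES = [
    lambda d: weibull_survival(d, shape=0.9, scale=40.0),   # progressed by week 8
    lambda d: weibull_survival(d, shape=0.9, scale=70.0),   # progressed in weeks 9 to 24
    lambda d: weibull_survival(d, shape=0.9, scale=110.0),  # progressed after week 24
]

def pps_group(weeks_to_progression):
    """Map time of progression to a curve index (inclusive cut-offs are an assumption)."""
    if weeks_to_progression <= 8:
        return 0
    if weeks_to_progression <= 24:
        return 1
    return 2

def tp3(weeks_in_pd, weeks_to_progression, u=1.0):
    """PD -> death, conditional on time in PD and on time spent in PF."""
    surv = PPS_CURVES[pps_group(weeks_to_progression)]
    return 1.0 - surv(weeks_in_pd) / surv(weeks_in_pd - u)

for progressed_at in (8, 16, 32):
    print(progressed_at, round(tp3(4, progressed_at), 4))
```

In a full cohort model, each tunnel state would need to carry the progression group as well as the time since progression, so the number of tunnel states is roughly three times that of the PPS Markov.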

2.3 Data Analysis

For the initial analysis, parametric curves were fitted to the OS, PFS, TTP, and PPS Kaplan–Meier data from CheckMate 025 (14-month analysis, with minimum follow-up period of 14 months) and extrapolated over a lifetime horizon. For the base-case analysis, the best-fitting models were chosen based on Akaike’s information criterion (AIC), Bayesian information criterion (BIC), and visual inspection. For OS for everolimus, the three best-fitting distributions according to AIC and BIC (log-normal, log-logistic, and generalized gamma) provided OS curves that plateaued after 3 years, meaning that predicted long-term OS in the everolimus arm was higher than in the nivolumab arm. This is not likely to be clinically plausible, given that the OS curves for nivolumab demonstrate higher OS throughout the CheckMate 025 study period. The fourth best-fitting distribution (gamma) was therefore selected.
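For reference, the two information criteria can be computed from the maximized log-likelihood of each fitted distribution. The sketch below uses the standard definitions; the log-likelihoods, parameter counts, and sample size are hypothetical values chosen for illustration, and taking the sample size as the number of observations in BIC is an assumption.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits: distribution name -> (maximized log-likelihood, number of parameters).
fits = {
    "exponential": (-1250.0, 1),
    "weibull": (-1243.5, 2),
    "generalized gamma": (-1243.0, 3),
}
n_obs = 410  # hypothetical sample size

ranked = sorted(fits, key=lambda name: aic(*fits[name]))
for name in ranked:
    ll, k = fits[name]
    print(f"{name}: AIC={aic(ll, k):.1f}, BIC={bic(ll, k, n_obs):.1f}")
```

Lower values indicate a better trade-off between fit and complexity. As the everolimus OS example shows, the statistical ranking is only a starting point, and a lower-ranked distribution may be preferred when the best-ranked ones extrapolate implausibly.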

In addition, a number of scenario analyses were run in order to determine how changes to the parametric curve choices affected the results. Each scenario fitted a single distribution to all endpoints: exponential, Weibull, generalized gamma, gamma, or log-normal. These distributions were chosen as they provided the best fit to the CheckMate 025 trial data for at least one endpoint.

Additional trial data have since become available with approximately 38 months’ minimum follow-up. These data have been used to evaluate the long-term survival predictions of the analysis. The parametric survival modeling was conducted using the flexsurv package in R version 3.3.0 [9]. Table 1 shows the final functional choices for the trial data in the base case and for the scenario analyses.

Table 1 Parametric distributions used to model each curve for the CheckMate 025 data

3 Results

The OS estimates produced by each model were initially compared in terms of goodness of fit to the CheckMate 025 trial data (Table 2). All of the models provide a reasonable fit to the observed data at 1 and 2 years, particularly for nivolumab, and this is evident from visual inspection of the respective survival curves (Fig. 4). Three-year OS rates from a later cut of the CheckMate 025 data were used to evaluate the long-term estimates of each model structure. The 3-year OS rate for the nivolumab arm of CheckMate 025 with 38 months’ follow-up was 39.1%. The PPS–PFS Markov predicted this most closely at 36.2%, followed by the PSM at 33.1%. The PPS Markov under-predicted the 3-year OS at 27.6% (Table 2). The 3-year OS rate in the everolimus arm from the CheckMate 025 data with 38 months’ follow-up was 29.5%. The PPS–PFS Markov predicted this most closely at 26.5%, closely followed by the PSM at 25.6%. The PPS Markov under-predicted the 3-year OS at 19.2% (Table 2). When comparing the results from the 14-month and 38-month analyses from the everolimus arm, it is important to consider that 47 patients in the trial crossed over from everolimus to nivolumab by 2 years, and this could have implications for the survival observed in the everolimus arm. At 3, 4, and 5 years there are meaningful differences in predictions of survival between the models (Figs. 4, 5). The differences between the PPS–PFS Markov and PPS Markov are the most marked, and the PPS Markov consistently produces the lowest survival estimates for both nivolumab and everolimus. Additional assessment of the nivolumab arms of each model was conducted using the conditional survival from 1 to 5 years for patients who received nivolumab in the CheckMate 003 trial (Table 2) [7]. Estimates of 5-year conditional survival for the PSM, PPS Markov, and PPS–PFS Markov are 18.0%, 9.9%, and 22.1%, respectively, for nivolumab, versus 48.6% in the CheckMate 003 trial. 
The PPS–PFS Markov produces estimates that are closest to the CheckMate 003 cohort [7], although the discrepancy is large. The discrepancy may be due to different baseline characteristics of patients entering CheckMate 025 and CheckMate 003 and to dosing differences between the two trials. The 5-year conditional survival estimates for everolimus patients predicted by the PSM, PPS Markov, and PPS–PFS Markov model structures were comparable to those predicted by each model structure in the nivolumab arm, at 14.9%, 8.4%, and 20.7%, respectively (Table 2).
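The conditional survival comparison can be made concrete with a short calculation. The sketch below assumes that conditional survival from 1 to 5 years is defined as S(5)/S(1), the probability of being alive at 5 years among patients alive at 1 year; this definition and the exponential curve are assumptions for illustration.

```python
import math

def conditional_survival(surv, t_from, t_to):
    """P(T > t_to | T > t_from) = S(t_to) / S(t_from)."""
    if t_to < t_from:
        raise ValueError("t_to must not be earlier than t_from")
    return surv(t_to) / surv(t_from)

# Hypothetical OS curve with time in years; not a fitted model from this study.
os_curve = lambda years: math.exp(-0.35 * years)

cs = conditional_survival(os_curve, t_from=1.0, t_to=5.0)
print(f"Survival to 5 years given alive at 1 year: {cs:.1%}")
```

Conditioning on survival to 1 year removes the influence of early deaths, so the measure reflects the shape of the extrapolated tail rather than the fit to the early trial data.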

Table 2 Comparison of nivolumab and everolimus overall survival as predicted by each model type with data from CheckMate 025 and CheckMate 003

Fig. 4 Extrapolations of overall survival for nivolumab (a) and everolimus (b) over a lifetime horizon. Kaplan–Meier data from the CheckMate 025 trial with 14 months’ follow-up. PFS progression-free survival, PPS post-progression survival, PSM partitioned survival model

Fig. 5 Markov traces for nivolumab (a) and everolimus (b) over a 10-year horizon. PD progressed disease, PF progression-free, PFS progression-free survival, PPS post-progression survival, PSM partitioned survival model

From the viewpoint of modeling cost effectiveness, the most relevant comparison between the different model structures is in estimates of incremental mean survival over the modeling horizon because this drives differences in quality-adjusted life-years (QALYs). Estimates of the difference in mean survival between nivolumab and everolimus over 20 years are 6.62 months (PSM), 7.6 months (PPS Markov), and 7.4 months (PPS–PFS Markov). The latter two reflect non-trivial differences of + 14% and + 11%, respectively, compared with PSM.
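Mean survival over the modeling horizon is the area under the OS curve, which can be approximated by summing over weekly cycles. The following sketch uses the trapezoidal rule and a 52-week year, both of which are simplifying assumptions, and hypothetical exponential curves for the two arms.

```python
import math

WEEKS_PER_YEAR = 52  # simplification for the sketch

def mean_survival_months(os_curve, horizon_years=20):
    """Restricted mean survival (months) as the area under a weekly OS curve."""
    n_cycles = horizon_years * WEEKS_PER_YEAR
    # Trapezoidal rule: average of the survival at the start and end of each cycle.
    area_weeks = sum(0.5 * (os_curve(t) + os_curve(t + 1)) for t in range(n_cycles))
    return area_weeks * 12.0 / WEEKS_PER_YEAR

# Hypothetical weekly OS curves; not the fitted curves from this study.
treatment = lambda t: math.exp(-0.007 * t)
comparator = lambda t: math.exp(-0.009 * t)

incremental = mean_survival_months(treatment) - mean_survival_months(comparator)
print(f"Incremental mean survival: {incremental:.2f} months")
```

Because the area includes the whole extrapolated tail, small differences in long-term survival probabilities between model structures accumulate into larger differences in mean survival.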

In order to determine how changes to the parametric curve choice might affect the results, a scenario was run with exponential distributions fitted to all endpoints. Although the exponential curve was not the best choice for goodness of fit for most endpoints according to AIC/BIC criteria, each model still provided reasonable fit to the trial data for the nivolumab arm. A reduction in goodness of fit was observed for the everolimus arm. The dataset with 38 months’ follow-up was used to assess the long-term survival predictions of each model. Both the PSM and the PPS–PFS Markov provided a very close fit to the 3-year OS rate for nivolumab from CheckMate 025 (39.1%), with estimates of 38.4% and 40.6%, respectively (Table 2). The PPS Markov under-predicted the 3-year OS rate, with an estimate of 32.2%. The PSM provided the closest fit to the 3-year OS rate for everolimus from CheckMate 025 (29.5%), with an estimate of 28.0%. The PPS–PFS Markov and PPS Markov estimated 3-year OS rates of 25.1% and 18.2%, respectively (Table 2). The incremental OS between nivolumab and everolimus estimated by the PSM, PPS Markov, and PPS–PFS Markov was 9.5, 8.9, and 16.7 months, respectively. This reflects non-trivial differences of −6.3% for the PPS Markov and +75.8% for the PPS–PFS Markov relative to the PSM. Additional scenarios were run using Weibull, generalized gamma, and log-normal curves fitted to all endpoints. As expected, the results of these scenarios demonstrate that the outcomes predicted by each model are sensitive to the choice of parametric curves, which reiterates the importance of validating each choice (Table 2).
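The relative differences quoted for the exponential scenario follow directly from the incremental OS estimates, as this short calculation shows. The function name is illustrative only.

```python
def relative_difference_pct(value, reference):
    """Percentage difference of value relative to reference."""
    return 100.0 * (value - reference) / reference

# Incremental OS (months) from the exponential scenario reported above.
psm, pps_markov, pps_pfs_markov = 9.5, 8.9, 16.7

print(round(relative_difference_pct(pps_markov, psm), 1))      # -6.3
print(round(relative_difference_pct(pps_pfs_markov, psm), 1))  # 75.8
```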

4 Discussion

PSM and Markov decision-analytic models were developed to compare the differences in estimated survival outcomes between these modeling approaches. Two different STM analytic approaches were investigated, one in which PPS transition probabilities are independent of time spent in the PF state, and the other in which PPS transition probabilities were dependent on time spent in the PF state, based on a Kaplan–Meier analysis to estimate PPS for each of the three PFS groups (PFS ≤ 8 weeks, PFS > 8 and ≤ 24 weeks, and PFS > 24 weeks). The three approaches were easily implementable in Microsoft Excel®, making them accessible approaches for most analysts and appropriate for most HTA submission processes across the world.

The main conclusion from this analysis is that differences in the assumptions underlying the choice of model structure are likely to produce different estimates of QALY gains, which could have a significant impact on estimates of cost effectiveness. The difference in the estimated longer-term OS probabilities of each model was substantially reduced when a sensitivity analysis was conducted using more mature data, which highlights the additional uncertainty associated with the use of datasets with a short minimum follow-up. When using the earlier 14-month data cut, the PPS–PFS Markov predicted the observed 3-year data for nivolumab most closely. This may have implications in an HTA landscape where drug manufacturers are increasingly seeking reimbursement of new drugs based on early data cuts from randomized clinical trials.

There is evidence to suggest that some patients in the CheckMate 025 trial may have experienced pseudo-progression. It is worth noting that pseudo-progression has implications for any economic model that includes progression as a key determinant of costs and/or outcomes. A possible method to overcome the issue of pseudo-progression is to use time to treatment discontinuation curves from clinical trials, which may be a better indicator of actual progression, in place of PFS curves [10]. The specific relationship between the estimates produced by different models in this example is not necessarily generalizable to any other type of cancer or disease, and there is no independent way in our example of deciding which approach is most appropriate. This will be a topic of future research. The most that we can say is that the choice of model structure should be based on an explicit consideration of the underlying relationships between progression and mortality risk, which are specific to the disease, and to the expected mode of action of different treatments. Goodness of fit with short-term survival data from a clinical trial is not necessarily an appropriate criterion on its own for model selection.

Our results are consistent with those of other similar studies. The work by McEwan et al. [11] compared observed trial-based survival outcomes for patients with small-cell lung cancer with estimates produced by a PSM, and Markov models with time-independent, time-dependent, and time-and-treatment-dependent (TTD) transitions. As expected, the PSM produced the closest estimates to the observed (within clinical trial) survival data. The PSM uses survival curves directly fitted to the PFS and OS data, whereas the Markov models estimate OS indirectly using a combination of PFS, TTP, and PPS curves. The three Markov variants produced different estimates, but the more flexible TTD structure produced the closest fit. Differences in survival estimates are most likely to be evident in extrapolating trial outcomes. Goeree et al. [12] compared incremental cost-effectiveness ratio (ICER) estimates generated by a PSM and a standard Markov model for patients with non-small-cell lung cancer and found that the two approaches produced very similar results. However, in this analysis, PPS was not derived directly from the trial, but calculated indirectly using OS and TTP data. Briggs et al. [13] compared estimates of OS and PFS produced by a PSM, a standard Markov model, and a SMM with time-dependent transitions using data from the CheckMate 003 trial. The PSM and standard Markov produced different estimates, but estimates produced by the PSM and semi-Markov approaches were similar, although the semi-Markov did not show as good a fit to the short-term trial data. Coyle and Coyle [14] used a simulated dataset to compare ICERs generated by a three-state PSM and a traditional Markov cohort model. 
Few details are available on the structure of the models, but the authors conclude that an analysis based on a PSM approach has an inherent bias in favor of interventions that impact disease progression rather than mortality within health states, and suggest that this approach should not be used as a basis for reimbursement decisions. Williams et al. [15] compared extrapolated QALY gains and ICERs generated by a PSM and an STM in which transition probabilities were estimated by a multistate regression model. The authors concluded that different assumptions underlying the two approaches gave rise to a difference in ICERs (£16,000 with PSM; £29,000 with STM) that could affect decisions about cost effectiveness.

There are a number of limitations associated with this analysis. The PPS–PFS Markov involves dividing the patients into subgroups to measure PPS. This requires time-to-event analysis to be conducted on groups that have a small sample size and are non-randomized at the point of progression. Also, although the cut-off points for time in PFS were chosen to give approximately equal-sized groups of patients, following the methodology of a previous study in metastatic RCC [8], the selection of any cut-off point by an analyst may be considered arbitrary. Additional statistical tests (such as the Chow test), clinical validation, and sensitivity analysis to determine the effect of different cut-offs on model outcomes should be conducted to select appropriate timepoints at which to divide PFS patients.

There have been a number of recent research publications exploring additional survival extrapolation techniques for predicting long-term outcomes associated with immunotherapy such as the use of spline models and cure fraction models [16,17,18]. These were outside the scope of the analysis; the survival extrapolation methods used in this paper were restricted to the use of standard parametric models commonly used in HTA submissions. However, it would be possible to include additional survival extrapolation methods, such as spline models, within these model structures, and this is another area for future research.

Use of a more complex modeling approach, such as a cure model, a multistate regression model, or a patient simulation model, may overcome some of the key limitations described in this analysis, for example by more fully capturing the heterogeneity of the patient population in terms of survival. These modeling approaches were outside the scope of this study and should be explored in future research. However, until predictions of long-term survival from models are fully validated against long-term data, it is not clear whether adding complexity to the modeling of short-term (uncertain) data is advantageous for estimating long-term survival compared with simpler or more standard modeling approaches.

5 Conclusion

The evidence from our study and previous work highlights the importance of the assumptions underlying any model structure, and the need to validate assumptions regarding survival and the application of treatment effects against the known disease characteristics.