Key Points for Decision Makers

Many new treatments in a range of disease areas offer the potential for cure in some patients.

Cure models are not commonly used to inform health technology assessment decision-making.

We describe the key characteristics of mixture and non-mixture cure models and explain how they should be used by analysts and interpreted by reviewers and decision-makers.

1 Introduction

It is common for parametric survival models to be used to estimate the long-term survival benefits associated with new health technologies [1, 2]. New technologies are typically evaluated using randomised controlled trials (RCTs), but these have limited follow-up—often many participants remain alive at the end of the trial. It is therefore necessary to extrapolate to obtain complete estimates of survival benefits. This is required for health technology assessment (HTA), where lifetime costs and benefits are compared to inform healthcare resource allocation decisions [3]. Traditionally, a suite of ‘standard’ parametric survival models has been used to perform the extrapolation task [2, 4, 5], but with the development of new immuno-oncology and chimeric antigen receptor (CAR) T-cell therapies that appear to cure some patients with cancer [6,7,8,9,10,11,12,13,14], the use of cure models to inform HTA has attracted increasing attention [8, 9, 11, 13, 15].

There is a logical rationale for the use of cure models to estimate long-term survival for treatments that may offer a cure, but these models are not commonly used to inform HTA decision-making [16]. To some extent, this lack of use may be due to a lack of understanding of how cure models work, what they assume and how reliable they are. In this tutorial we describe mixture and non-mixture cure models, focusing on key aspects that we believe analysts, reviewers and decision-makers should be aware of when considering these models.

In Sect. 2 we provide a rationale for why cure models may be useful for HTA, as well as a brief summary of applied and methodological research on their use in an HTA context. In Sect. 3 we explain the differences between mixture and non-mixture cure models, describe how cure fractions and cure timepoints should be interpreted, explain what different cure models assume about survival in cured and uncured patients, and discuss model specifications and the use of standardised mortality ratios (SMRs) and lifetables. In Sects. 4 and 5 we provide a demonstration of the application of mixture and non-mixture cure models in a range of scenarios based on colon cancer data. In Sect. 6 we discuss scenarios in which different types of cure models may or may not be appropriate, highlighting key issues that must be taken into account by analysts when fitting these models, and by reviewers and decision-makers when interpreting their predictions. We provide code that allows the various models to be applied.

2 Cure Models in HTA

2.1 ‘Standard’ Survival Modelling and Rationale for Cure Models

For many years it has been common to use a range of ‘standard’ parametric survival models to extrapolate survival data for HTA purposes, largely because these were the focus of the first National Institute for Health and Care Excellence (NICE) Decision Support Unit (DSU) technical support document on survival analysis (TSD 14), published in 2011 [4]. However, it is well known that these models have important limitations. In particular, exponential, Weibull, Gompertz and gamma models cannot cope with any turning points in the hazard function (that is, the rate at which the event of interest occurs over time), and log-logistic, log normal and generalised gamma models can only cope with one turning point. In some circumstances, such as when a treatment cures a proportion of patients, this can mean that these standard parametric models cannot adequately represent the hazard function.

Clinical trials usually involve strict eligibility criteria, which means that participants are unlikely to die quickly [17]. However, trials often investigate treatments for serious diseases with substantial mortality risks. Therefore, the hazard of death in an RCT may initially be low, but will rise in the short-term. Participants with the worst prognosis are likely to die first, changing the prognostic mix of those remaining in follow-up. This may result in a turning point in the hazard function, with the hazard of death reducing in the medium term. In the long term, hazards are likely to continue to fall and may even drop to levels expected in the general population—in which case, remaining patients may be considered to be cured. In these patients, hazards will eventually rise again, due to age-related mortality risks. In this scenario, the hazard function would have two turning points, which none of the standard parametric models could accurately reflect.

This issue was the motivation for a second DSU technical support document on survival analysis, published in 2020 (TSD 21) [12]. TSD 21 describes flexible models that are able to represent complex hazard functions, including flexible parametric spline-based models, mixture models, landmark models, piecewise models, relative survival models and cure models.

2.2 Cure Models – the Answer?

Cure models are not commonly used to inform HTA decision-making. A review published in 2020 examined 26 NICE appraisals of immuno-oncological interventions and found that cure models were included in 8, and were considered appropriate to inform decision-making in only 3 [16]. More recently, a review of HTA reports from eight major jurisdictions for cell and gene therapies found that cure models were increasingly being used, but exact numbers were not presented [18].

The infrequent use of cure models in HTA may reflect concerns around the accuracy of their extrapolations. Simulation studies have shown that cure models can substantially over-estimate survival benefits when there is in fact not a cure, or when trial follow-up is very short [12, 15, 16]. Othus et al. analysed 6 leukaemia trials, fitting mixture cure models to initial data-cuts used in regulatory submissions and then fitting the same models to subsequent data-cuts with 3–10 years more follow-up [19]. In 4 of the 6 trials, estimates of extrapolated survival were higher based on models fitted to the original, short-term data-cut.

However, studies have also shown the potential value of cure models. Bullement et al. revisited a NICE appraisal of ipilimumab for advanced melanoma, comparing models fitted to the 3-year data-cut available at the time of the appraisal with survival subsequently observed in a later data-cut [9]. Cure models provided more accurate extrapolations than non-cure models. Similarly, Vadgama et al. compared survival predictions made by models fitted to 1-year data from the ZUMA-1 trial of axicabtagene ciloleucel for diffuse large B-cell lymphoma with survival observed in the 4-year data-cut [20]. Cure models provided accurate extrapolations, but standard and spline-based parametric models did not. Other similar studies exist [8, 10, 21], and simulation studies have shown that cure models can extrapolate well when a cure exists and follow-up is relatively mature [15, 16].

2.3 Cure Models – Unanswered Questions

Two tutorial papers on the use of cure modelling in the context of HTA have been published [11, 13]. These focus on a subset of cure models – mixture cure models (MCMs). TSD 21 highlights that there are two types of cure model – MCMs and non-mixture cure models (NMCs) [12]. Each has advantages and disadvantages that may make it more or less appropriate for extrapolating from the relatively short-term, relatively small clinical trial datasets typically seen in HTA. In this tutorial, we consider both types of cure model, with the aim of providing information on how these models can be used in different situations, and how they should be interpreted.

3 Methods

In the context of evaluating overall survival, which is the focus of this tutorial, all cure models estimate the cure fraction (the proportion of patients who will not die from their disease), and survival for uncured patients. If cure models are used to analyse different endpoints, definitions change. For instance, if used to analyse progression-free survival, cure models estimate the proportion of patients who will not die or experience disease progression, and progression-free survival for uncured patients. Irrespective of the endpoint being analysed, MCMs and NMCs estimate cure and survival in different ways, and can be applied using different frameworks, with implications for the definition of the cure fraction. In this Section we describe frameworks for fitting cure models, key characteristics of MCMs and NMCs, model specification and the use of lifetables and SMRs.

3.1 Frameworks for Cure Models

It is important to determine what is meant by cure when a cure model is used, and this depends on the framework in which the model is fitted. In the context of healthcare interventions that prevent people from dying from the disease the treatment is for, it is logical to consider cure as occurring when the all-cause hazard function for death, \({h}_{i}\left(t\right)\), converges with the general population hazard function, \({h}_{i}^{*}(t)\). In this context, it is typical to fit cure models in an ‘excess mortality’ framework, also known as a ‘relative survival’ framework, in which we model the difference between the hazard function observed in the trial and the general population hazard (i.e. the excess hazard that is associated with the disease of interest, \({\lambda }_{i}\left(t\right)\)). That is, we partition the total all-cause hazard as follows:

$${h}_{i}\left(t\right)={h}_{i}^{*}\left(t\right)+{\lambda }_{i}\left(t\right).$$

We can transform from the hazard to the survival scale, and rearrange the above equation to show that relative survival [\({R}_{i}\left(t\right)\)] is the ratio of the all-cause survival \({S}_{i}\left(t\right)\) and the expected survival in the background population \({S}_{i}^{*}(t)\):

$${R}_{i}\left(t\right)=\frac{{S}_{i}\left(t\right)}{{S}_{i}^{*}\left(t\right)}.$$
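The ratio above follows from the standard identity that a survival function is the exponential of minus the cumulative hazard. Writing \(u\) for a dummy variable of integration (notation introduced here for illustration only), the partition of the hazard gives:

$${S}_{i}\left(t\right)=\mathrm{exp}\left(-{\int }_{0}^{t}{h}_{i}\left(u\right)du\right)=\mathrm{exp}\left(-{\int }_{0}^{t}{h}_{i}^{*}\left(u\right)du\right)\mathrm{exp}\left(-{\int }_{0}^{t}{\lambda }_{i}\left(u\right)du\right)={S}_{i}^{*}\left(t\right){R}_{i}\left(t\right).$$

Relative survival is therefore the survival function implied by the excess hazard alone, and dividing both sides by \({S}_{i}^{*}\left(t\right)\) recovers the ratio.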

Using this framework, cure models are fitted to the relative survival function, which will plateau if and when the all-cause hazard approaches general population mortality rates, and the excess hazard function approaches 0. All-cause survival estimates are derived by multiplying the estimated relative survival function by the expected survival function for the background population. Taking this approach means that we directly use general population mortality rates when fitting models to the trial data, and allow these rates to govern the long-term survival function. If we expect that the disease will have a lasting effect on survival, even in cured patients, SMRs can be applied to general population mortality rates (see Sect. 3.5), and the relative survival function will plateau if and when the all-cause hazard approaches general population mortality rates with the SMR applied.

It is possible to instead fit cure models using different frameworks. A ‘disease-specific’ framework could be used, where models are fitted to disease-specific survival functions (using cause of death information), or an ‘all-cause’ framework could be used, where models are fitted to all-cause survival functions. Under either approach, estimates from these models would need to be combined with general population mortality rates in the long term. For models fit to disease-specific survival functions, this could be done using data on other-cause deaths for the trial period, and then from lifetables beyond the trial period. For models fit to all-cause survival functions it is more complex – in particular, decisions have to be made about when to build in background mortality rates: if this is done from time 0, this would double count early events; if it is done at a later timepoint questions would be asked about how the timepoint was chosen.

Because the relative survival approach is intuitive, does not need information on cause of death and avoids the need to make assumptions around when to begin incorporating general population mortality rates, we focus the rest of this tutorial on cure models fitted in a relative survival framework. This has implications for the definition of the cure fraction: in a relative survival framework, because we directly model the relative survival function, the cure fraction corresponds to the proportion of patients alive in a world where patients can only die from the disease of interest. Therefore, while the cure fraction does represent the proportion estimated not to die from their disease, it will always be higher than the all-cause survival function at the cure timepoint (the point at which excess disease-related hazards fall to zero), because some people will have died of other causes before this timepoint.
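As a rough illustration of this point, the sketch below builds a relative survival curve that plateaus at an assumed cure fraction and multiplies it by an expected survival curve to obtain all-cause survival. All inputs (the 40% cure fraction, the exponential curve for excess mortality and the constant 2% annual background hazard) are hypothetical values chosen for illustration and are not taken from the case study in Sect. 4.

```python
import math

def expected_survival(t, background_hazard=0.02):
    """Expected survival S*(t) under a constant annual background hazard (hypothetical)."""
    return math.exp(-background_hazard * t)

def relative_survival(t, cure_fraction=0.40, uncured_rate=0.6):
    """Relative survival R(t) that plateaus at the assumed cure fraction.

    The exponential decline towards the plateau is an arbitrary illustrative choice.
    """
    return cure_fraction + (1.0 - cure_fraction) * math.exp(-uncured_rate * t)

def all_cause_survival(t):
    """All-cause survival S(t) = S*(t) x R(t)."""
    return expected_survival(t) * relative_survival(t)

# Print time (years), relative survival and all-cause survival.
for years in (0, 5, 10, 20):
    print(years, round(relative_survival(years), 3), round(all_cause_survival(years), 3))
```

With these assumed inputs, relative survival settles at the 40% cure fraction, whereas all-cause survival keeps falling below it because background mortality continues to act on cured patients.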

3.2 Mixture Cure Models (MCMs)

MCMs assume that there are two groups of individuals – those who are cured of their disease and those who are not [22,23,24]. When fitted in a relative survival framework, general population mortality rates are incorporated directly into the model and the model uses these, combined with the parametric distribution chosen to represent the uncured patients, to estimate the cure fraction. General population mortality rates are taken from relevant lifetables, with rates from the appropriate calendar year used, and these are further stratified by characteristics such as age and sex, so that each trial participant can be assigned an expected background mortality rate. MCMs can be fitted using standard software packages, such as strsmix in Stata [25], and flexsurv and cuRe in R [26, 27].

3.2.1 When Does Cure Occur?

Strictly speaking, MCMs assume that at the study baseline there is a group of patients who experience no excess mortality compared with the general population – that is, ‘cured’ patients are ‘cured’ at baseline [25]. This makes interpretation awkward, but it is also useful to consider MCMs with respect to the timepoint at which no uncured patients remain – representing the timepoint after which all remaining patients are assumed to be cured. MCMs place no constraint on this timepoint – it could occur early in the trial if the difference between hazards observed in the trial and hazards in the general population disappears quickly, or it could occur much later or not at all: sometimes MCMs will predict a 0% cure fraction. It is also important to note that the assignment of cure in an MCM is probabilistic rather than deterministic – individuals are not segregated into cured and uncured groups, they are assigned a probability of being cured and the cure fraction is estimated at the population level.
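In the notation of Sect. 3.1, an MCM fitted in a relative survival framework is commonly written as shown below. The symbols \(\pi\) (the cure fraction) and \({S}_{u}\left(t\right)\) (the survival function for uncured patients) are introduced here for illustration, and any dependence of these quantities on covariates is omitted for simplicity:

$${S}_{i}\left(t\right)={S}_{i}^{*}\left(t\right)\left[\pi +\left(1-\pi \right){S}_{u}\left(t\right)\right],\quad {R}_{i}\left(t\right)=\pi +\left(1-\pi \right){S}_{u}\left(t\right).$$

Under this formulation, the probability that a patient who has not yet died from their disease by time \(t\) belongs to the cured group is \(\pi /\left[\pi +\left(1-\pi \right){S}_{u}\left(t\right)\right]\). This rises from \(\pi\) at baseline towards 1 as \({S}_{u}\left(t\right)\) approaches 0, which is one way of seeing why cure is assigned probabilistically and why the timepoint at which no uncured patients remain depends on the distribution chosen for \({S}_{u}\left(t\right)\).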

3.2.2 What is Assumed for Uncured Patients?

MCMs can be fitted using a range of ‘standard’ parametric models – for example, Weibull, log normal, log-logistic, etc., to represent survival for the uncured group of patients. Hence, it is important to consider which models are appropriate for uncured patients – for example, is it likely that the hazard function in uncured patients will be monotonically increasing or decreasing, or will have turning points? Formulations of MCMs using flexible parametric models have been developed, but are seldom used in practice [28].

3.2.3 What About the Cure Fraction?

The choice of parametric distribution used within an MCM to represent uncured patients can have an important impact on the cure fraction estimated by the model. For example, if a log-logistic MCM is used, the survival distribution for uncured patients is likely to have a decreasing hazard in the long term, and the estimated cure fraction may be low because the distribution used for the uncured group is able to represent a reducing hazard and long-term survivors. In contrast, if a Weibull MCM is used, the survival distribution for uncured patients may have an increasing hazard, and the estimated cure fraction may be high, because the distribution used for the uncured group is unable to represent long-term survivors. The two models may in fact result in similar survival curves for the cured and uncured populations combined, but these survival curves would be based on very different assumptions about survival in uncured patients, and would therefore be associated with very different cure fractions.

It should not necessarily be a concern if different MCMs give very different estimates of cure fractions: this is a function of the parametric distribution chosen for uncured patients. Instead, the focus should be on selecting appropriate distributions for the uncured patient group.
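The interplay between the uncured distribution and the cure fraction can be sketched numerically. The example below evaluates two mixture cure relative survival curves, one with a Weibull and one with a log-logistic distribution for uncured patients. The parameter values and cure fractions (30% and 15%) are arbitrary assumptions chosen to illustrate the mechanics; nothing is fitted to data.

```python
import math

def weibull_survival(t, shape, scale):
    """Weibull survival function S_u(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def loglogistic_survival(t, shape, scale):
    """Log-logistic survival function S_u(t) = 1 / (1 + (t / scale) ** shape)."""
    return 1.0 / (1.0 + (t / scale) ** shape)

def mcm_relative_survival(t, cure_fraction, uncured_survival):
    """Mixture cure relative survival: cured fraction plus surviving uncured patients."""
    return cure_fraction + (1.0 - cure_fraction) * uncured_survival(t)

# Hypothetical parameter values, chosen only to illustrate the mechanics.
def weibull_mcm(t):
    return mcm_relative_survival(t, 0.30, lambda u: weibull_survival(u, 1.2, 2.0))

def loglogistic_mcm(t):
    return mcm_relative_survival(t, 0.15, lambda u: loglogistic_survival(u, 1.5, 2.0))

# Print time (years), Weibull MCM and log-logistic MCM relative survival.
for years in (1, 2, 4, 10, 30):
    print(years, round(weibull_mcm(years), 3), round(loglogistic_mcm(years), 3))
```

With these assumed values the two curves lie within about 0.03 of each other at 1, 2 and 4 years, even though one cure fraction is twice the other. The heavier tail of the log-logistic distribution accounts for survivors that the Weibull model can only represent through a larger cure fraction.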

3.3 Non-Mixture Cure Models (NMCs)

The key difference between MCMs and NMCs is that NMCs do not split the population into cured and uncured groups directly, although the cure fraction and the survival of the uncured can still be estimated from these models [24, 25, 29, 30]. NMCs can be fitted using standard parametric or flexible parametric distributions. As for MCMs, when fitted in a relative survival framework, general population mortality rates are incorporated directly into the model and should be taken from relevant lifetable sources, using appropriate calendar years and stratifying for key characteristics such as age and sex. NMCs can be fitted using standard software packages such as strsnmix and stpm2 in Stata [25, 31], and flexsurv, cuRe and rstpm2 in R [26, 27].

3.3.1 When Does Cure Occur?

Unlike MCMs, NMCs do not assume that there is a group of patients who are ‘cured’ at baseline. The timepoint at which cure occurs depends on when the modelled hazards converge with those observed in the general population. When fitted using standard parametric models, there is no constraint on when this convergence will occur and typically the estimates of MCMs and NMCs using the same parametric form will be similar to one another. When NMCs are applied using flexible parametric models, the analyst can specify the point at which hazards meet background population levels – these have also been referred to as latent cure models [30].
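For reference, a commonly used formulation of the NMC in a relative survival framework is shown below. The symbols \(\pi\) (the cure fraction) and \({F}_{z}\left(t\right)\) (a cumulative distribution function from the chosen parametric family) are introduced here for illustration:

$${R}_{i}\left(t\right)={\pi }^{{F}_{z}\left(t\right)}=\mathrm{exp}\left[\mathrm{ln}\left(\pi \right){F}_{z}\left(t\right)\right].$$

Because \({F}_{z}\left(t\right)\) rises from 0 to 1, relative survival falls from 1 towards \(\pi\). The model can be rearranged into a mixture form, \({R}_{i}\left(t\right)=\pi +\left(1-\pi \right)\left[\left({\pi }^{{F}_{z}\left(t\right)}-\pi \right)/\left(1-\pi \right)\right]\), in which the term in square brackets plays the role of a survival function for the uncured. This is how a cure fraction and survival for uncured patients can be recovered even though the model does not split the population directly.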

3.3.2 What is Assumed for Uncured Patients?

NMCs do not split patients into cured and uncured groups, so there is not a survival model specific to uncured patients. The parametric distribution used within the NMC must be sufficiently flexible to model the survival experience of the cohort as it approaches the cure fraction.

Fitting NMCs using flexible parametric models allows for a complex hazard function to be captured prior to the cure timepoint, and provides the analyst with an additional tool to control when the cure timepoint will occur, dictated by the placement of a ‘boundary knot’ [31]. This may be useful if external data or clinical expert opinion allows for the cure timepoint to be estimated with some level of confidence. Care must be taken with this – specifying a boundary knot at (say) 5 years may mean that hazards begin to decrease rapidly much earlier, allowing gradual convergence at 5 years. To protect against an unrealistically early sharp decrease in the hazards, it may be necessary to set the boundary knot further into the future. Research has been undertaken to investigate the estimation of cure timepoints [32, 33], and this may help inform boundary knots used within flexible parametric NMCs.

3.3.3 What About the Cure Fraction?

The cure fractions associated with NMCs fitted with different parametric distributions are likely to differ. As for MCMs, this is to be expected. The focus should be on selecting a distribution that is likely to be adequate for representing the hazard function prior to the cure timepoint.

3.4 Model Specifications

MCMs and NMCs can be fitted to each treatment group independently, or with treatment group as a covariate. In HTA, it is common to fit independent survival models to each treatment arm, due to concerns around assuming that the treatment effect (in the form of a hazard ratio for proportional hazards models, or a time ratio for accelerated failure time models) is constant over time [2, 5]. In an MCM or NMC setting, the considerations are similar. If dependent models are used, it is possible to allow a treatment effect on the cure proportion parameter, and also on the parameters that define the survival function for the uncured [24, 25]. When choosing between dependent and independent cure models, it is important to consider the validity of assumptions around the treatment effect enforced by using dependent models, to inspect modelled estimates of hazards and survival compared with the observed data, and to consider the validity of long-term extrapolations. Independently fitted models do not enforce assumptions around the treatment effect, but the treatment effect implied by the models should still be assessed, as recommended by TSD 21 [12].

It is rare to include baseline covariates in survival models used for HTA. However, this may be more relevant for cure models because of their use of lifetable data, stratified by age and sex. Over time, if older patients are more likely to die from their cancer, the relative age mix in the remaining trial population will change, and the conditional hazard function at later timepoints will be based on younger people than would have been the case if general population mortality were the only cause of death. Therefore, if disease-related deaths are likely to be associated with age (or sex), cure models that include these as baseline variables should be superior to models that exclude them. If covariates are included in the model, it will often be necessary to obtain the marginal all-cause survival function for the cohort as a whole, especially in an HTA setting where marginal rather than conditional survival functions are typically required. This can be achieved through regression standardisation and re-incorporation of general population mortality rates. Accounting for changes in age and sex distributions over time is relatively rare in survival models used for HTA – however, this can be important and more details on standardisation are available from Lambert et al. [34].
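One way of writing the standardisation step is as an average of patient-specific predictions. In the sketch below, \(N\) (the number of patients), \({x}_{i}\) (the covariate values for patient \(i\)) and \(R\left(t|{x}_{i}\right)\) (the model-based relative survival for those covariate values) are notation assumed for illustration:

$$\overline{S }\left(t\right)=\frac{1}{N}\sum_{i=1}^{N}{S}_{i}^{*}\left(t\right)R\left(t|{x}_{i}\right).$$

Each patient contributes their own expected survival, matched to their age, sex and calendar year, so the average reflects the covariate distribution of the cohort rather than that of a single ‘typical’ patient.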

3.5 Standardised Mortality Ratios (SMRs)

SMRs relate to whether cancer survivors are at a higher risk of death than the age- and sex-matched general population, separate from their disease-related mortality risk, perhaps due to co-morbidities or lasting ill-effects of intensive treatment. Various studies have reported SMRs relevant for different groups of patients with cancer, ranging from values of 6 or higher [35, 36] to values close to 1 [37].

When cure models have been used in NICE appraisals, it has been common for SMRs to be applied. Recent appraisals in ovarian cancer and gastric cancer tested SMRs ranging between 1.4 and 1.8, though these appeared to be based primarily on assumption and clinical opinion, rather than data [38,39,40,41].

SMRs can be particularly important when the cure fraction is large. It is important to assess hazards observed during trial periods to see whether they approach background levels, but if the cure timepoint has not been reached during the study, this will not provide information on whether applying an SMR > 1 would be appropriate. It may be helpful to analyse relevant registry datasets that contain long-term survivors with the same disease to investigate whether those patients exhibit mortality rates that are similar to, or higher than, background population levels.

When SMRs are applied, they should be applied to the general population mortality rates in the lifetables being used, ensuring that the adjusted rates are incorporated directly into the cure models fitted. This is done in the same way, irrespective of whether MCMs or NMCs are being used.
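A minimal sketch of this adjustment is given below, assuming that the lifetable reports annual probabilities of death and that the hazard is constant within each year of age. The probabilities are hypothetical, and the function names are illustrative rather than taken from any package.

```python
import math

def apply_smr(annual_death_probs, smr):
    """Scale lifetable mortality by an SMR on the hazard (rate) scale.

    Each annual probability of death q is converted to a rate, -ln(1 - q),
    and multiplied by the SMR. Adjusted annual rates are returned.
    """
    return [-smr * math.log(1.0 - q) for q in annual_death_probs]

def expected_survival(annual_rates):
    """Expected survival at the end of each year, S*(t) = exp(-cumulative rate)."""
    survival, cumulative = [], 0.0
    for rate in annual_rates:
        cumulative += rate
        survival.append(math.exp(-cumulative))
    return survival

# Hypothetical annual death probabilities for one age/sex/calendar-year stratum.
lifetable_q = [0.010, 0.011, 0.012, 0.013, 0.015]

unadjusted = expected_survival(apply_smr(lifetable_q, smr=1.0))
adjusted = expected_survival(apply_smr(lifetable_q, smr=2.5))
print([round(s, 4) for s in unadjusted])
print([round(s, 4) for s in adjusted])
```

Multiplying on the rate scale rather than the probability scale keeps the adjusted values valid for any SMR, and means that expected survival with an SMR applied equals unadjusted expected survival raised to the power of the SMR.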

3.6 Lifetables

Cure models require the use of general population lifetables, and therefore the source of lifetable data must be chosen. Lifetables are available for different countries (and sometimes regions) and are usually split by sex, calendar year and age. In the context of fitting a cure model to data from an international clinical trial, it could be argued that it is most appropriate to use different lifetables for each patient according to their country (or region) of residence, their age and sex and the year that they entered the trial. Alternatively, it could be argued that it is more relevant to use lifetables for the country the analysis is designed for (still stratified for age, sex and calendar year).

We suggest that this choice should be dictated by the specified purpose of the analysis – is the aim purely to project survival for trial participants, or is it to project survival for a different population (i.e. the population that a decision is being made for)? In practice, as shown by TSD 21, the use of ‘incorrect’ lifetables is unlikely to have a large impact on survival predictions [12], although this might not be the case if the cure fraction is large and if alternative lifetables have large differences in life expectancy.

When cure models are used, they should ideally be fitted to patient-level data. This allows for the distribution of age, sex and calendar year to be accounted for in the model, as well as how the age and sex mix in the surviving population changes over time. If patient-level data are not available, published survival curves can be digitised to reconstruct the data using commonly used methods [42], but the reconstructed dataset will not include information on age, sex and calendar year of recruitment for each patient. Assumptions are then required to assign values for these variables for each patient (likely based on published means and distributions), which makes the resulting model predictions prone to additional error, though this is likely to be small.
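The matching of lifetable rates to individual patients can be sketched as follows. The lifetable values, its dictionary layout and the function names are all hypothetical; real lifetables would cover every combination of sex, age and calendar year needed for the follow-up period.

```python
import math

# Hypothetical lifetable: (sex, attained age, calendar year) -> annual mortality rate.
LIFETABLE = {
    ("F", 70, 1990): 0.020, ("F", 71, 1991): 0.022, ("F", 72, 1992): 0.024,
    ("M", 65, 1985): 0.025, ("M", 66, 1986): 0.027, ("M", 67, 1987): 0.030,
}

def background_rate(sex, age_at_entry, year_at_entry, years_since_entry):
    """Look up the expected rate at the patient's attained age and calendar year."""
    elapsed = int(years_since_entry)  # whole years of follow-up completed
    return LIFETABLE[(sex, age_at_entry + elapsed, year_at_entry + elapsed)]

def expected_survival(sex, age_at_entry, year_at_entry, years):
    """Expected survival after a whole number of years, accumulating annual rates."""
    total = sum(background_rate(sex, age_at_entry, year_at_entry, k) for k in range(years))
    return math.exp(-total)

# Each patient is described by sex, age at entry and calendar year of entry.
patients = [("F", 70, 1990), ("M", 65, 1985)]
for sex, age, year in patients:
    print(sex, age, year, round(expected_survival(sex, age, year, 3), 4))
```

Both age and calendar year advance with follow-up time, so each patient moves diagonally through the lifetable. With reconstructed data, the per-patient values of sex, age and year would themselves be assumptions.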

4 Demonstrating the Application of Cure Models

4.1 Data

To illustrate the application of a range of MCMs and NMCs, we use data on 15,564 people diagnosed with colon cancer in a North European country between 1975 and 1994, with follow-up until 1994. Variables include age, sex, diagnosis date, clinical stage at diagnosis (localised, regional, distant), survival status and time. A total of 70% died during the 20 years of follow-up. We sampled from the complete dataset to generate datasets more similar to those collected in RCTs, representing a ‘medium cure’ scenario (Scenario 1), a ‘low cure’ scenario (Scenario 2) and a ‘no cure’ scenario (Scenario 3):

  (i) Scenario 1: 440 patients randomly selected from those with regional disease. Approximately 28% of these patients were alive at 10 years.

  (ii) Scenario 2: 391 patients randomly selected from those with distant disease. Approximately 5% were alive at 10 years.

  (iii) Scenario 3: 481 patients randomly selected from those with distant disease who did not live beyond 8 years.

Summary characteristics of the datasets generated for each of these scenarios are presented in the supplementary materials.

To replicate circumstances whereby models are used to extrapolate from immature RCT datasets, we conducted analyses with data artificially censored at 24 months and 48 months for each scenario. Kaplan–Meier plots for Scenarios 1–3 are presented in Fig. 1.

Fig. 1 Kaplan–Meier survival plots

Dataset files and lifetable data are provided in supplementary materials. These are also available from the strs package in Stata [43], and the cuRe package in R [26].

4.2 Models Applied

We fitted the following models to the 24- and 48-month data for each scenario:

  • MCMs
    o Log normal distribution
    o Weibull distribution
    o With and without including age as a baseline covariate

  • Flexible parametric NMCs
    o With a 5-year boundary knot
    o With a 15-year boundary knot
    o With 2 and 4 interior knots placed at default centiles of the distribution of the uncensored log survival times (50% and 95% for 2 knots; 25%, 50%, 75% and 95% for 4 knots)
    o With and without including age as a baseline covariate

For models that included age as a baseline covariate we used standardisation to obtain marginal all-cause survival curves. Appropriate lifetable data were used to incorporate background hazards, matched by age, sex and calendar year. We conducted analyses with no adjustment to the background hazard rates (i.e. SMR = 1) and repeated this with an SMR of 2.5, where we multiplied the lifetable hazards used in the models by 2.5.

We recorded model estimates of restricted mean survival time (RMST) at 10 and 20 years and survival proportions at 5, 10 and 20 years, and constructed plots of estimated hazard and survival functions. We compared these estimates with observed outcomes (without artificial censoring), and assessed whether model estimates lay within 95% confidence intervals (CIs) of observed values. However, the aim of our analyses was not to provide a definitive evaluation of the performance of the models; it was simply to demonstrate their application.
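For readers less familiar with RMST, it is the area under the survival curve up to a chosen time horizon. The sketch below approximates this area with the trapezoidal rule for an arbitrary survival function. The exponential curve with a rate of 0.1 per year is a hypothetical example and is unrelated to the models fitted in this tutorial.

```python
import math

def rmst(survival, horizon, steps=10_000):
    """Restricted mean survival time: area under survival(t) from 0 to horizon.

    Uses the trapezoidal rule on an evenly spaced grid of `steps` intervals.
    """
    width = horizon / steps
    area = 0.0
    for k in range(steps):
        left = survival(k * width)
        right = survival((k + 1) * width)
        area += 0.5 * (left + right) * width
    return area

# Hypothetical survival curve: exponential with a rate of 0.1 per year.
def example_survival(t):
    return math.exp(-0.1 * t)

print(round(rmst(example_survival, 10), 3))  # RMST at 10 years
print(round(rmst(example_survival, 20), 3))  # RMST at 20 years
```

For this exponential example the result can be checked against the closed form \(\left(1-{e}^{-0.1\tau }\right)/0.1\) for a horizon of \(\tau\) years. For a fitted cure model, the same function could be applied to the predicted all-cause survival curve.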

In the supplementary materials we provide Stata code for conducting the analyses. We fitted MCMs using strsmix combined with stexpect3 [25, 44], and fitted NMCs using stpm2 combined with standsurv [45, 46].

5 Results

Results for Scenarios 1–3 are presented in Tables 1, 2, 3 and Figs. 2, 3, 4. Results for models that used an SMR of 2.5 are presented in the supplementary materials. For ease of interpretation, in Figs. 2, 3, 4 we do not include all models. In general, including age as a baseline covariate and the number of internal knots included in NMCs made little difference to model estimates, so we include plots only for models that included age, and for NMCs with 2 internal knots. The only exception to this is for Scenario 3, where MCMs with age as a baseline covariate did not converge: MCMs that excluded age are presented instead.

Table 1 Scenario 1 (medium cure fraction) model predictions and observed survival, SMR = 1
Table 2 Scenario 2 (low cure fraction) model predictions and observed survival, SMR = 1
Table 3 Scenario 3 (no cure) model predictions and observed survival, SMR = 1
Fig. 2 Scenario 1 (medium cure fraction) survival and hazard plots. 5y bk 5-year boundary knot, 15y bk 15-year boundary knot

Fig. 3 Scenario 2 (low cure fraction) survival and hazard plots. 5y bk 5-year boundary knot, 15y bk 15-year boundary knot

Fig. 4 Scenario 3 (no cure) survival and hazard plots. 5y bk 5-year boundary knot, 15y bk 15-year boundary knot

For each scenario we briefly describe the results, and then explain why the models produced these results.

5.1 Scenario 1 (Medium Cure Fraction)

5.1.1 Model Estimates

When models were fit to 24-month data, only NMCs with a 15-year boundary knot produced estimates that fell within the 95% CIs of the observed survival proportions at 5, 10 and 20 years (Table 1, Fig. 2). MCMs substantially under-estimated long-term survival, and NMCs with a 5-year boundary knot predicted survival curves that flattened too early. Including age as a baseline covariate in the models made little difference to the NMCs, but improved MCM estimates.

Estimated cure fractions varied considerably. As explained in Sects. 3.2 and 3.3, cure fractions should be interpreted with extreme care, and it is likely to be more appropriate to consider long-term all-cause survival predictions. We report both in Table 1 to illustrate this point. Cure fractions ranged from 0–21% with MCMs to 48–62% with NMCs, whilst survival at 10 years ranged from 6–21% with MCMs to 30–39% with NMCs. Observed survival at 10 years was 28% (95% CI 23–34%). RMST estimates at 20 years ranged from 3.7–5.7 years for MCMs to 7.1–8.3 years for NMCs, whilst the observed value was 7.1 years (95% CI 6.3–7.9).

When models were fit to the 48-month data, only NMCs with a 5-year boundary knot, and those with a 15-year boundary knot which included age in the model, produced estimates that fell within the 95% CIs of the observed survival proportions at 5, 10 and 20 years. MCMs again substantially under-estimated long-term survival, although their estimates were closer to the observed values than those associated with the 24-month models. Cure fractions ranged from 0–38% with MCMs to 42–53% with NMCs, and survival at 10 years ranged from 18–23% with MCMs to 26–33% with NMCs [observed value: 28% (95% CI 23–34%)]. RMST estimates at 20 years ranged from 5.4–6.1 years for MCMs, to 6.5–7.4 years with NMCs [observed value 7.1 years (95% CI 6.3–7.9)].

5.1.2 Explaining the Results

Figure 2b shows that observed hazards increased until just after 2 years, before beginning to decrease. MCMs fitted to the 2-year data could not identify the turning point in the hazard and estimated low or zero cure fractions, under-estimating survival as a result. In contrast, the NMCs were ‘told’ that hazards will return to background population levels at the boundary knot timepoint, and were therefore forced to predict a turning point in the hazard.

Estimates from the MCMs improved when they were fitted to the 48-month data because the turning point in the hazard had occurred by this point. However, the MCMs estimated hazards that decreased too slowly, and long-term survival continued to be under-estimated.

NMCs with a 5-year boundary knot forced survival curves to flatten too quickly, because, in fact, observed hazards reached background levels at 8 years. NMCs with a 15-year boundary knot more closely approximated the decrease in hazards, but slightly underestimated long-term survival. Figure 2d illustrates that observed hazards fell below background hazards between 8 and 15 years, likely due to chance and small sample sizes. Therefore, it is reasonable to conclude that NMCs with 15-year boundary knots provided good predictions when fitted to 24- and 48-month data in this scenario. Notably, cure fractions estimated by these models were substantially higher than the proportion alive at the 10-year timepoint because, as explained in Sect. 3.1, the cure fraction corresponds to the relative survival function, not the all-cause survival function.

Models that included an SMR of 2.5 substantially underestimated long-term survival (Table S2, Fig. S1) because observed hazards fell to background levels: an SMR > 1 was not appropriate.

5.2 Scenario 2 (Low Cure Fraction)

5.2.1 Model Estimates

When fitted to 24-month data, all models except the log normal MCM appeared to estimate too sharp a decrease in the hazard function between 1 and 6 years (Fig. 3b), resulting in over-estimated survival (Table 2, Fig. 3). The log normal MCM provided good estimates of survival at 5 years, but under-estimated survival in the longer term. Cure fractions ranged from 3 to 16% with MCMs and from 10 to 18% with NMCs, and survival at 10 years ranged from 2% with the log normal MCM to 9% with the Weibull MCM, and from 6 to 11% with NMCs, compared with the observed value of 5% (95% CI 3–7%). RMST estimates at 20 years ranged from 1.3 years with the log normal MCM to 2.3 years with the Weibull MCM, and from 1.9 to 2.7 years with NMCs, whilst the observed value was 1.8 years (95% CI 1.3–2.2). Including age as a baseline covariate made little difference.

When fitted to the 48-month data, models performed much better: in particular, NMCs with a 15-year boundary knot and Weibull MCMs produced estimates that closely approximated observed survival proportions at 5, 10 and 20 years. Log normal MCMs continued to under-estimate long-term survival, and NMCs with a 5-year boundary knot tended to over-estimate survival. Cure fractions ranged from 2 to 10% with MCMs and from 7 to 13% with NMCs, and survival at 10 years ranged from 2% with the log normal MCM to 5% with the Weibull MCM, and from 4 to 8% with NMCs [observed value: 5% (95% CI 3–7%)]. RMST estimates at 20 years ranged from 1.2 years with the log normal MCM to 1.7 years with the Weibull MCM, and from 1.6 to 2.2 years with NMCs [observed value 1.8 years (95% CI 1.3–2.2)].

5.2.2 Explaining the Results

Figure 3b illustrates that the turning point in the observed hazard occurred more quickly in Scenario 2 than in Scenario 1. MCMs were able to identify the turning point and estimate reducing long-term hazards, similar to those predicted by the NMCs, even when fitted to the 24-month data. Hence, the range in estimates from the different cure models was narrower in Scenario 2. However, none of the models fitted to the 24-month data could accurately predict the long-term gradient of the decrease in the hazards. Predictions improved markedly when models were fitted to the 48-month data, by which point the gradient of the decrease was better established. NMCs with a 5-year boundary knot forced hazards to decline too quickly, whereas those with a 15-year boundary knot predicted the observed hazards closely. These and the Weibull MCM produced credible estimates of long-term survival in this scenario.

5.3 Scenario 3 (No Cure)

In this scenario all models except the log normal MCM substantially over-estimated survival at 5, 10 and 20 years when fitted to the 24-month data, with estimates slightly improved when using the 48-month data. The log normal MCM produced accurate survival estimates because it predicted a low cure fraction and its distribution provided a reasonable approximation of the observed hazards.

6 Discussion

In this tutorial we have outlined key characteristics of MCMs and NMCs, and demonstrated their application in three scenarios. Here we outline aspects to consider when using these models.

6.1 Think About the Hazard Function

If hazards in the observed period are increasing, MCMs will estimate a 0% cure fraction and extrapolations will be based on the distribution assigned to uncured patients. There is no value in fitting MCMs in this case. Flexible parametric NMCs are forced to predict a turning point in the hazard function, so that hazards return to background levels at the boundary knot timepoint. Therefore, if hazards are increasing in the observed period, but a cure can be confidently predicted (perhaps on the basis of external data, or strong clinical opinion), flexible parametric NMCs are the only logical cure model option.
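To see why an MCM needs to observe a turning point before it can estimate a non-zero cure fraction, it can help to look at the population-level excess hazard implied by the standard mixture form, in which relative survival is π + (1 − π)S_u(t). The following Python sketch assumes a Weibull distribution for uncured patients; the parameter values and function names are hypothetical and chosen only for illustration.

```python
import math


def weibull_survival(t, scale, shape):
    """Survival function of uncured patients (Weibull)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, scale, shape):
    """Hazard function of uncured patients (Weibull), defined for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def mcm_excess_hazard(t, cure_fraction, scale, shape):
    """Excess hazard of the whole population under a mixture cure model."""
    s_u = weibull_survival(t, scale, shape)
    uncured_density = weibull_hazard(t, scale, shape) * s_u
    mixture_survival = cure_fraction + (1 - cure_fraction) * s_u
    return (1 - cure_fraction) * uncured_density / mixture_survival


times = [0.5 * k for k in range(1, 21)]  # 0.5 to 10 years

# Hypothetical parameters: Weibull shape > 1, so uncured hazards always increase
no_cure = [mcm_excess_hazard(t, 0.0, 3.0, 1.5) for t in times]
with_cure = [mcm_excess_hazard(t, 0.3, 3.0, 1.5) for t in times]

# Time on the grid at which the population hazard peaks when 30% are cured
peak_time = times[with_cure.index(max(with_cure))]
```

With a cure fraction of zero the population hazard simply follows the increasing Weibull hazard. With a positive cure fraction it rises and then falls as uncured patients die and cured patients make up more of those still at risk. Data that show only increasing hazards are therefore consistent with a cure fraction of zero.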

When hazards have begun decreasing during follow-up, MCMs and NMCs may produce credible extrapolations, but this will depend on how well established the decreasing hazard is. It is difficult to determine whether data are mature enough for the shape of the hazard function to support accurate extrapolation. With less severe disease the rate of events will be lower and it may take longer for the shape of the hazard function to become established: in general, longer-term follow-up is required for less severe diseases if cure models are to be relied upon to provide accurate extrapolations. If hazards are observed to begin to decrease only shortly before the end of follow-up, the decrease in the hazards may not be well enough established for models to extrapolate accurately. In this case, flexible parametric NMCs with a range of boundary knots could be applied as a form of sensitivity analysis, as the boundary knots influence the rate of decline of the extrapolated hazards.

In our analyses, a range of MCMs and NMCs extrapolated accurately in Scenario 2 when fitted to 48-month data, when 88% of patients had died and the hazard function was well established. This was not the case in Scenario 1, when only 48% had died at 48 months, the turning point in the hazard function was later, and the subsequent decrease in hazards was less well established. However, it is notable that NMCs with 15-year boundary knots did extrapolate well in Scenario 1, even when fitted to 24-month data. This indicates that when a cure assumption is valid but data are heavily censored, flexible parametric NMCs may be more likely to extrapolate credibly than MCMs, though uncertainty will be high and the choice of boundary knots will be important.

In addition, when considering the hazard function and model fits, it is important to interpret plots with care. The plots of observed hazards that we present use a smoothing function because unsmoothed plots are likely to be volatile. With or without smoothing, plots of observed hazards may not provide reliable values towards the tail of the curves where numbers at risk are low. In addition, because the hazard is an instantaneous event rate rather than a cumulative measure, models may appear to provide a worse fit when comparing their predictions with observed hazards than when comparing their predictions with observed survival. Hence, a model that does not closely follow a smoothed plot of the observed hazards for its entire length does not necessarily represent a poor model, though it is desirable for model estimates to reflect any clear turning points in the observed hazards and to lie within the confidence intervals of the smoothed plots.

6.2 Choosing Boundary Knots for Flexible Parametric NMCs

Predictions from NMCs may vary substantially, depending on where the boundary knot (that is, the cure timepoint) is placed. Our analyses demonstrate that predicted hazards will begin to fall sharply well before the boundary knot. Setting a 5-year boundary knot resulted in hazards that fell steeply at around 2 years, so that they were almost at background levels by 3.5 years. Setting a 15-year boundary knot resulted in hazards that were not much higher than background levels at 5 years, but they fell more gradually. The shape of the hazard should be discussed with clinical experts using recognised elicitation techniques [47, 48], and when cure timepoints are uncertain, sensitivity analysis is likely to be useful. However, given that flexible parametric NMCs are likely to result in hazards that become close to background levels well before the boundary knot timepoint, setting boundary knots at relatively late timepoints is advisable.

6.3 Think About Survival in Uncured Patients

If MCMs are deemed appropriate, the shape of the hazard function in uncured patients should dictate the choice of MCM distribution. If hazards in uncured patients are likely to be monotonic, a Weibull MCM may be reasonable; if hazards may increase and then decrease, a log normal, log-logistic or Generalised Gamma MCM may be more appropriate. For NMCs, the chosen distribution must be able to represent the hazards before the cure timepoint. In our analyses, the number of interior knots included in the flexible parametric NMCs made little difference to model estimates. In a non-cure setting, the number of knots included in flexible parametric models dictates the portion of data upon which extrapolations are based, and therefore the number of knots chosen is often very important. In a cure setting, boundary knots have the largest influence on extrapolations – though it remains important to select a reasonable number of interior knots to capture the shape of the hazard function without over-fitting to the data [12]. In general, flexible parametric NMCs should allow a reasonable fit to observed data whilst also allowing the cure timepoint to be defined: this seems particularly useful in an HTA context.

6.4 Interpreting Cure Fractions

Cure fractions are a function of the framework models are fitted in, and the distributions used within the models. They should be expected to differ between models.

Cure models can predict long-term survivors even when estimating a 0% cure fraction if the distribution used for the uncured has a long tail. For cure models fitted in a relative survival framework, the cure fraction corresponds to the relative survival function and will always be greater than the all-cause survival function. The extent to which this is the case will depend on the relative importance of other cause mortality. Fundamentally, we recommend focusing on the survival proportions predicted by each model over time, rather than the estimated cure fraction.
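The gap between the cure fraction and all-cause survival in a relative survival framework can be illustrated with a short Python sketch. It assumes that all-cause survival is the product of expected (background) survival and relative survival, with relative survival following a Weibull mixture cure form. The Gompertz background hazard and all parameter values are hypothetical, and they do not correspond to the lifetables or datasets used in this tutorial.

```python
import math


def expected_survival(t, a=0.02, b=0.09):
    """Background survival under a hypothetical Gompertz hazard a * exp(b * t)."""
    return math.exp(-(a / b) * (math.exp(b * t) - 1))


def relative_survival(t, cure_fraction, scale, shape):
    """Mixture cure relative survival: tends to the cure fraction as t grows."""
    s_u = math.exp(-((t / scale) ** shape))
    return cure_fraction + (1 - cure_fraction) * s_u


def all_cause_survival(t, cure_fraction, scale, shape):
    """All-cause survival = expected survival x relative survival."""
    return expected_survival(t) * relative_survival(t, cure_fraction, scale, shape)


cure_fraction = 0.5  # hypothetical value
relative_10 = relative_survival(10.0, cure_fraction, 2.0, 1.2)
all_cause_10 = all_cause_survival(10.0, cure_fraction, 2.0, 1.2)
```

In this sketch the relative survival function has almost reached the cure fraction by 10 years, whereas all-cause survival is lower because cured patients remain exposed to background mortality. The older the population, the larger this gap will be.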

6.5 SMRs and Model Specification

Applying inappropriate SMRs can lead to poor extrapolations, especially when there is a large proportion of long-term survivors. It may not be appropriate to apply an SMR > 1, as shown by our case study, especially in Scenarios 1 and 2, where observed hazards fell to general population levels. However, there is nuance to this. In Scenario 2, a proportion of patients were cured, and hazards fell to general population levels. Several of the cure models over-estimated the cure fraction, and their long-term survival predictions improved when an SMR of 2.5 was used (see supplementary materials), because this forced survival curves downwards. This is a case of two modelling errors causing bias in opposite directions and to some extent cancelling out. We believe it is preferable to attempt to determine which cure models produce credible survival extrapolations when a realistic SMR is applied, rather than applying an increased SMR in an attempt to protect against over-estimated cure fractions. SMRs should be considered on a case-by-case basis, informed by external data (ideally including an assessment of hazards in long-term survivors with the same condition) and clinical expert opinion.
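The effect of an SMR on the survival of cured patients can be sketched in Python as follows. The sketch assumes that the SMR multiplies background hazards, which are held constant within each year, so that survival with an SMR equals background survival raised to the power of the SMR. The annual rates are hypothetical stand-ins for lifetable values, and the function name is illustrative.

```python
import math

# Hypothetical annual background mortality rates (per person-year) over 20 years,
# rising by 9% per year; real analyses would take these from lifetables
background_rates = [0.010 * 1.09 ** k for k in range(20)]


def cured_survival(years, smr=1.0):
    """Survival of cured patients after a whole number of years,
    with hazards equal to smr x background hazards."""
    cumulative_hazard = sum(smr * rate for rate in background_rates[:years])
    return math.exp(-cumulative_hazard)


survival_smr_1 = cured_survival(20, smr=1.0)
survival_smr_2_5 = cured_survival(20, smr=2.5)
```

Because the SMR acts on the cumulative hazard, its effect on survival compounds over time. This is why an SMR that is too high pulls long-term survival predictions down most when many patients are predicted to be long-term survivors.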

In epidemiological research, it is standard practice to include age in relative survival models, including cure models [49]. This made little difference in our analyses, in which we focused on average survival curves, but it is likely to improve model accuracy within subgroups by age and is more important when there is a large proportion of long-term survivors.

6.6 Choosing Between Models

Survival models used for extrapolation should never be chosen purely on the basis of statistical fit to the observed data – the plausibility of extrapolations is more important [4, 5, 50]. This is especially the case for cure models, where expectations around credible ranges of long-term survival proportions and cure timepoints are crucial. In Scenario 1, MCMs and NMCs had similar Akaike Information Criterion values but produced widely diverging extrapolations – NMCs extrapolated adequately, but MCMs did not.

6.7 Fitting Cure Models When There is no Cure

Fitting any type of cure model when there is in fact no cure is likely to result in over-estimated long-term survival. MCMs with distributions for the uncured that can reflect decreasing hazards may protect against this, because these models are more likely to estimate zero cure fractions. However, in our analyses, log normal MCMs were amongst the worst performing models when there was a cure. Therefore, it is inadvisable to fit log normal MCMs ‘just in case’ there is no cure. Instead, when cure is uncertain, it is more sensible to fit cure and non-cure models to allow reviewers and decision-makers to assess the sensitivity of effectiveness and cost-effectiveness estimates to the cure assumption.

6.8 Other Modelling Approaches

This tutorial focuses on MCMs and NMCs, but these are not the only models that can extrapolate cure-like long-term survival. TSD 21 describes other models, including piecewise, landmark-based and relative survival models [12]. Each approach has advantages and disadvantages and this tutorial is not an endorsement of any particular method. Landmark-based piecewise models may combine a parametric curve up until a specified landmark timepoint and general population mortality rates beyond the landmark, and in some respects may appear similar to an NMC. However, these models do not use the data to estimate the cure, and may result in discontinuities in the predicted hazard function. They also incorporate background mortality rates in a less sophisticated way. Therefore, if this type of approach is to be taken, we suggest that using a flexible parametric NMC with carefully selected boundary knots (and sensitivity analyses) would be preferable.

It is important to acknowledge that in this demonstration we present NMCs using flexible parametric models, with extra information added in the form of boundary knot timepoints, but only present MCMs using standard parametric models. This may place NMCs at an advantage. The rationale for this is that flexible parametric NMCs are well recognised for the analysis of population-level survival studies [12, 31, 51, 52], whereas, to the best of our knowledge, MCMs have very rarely been applied using flexible parametric models.

In addition, Felizzi et al. developed ‘informed’ MCMs, whereby the cure fraction is obtained from external information and used as an input to the MCM [11]. Like using boundary knots within flexible parametric NMCs, this allows the analyst some control over the survival curve estimated by the model, and is worthy of further consideration. However, it is arguable whether it is preferable to use a cure fraction as an input to a cure model or a cure timepoint (as used within flexible parametric NMCs), especially given the difficulties associated with interpreting cure fractions estimated using different frameworks (i.e. relative survival or all-cause mortality), and the differences between fractions estimated by different parametric models. We have not included these models in this tutorial because they necessitate consideration of several additional factors and a separate tutorial focused solely on these models exists [11].

Botta et al. also introduced a potentially valuable extension to MCMs, whereby they attempted to estimate and adjust for the increased risk of non-cancer deaths in patients with cancer. This is similar in concept to using SMRs, but where the increased risk is estimated using the data to which the cure models are applied [53].

6.9 Limitations

In addition to limiting our focus to a specific subset of cure models that can be used to extrapolate survival, we have not applied an exhaustive set of analyses for the cure models that we have tested. We applied MCMs using a subset of standard parametric models, and NMCs using flexible parametric models. This is because we expect MCMs and NMCs using the same standard parametric models to give similar results. NMCs that use flexible parametric models provide the analyst with an additional tool with which to ‘control’ long-term survival predictions through the setting of boundary knots, and therefore we considered it valuable to demonstrate the use of these models. For simplicity, and to avoid presenting results for a very large number of analyses, we chose to apply MCMs using only Weibull and log normal distributions. These represent models where there can (the log normal MCM) and cannot (the Weibull MCM) be a turning point in the hazard function for the uncured population.

We also limited the specification of the models we fitted with respect to baseline covariates (including or excluding age) and SMRs (SMR of 1 or 2.5). We could also have included sex as a baseline covariate. Standardisation would then be required to obtain marginal survival functions, as we demonstrated for analyses that included age as a baseline covariate. In epidemiological research it is generally considered more important to include age in relative survival models, rather than sex. However, in some circumstances, such as when survival differs considerably by sex, including sex could be important.
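Standardisation, as referred to above, amounts to predicting a survival curve for every patient given their own covariate values and then averaging those curves. A minimal Python sketch follows, assuming a hypothetical proportional hazards model with a constant baseline hazard and age as the only covariate. The model, coefficient values and ages are illustrative and are not those fitted in this tutorial.

```python
import math


def individual_survival(t, age, baseline_rate=0.1, log_hazard_ratio=0.05):
    """Hypothetical model: constant hazard scaled by exp(beta * (age - 60))."""
    rate = baseline_rate * math.exp(log_hazard_ratio * (age - 60))
    return math.exp(-rate * t)


def marginal_survival(t, ages):
    """Standardised survival: the average of individual predicted survival."""
    return sum(individual_survival(t, age) for age in ages) / len(ages)


ages = [50, 60, 70, 80]  # hypothetical cohort
marginal_10 = marginal_survival(10.0, ages)

# Survival predicted at the mean age, shown for contrast
at_mean_age_10 = individual_survival(10.0, sum(ages) / len(ages))
```

The marginal survival function generally differs from the curve predicted at the mean covariate value, which is why averaging over individuals is needed when covariates such as age or sex are included in the model.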

With respect to SMRs, we did not undertake a review or an expert elicitation process to determine a potentially relevant value – in practice this should be done. We did, however, observe that in the long-term data from which we derived our scenario datasets, hazards appeared to return to background levels (albeit with associated uncertainty), indicating that an SMR of 1 is likely to be reasonable.

Finally, in the datasets constructed for each of the scenarios we analysed, for the 24- and 48-month analyses we simply censored data for patients whose observed follow-up was greater than 24 (or 48) months. Given that entry into clinical trials is usually staggered over a period of time, our approach means that there will be less uncertainty in the tails of the observed survival curves in our scenarios than there might be in a clinical trial setting. Sparser long-term data are likely to mean that hazard functions take longer to become established, which is relevant when considering the results of the analyses that we demonstrate. However, this does not negate our conclusions or interpretation, and we refer readers particularly to Sect. 6.1 for a discussion on data maturity in relation to cure modelling.

6.10 Further Research

As commented on in Sects. 6.8 and 6.9, this tutorial is not exhaustive with respect to models that can extrapolate cure-like long-term survival, and does not attempt to formally evaluate the performance of these methods. A neutral comparison study of the relevant model types would be valuable [54]. Demonstrating the use of flexible parametric NMCs in a range of clinical trial datasets would also be of value, as would research into appropriate methods for determining SMRs for a range of cancers. Research into cure timepoints for different cancers (and stages of cancer) would also be highly valuable – for instance, analysing timepoints at which mortality hazards reach background population levels in registry datasets with long-term follow-up.

7 Conclusions

Cure models are generally interpreted with extreme caution in HTA. This is reasonable because these models can produce highly variable extrapolations and are likely to be extremely inaccurate if a cure assumption is not valid. However, when a cure assumption is credible, it is reasonable to explore extrapolations from cure models. When a cure exists and data are relatively mature, with well-established hazard functions, MCMs and NMCs are likely to produce similar, credible extrapolations. However, at the time of HTA submission, survival data are usually immature. In such cases, standard parametric MCMs are unlikely to be able to extrapolate accurately. Flexible parametric NMCs are more able to produce accurate extrapolations in cure scenarios when trial follow-up is short, provided that sensible and reasonably accurate cure timepoints are selected. Therefore, flexible parametric NMCs are likely to be more useful than standard MCMs in the context of HTA. However, extrapolations from these models will be prone to substantial uncertainty in such scenarios, and their validity rests on the credibility of the cure assumption and the placement of boundary knots.