Background

Lung cancer is the second most common cancer in both men and women [1] and is the leading cause of cancer death in both. Excluding mesothelioma, non-small cell lung cancer (NSCLC) accounts for about 85% of all lung cancers. [2] Many patients have a delayed diagnosis and are unsuitable for surgery, so that most receive some form of first-line pharmacotherapy. In the past, following failure of first-line therapies, most NSCLC patients received docetaxel [3]; in recent years, however, targeted therapies and immunotherapies have been developed, the latter acting as immune checkpoint inhibitors with the aim of boosting anti-tumour immunity rather than directly targeting cancer cells. About 12 agents now have a label indication for second- or further-line NSCLC treatment. A 2017 study [4] of cancer drugs approved by the EMA from 2009 to 2013 concluded that most of these drugs entered the market without evidence of benefit on survival or quality of life, and that after a median of 3.3 years post-market entry, there was little or no conclusive evidence of extended or better life for most cancer indications.

The effectiveness of the second line new agents for treating NSCLC in absolute terms is unknown because previous trial analyses focused mostly on the relative benefit (versus standard chemotherapy mainly consisting of docetaxel), usually expressed in terms of OS and PFS hazard ratios [5, 6].

In this systematic review we estimated the survival benefit (i.e. mean number of months) from licensed therapies for NSCLC. It is hoped that the findings focused on new drugs may contribute to more informed discussion between patients and clinicians and will support the decision-making process.

Methods

We registered a protocol for this review in PROSPERO (CRD42017065928).

Inclusion/exclusion criteria

We included RCTs of adult patients with advanced or metastatic (IIIB and/or IV) NSCLC with non-squamous (adenocarcinoma, large cell) or squamous histology who had experienced failure of prior first-line chemotherapy (i.e., those receiving second-line treatment and beyond); had either predominantly negative or 100% negative expression of anaplastic lymphoma kinase (ALK); and had either predominantly negative or 100% negative expression of epidermal growth factor receptor (EGFR). Studies enrolling only patients with ALK+ and/or EGFR+ expression were excluded since, according to current practice, they would be offered targeted therapies (erlotinib or gefitinib for EGFR+; osimertinib for EGFR T790M; crizotinib or ceritinib for ALK+). [1]

RCTs were included if interventions or comparators had an EMA (European Medicines Agency) label indication as of June 2017 for the population described above. The drugs meeting these criteria were Docetaxel (DOC), Pemetrexed (PEM), Ramucirumab plus docetaxel (RAM + DOC), Erlotinib (ERLO), Nintedanib plus docetaxel (NIN + DOC), Afatinib (AFA), Nivolumab (NIVO), and Pembrolizumab (PEMBRO). We also included Atezolizumab (ATEZO), which obtained an EMA license following the Committee for Medicinal Products for Human Use (CHMP) positive opinion of 20 July 2017. Only studies in which drugs were used with a dose regimen as described in the summary of product characteristics were included. Drugs used in people with ALK+ and/or EGFR+ expression (Crizotinib, Ceritinib, Gefitinib, Osimertinib) were excluded.

Studies were included if either an overall survival or progression-free survival or both parameters were reported in published Kaplan-Meier plots.

Search strategy

Electronic databases (MEDLINE; EMBASE; Web of Science) were searched for relevant literature from January 2000 up to present (see MEDLINE search strategy in Additional file 1). The electronic searches were limited to English language. The lower time limit for the search period was chosen in accordance with the emergence of docetaxel as the standard second-line treatment. Reference lists of relevant articles were hand-searched to identify additional potentially relevant citations. The search was first updated up to early July 2017, retrieving 274 additional records, but no further studies were included. A final update of the search was undertaken up to February 2019 to identify additional original articles relevant to the included studies. The latter retrieved 651 records, of which six were selected for further scrutiny.

Selection of studies

Three reviewers independently screened all titles/abstracts and then full texts of publications potentially relevant for inclusion. Disagreements were resolved through a consensus. The study flow and reasons for exclusion at the full text screening level are presented in the PRISMA study flow diagram [7] (Additional file 2).

Data extraction

The data extracted included study author, trial acronym, patient characteristics (age, sex, diagnosis, tumour stage/histology), and type, mode, dose and duration of treatments. Extracted data were cross-checked by a second reviewer.

Published Kaplan-Meier (KM) survival plots were used to make estimates of mean survival benefit. Two reviewers digitised the KM plots, extracted patient numbers at risk, numbers of events, and published hazard ratios.

Assessment of risk of bias

Two independent reviewers assessed the risk of bias (RoB) in the included studies using the Cochrane RoB tool for RCTs; [8] this categorises studies according to the following domains of potential bias: selection bias (random sequence generation, allocation concealment), performance bias (blinding participants and personnel), detection bias (blinding of outcome assessment), attrition bias (incomplete outcome data), reporting bias (selective outcome reporting), and “other” bias (e.g. between-group baseline distribution of important prognostic factors). Summary ratings of high RoB were assigned if at least one of the domains of selection, attrition, and other bias was rated as high RoB. If information was insufficient to judge, then an unclear RoB rating was assigned. Quality assessment was performed by two independent reviewers and then cross-checked. Any disagreements were resolved by a third reviewer through a discussion.

Data analysis and synthesis

We used the algorithm of Guyot et al. [9] to estimate underlying individual patient data, which was then used to reconstruct KM plots and to derive estimates of mean survival. The reliability of KM reconstructions was tested by inspection of reconstructions overlaid onto published plots, comparison of reconstructed and published risk table of patients at risk, and correspondence of reconstructed HRs with published HRs (Additional file 3).

Mean survival was estimated in several ways. Restricted mean survival (RMS) [10] and the mean difference in RMS between compared drugs in each trial were estimated to the longest time common across the compared studies of interest using the Stata module of Cronin et al. 2016 [11].
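The idea behind RMS can be illustrated with a minimal sketch: RMS is the area under the KM step function from time zero to a chosen truncation time. The Python code below is illustrative only and is not the Stata module used in this review; the function names, the input layout (event times with the survival proportion immediately after each), and the normal-approximation CI for the between-arm difference (which assumes independent arms) are all assumptions made for the example.

```python
from math import sqrt

def restricted_mean_survival(times, surv, tau):
    """Area under a Kaplan-Meier step function from 0 to tau.

    times: increasing event times (months); surv[i] is the proportion
    alive immediately after times[i]. Survival is 1.0 before the first
    event. This input layout is an assumption for illustration.
    """
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in zip(times, surv):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # rectangle up to this event time
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)  # final rectangle, truncated at tau
    return area

def rms_difference(rms_a, se_a, rms_b, se_b, z=1.96):
    """Difference in RMS (a minus b) with a normal-approximation 95% CI.

    Assumes the two arms are independent, so the variances add.
    """
    diff = rms_a - rms_b
    se = sqrt(se_a ** 2 + se_b ** 2)
    return diff, diff - z * se, diff + z * se
```

For a hypothetical curve that drops to 0.8, 0.5 and 0.2 at 2, 5 and 8 months, restricted_mean_survival([2, 5, 8], [0.8, 0.5, 0.2], 10) sums the rectangles 2.0 + 2.4 + 1.5 + 0.4. Truncating all arms at the same time point is what makes RMS values comparable across trials with different follow-up.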

In order to account for any potential gains beyond the longest observation time common across trials, we undertook analysis of total mean survival using parametric survival modelling. Total mean survival was estimated in three ways: [a] with Weibull models (fitted separately by study arm) using the stgenreg package of Crowther and Lambert 2013 [12]; mean survival time and 95% confidence intervals (CIs) were estimated from the area under the curve (AUC) of the model and its upper and lower 95% CIs using 0.01 month increments over 96 months. The CIs around the central AUC estimate were somewhat asymmetric (as would be expected from the delta method for estimating CIs around parametric models). The SE for the AUC value was therefore estimated from the difference between the 95% LCI and UCI AUC values divided by 2 × 1.96. In two instances Weibull models were inferior to generalised gamma models, in which case the latter were used; [b] using the equations for mean survival published by Davies et al. 2012 [13] for Weibull parametric survival models; [c] using the “stci, emean” command in Stata; this command uses an exponential extension from the tail of the KM plot to the time axis, and mean survival is then estimated from the area under the KM plot plus that under the extension. Similar methods were applied for progression free survival (PFS) (Additional file 4).
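The Weibull-based calculations in [a] and [b] can be sketched as follows. This is a hedged illustration in Python rather than the Stata code used in the review: the parameterisation S(t) = exp(−λt^γ), the function names, and any parameter values passed in are assumptions, and the closed-form mean shown is the standard Weibull result, which may differ in notation from the equations of Davies et al.

```python
from math import exp, gamma

def weibull_survival(t, lam, shape):
    """S(t) = exp(-lam * t**shape); this parameterisation is an assumption."""
    return exp(-lam * t ** shape)

def weibull_mean_auc(lam, shape, horizon=96.0, step=0.01):
    """Trapezoidal area under S(t) from 0 to horizon (months),
    using fixed increments as described for method [a]."""
    n = int(round(horizon / step))
    total = 0.0
    for i in range(n):
        left = weibull_survival(i * step, lam, shape)
        right = weibull_survival((i + 1) * step, lam, shape)
        total += 0.5 * (left + right) * step
    return total

def weibull_mean_closed_form(lam, shape):
    """Untruncated mean of the same Weibull model:
    lam**(-1/shape) * Gamma(1 + 1/shape)."""
    return lam ** (-1.0 / shape) * gamma(1.0 + 1.0 / shape)

def se_from_ci(lci, uci, z=1.96):
    """Approximate SE as the width of the 95% CI divided by 2 * 1.96."""
    return (uci - lci) / (2.0 * z)
```

When the modelled curve has fallen close to zero by 96 months, the numerical AUC and the closed-form mean agree closely, which is one way to check that the 96-month horizon is long enough. Applying se_from_ci to a reported interval such as 9.98–12.88 months gives an SE of roughly 0.74 months; because the interval is asymmetric around its centre, this is an approximation.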

We did exploratory analyses to investigate the relationships between PFS and RMS and modelled total survival, and between published hazard ratios and median survival values and RMS and modelled total survival.

Analyses were done using Stata® versions 12 or 14.2 (Stata Corp, College Station, TX, USA).

The outcome estimates are presented in KM plots, model plots, forest plots, and tables. Where possible, the analyses were stratified by histologic subtypes (squamous and non-squamous).

For completeness, we undertook a network meta-analysis to estimate the mean differences in RMS and in OS. The corresponding methods are described in Additional file 7.

Results

Our search retrieved 1949 records, of which 1855 were excluded at title/abstract level, leaving 94 records to be examined at full-text level. We subsequently excluded 81 records, with reasons as illustrated on the PRISMA flow chart, and included 13 records [14,15,16,17,18,19,20,21,22,23,24,25,26], corresponding to 11 primary RCT studies with 7581 participants (REVEL, LUME LUNG-1, LUX LUNG 8, OAK, POPLAR, KEYNOTE-010, CHECKMATE-017, CHECKMATE-057, TAILOR, HORG, and Hanna et al. 2004). No studies were omitted because of a lack of KM plot, but some included studies did not provide plots for all histology subgroups.

Study characteristics and quality

The 11 RCTs compared nine different drugs, with the majority of comparisons being against DOC. Two comparisons, ATEZO vs DOC [17, 24] and NIVO vs DOC [14,15,16], were tested in more than one study. The NIVO studies employed histology-specific inclusion criteria. Table 1 summarises the main characteristics reported for the 11 studies. Study sample size ranged from 208 to 1314 patients; studies included predominantly people with stage IV NSCLC and performance status 1. The mean age at inclusion ranged from 57 to 66 years and the majority of patients were male. There was no evidence of substantial imbalance in potential effect modifiers.

Table 1 Characteristics of included studies

Nine studies [15,16,17,18, 20,21,22, 24, 26] were considered at high risk of bias due to the lack of blinding of participants and personnel. The five RCTs [15,16,17, 21, 24] evaluating checkpoint inhibitors versus DOC were open-label and were considered at high risk due to performance bias. LUME-LUNG-1 [23] was rated at low risk of bias for all the key domains. Only HORG and TAILOR [18, 22] had public funding, so the remaining studies were rated at high risk due to “other source bias”.

Overall survival analyses in mixed histology populations

These analyses were based on mixed populations of patients whose tumour histology was either squamous or non-squamous.

Overall survival from observed data

Reconstructed KM plots from studies reporting OS in populations unselected according to tumour histology are shown in Fig. 1 (for completeness of analysis corresponding plots for PFS are presented in Additional file 4). Only the plots for ATEZO (OAK [24] and POPLAR [17] trials) and for PEMBRO (KEYNOTE-010 [21]) imply appreciable survival gains over DOC. ERLO was not beneficial compared to DOC (TAILOR [18]) or PEM (HORG trial, [22]).

Fig. 1

Reconstructed Kaplan-Meier plots (95% CI) of overall survival; studies recruiting patients irrespective of tumour histology. Time axis is months, vertical axis is proportion alive

RMS estimates are summarised in Additional file 5. Over the observed period of 19 months common to all studies of new treatments, the RMS delivered by DOC (alone or combined with placebo) ranged between 9.30 (95% CI 8.02–10.57) months (TAILOR) and 10.68 (95% CI 10.03–11.33) months (OAK), while in the older study of DOC vs. PEM (Hanna et al., 2004 [20], post hoc analysis by Scagliotti et al. 2009 [25]) DOC delivered only 8.70 months (95% CI 7.96–9.44) RMS (Table 2). The RMS gain relative to DOC from new drugs over 19 months was modest, ranging from minus 1.64 months (95% CI minus 3.36–0.08) for ERLO, through 0.48 months (95% CI minus 0.23–1.18) and 0.99 months (95% CI 0.24–1.73) for NIN + DOC and RAM+DOC respectively, to between 1.45 months (95% CI minus 0.11–3.00) and 1.62 months (95% CI 0.70–2.55) for ATEZO and 1.58 months (95% CI 0.48–2.68) for PEMBRO (Table 2 and Additional file 6).

Table 2 Mean survival (months) estimates from studies of patients with mixed histologies

Overall survival from extrapolated data (survival modelling)

Exponential extrapolation from the tail of the KM plots (Stata command: stci, emean) suggests losses for ERLO relative to DOC, and gains over DOC of less than 1 month for RAM+DOC and NIN + DOC (the latter licensed only for adenocarcinoma), and potentially impressive gains over DOC of 7.9 to 8.5 months for PEMBRO and ATEZO respectively (Table 2). However, the alternative procedure of modelling OS using Weibull fits to the whole of the KM plot suggests more modest gains for immunotherapies relative to DOC (Fig. 2 and Table 2). Across industry-sponsored studies of immuno- and targeted therapies Weibull models of overall survival (Fig. 3) with DOC yielded between 11.10 months (95% CI: 9.98–12.88) (KEYNOTE-010 [21]) and 13.59 months (95% CI: 12.11–15.32) (OAK), and suggest mean survival gains over DOC of 5.74 months (95% CI minus 0.14–11.61) and 5.34 months (95% CI 2.25–8.43) for ATEZO (POPLAR [17] and OAK [24] respectively), 5.04 months (95% CI 1.57–8.52) for PEMBRO (KEYNOTE-010), but of less than 2 months for targeted therapies RAM+DOC (REVEL [19]) and NIN + DOC (LUME LUNG-1 [23]), and no gain for ERLO. Weibull modelling of the publicly funded HORG trial indicated a possible modest gain from ERLO over PEM (1.16 months; 95% CI: minus 3.5–5.82), and modelling of the Hanna study indicated likely equivalence of the chemotherapies DOC and PEM.

Fig. 2

Weibull models of overall survival for studies depicted in Fig. 1. Time axis is months, vertical axis is proportion alive

Fig. 3

Compared mean survivals (calculated using Weibull model) irrespective of histology (a), in squamous histology (b), and non-squamous histology (c); orange bars denote immunotherapies, blue bars targeted therapies, and green bars chemotherapies

Overall survival analyses per histology (squamous or non-squamous)

These analyses were based on the studies where KM plots for trial participants stratified according to histology were presented.

Mean survival from observed data

Figure 4 summarises the reconstructed KM plots for licensed drugs for squamous histology and non-squamous histology. These suggest likely modest gains from RAM+DOC irrespective of histology and for NIN + DOC in the treatment of adenocarcinoma (the licensed indication), little or no gain from PEM over DOC irrespective of histology, but more substantial likely gains over DOC from the checkpoint inhibitors (NIVO and ATEZO) for both histology types. No KM plots per histology were available for PEMBRO.

Fig. 4

Reconstructed Kaplan-Meier (95% CI) plots of overall survival; studies recruiting patients with specified tumour histology. Time axis is months, vertical axis is proportion alive

Over the observed periods of 24 and 27 months common to all squamous and non-squamous studies respectively, NIVO and ATEZO delivered between about 2 and 4 months RMS gain over DOC, while RAM+DOC and NIN + DOC only between 1 and 2 months, results supporting the apparent superiority of the checkpoint inhibitors (Tables 3 and 4, Additional file 6).

Table 3 Estimates of mean survival (months) based on studies of patients with squamous histology
Table 4 Estimates of mean survival (months) based on studies of patients with non-squamous histology

Overall survival from extrapolated data (survival modelling)

Weibull models provided satisfactory fits for non-squamous histology but the shapes of the KM plots for squamous histology for the checkpoint agents were irregular and gamma models provided a better fit. Parametric models are summarised in Fig. 5.

Fig. 5

Parametric models of overall survival for studies depicted in Fig. 4. Time axis is months, vertical axis is proportion alive. All are Weibull models except where specified

For the industry-sponsored studies of targeted therapies and immunotherapies, Weibull model estimates of mean survival with DOC treatment in patients with squamous histology ranged between 9.41 (95% CI: 7.78–11.41) months (CHECKMATE-017) and 11.73 (95% CI: 10.131–13.38) months (LUME LUNG-1), and in patients with non-squamous histology between 13.32 (95% CI: 11.73–15.8) months (CHECKMATE-057) and 15.02 (95% CI: 13.05–17.43) months (OAK) (Tables 3 and 4, and Fig. 3). The gain in overall mean survival over DOC from targeted therapies and immunotherapies for patients with squamous histology ranged from less than 1 month for RAM+DOC and NIN + DOC to 4.08 months (95% CI minus 0.09–8.25) for ATEZO and 6.51 months (95% CI 2.50–10.52) for NIVO (CHECKMATE-017) (Tables 3 and 4, Additional file 6). Mean survival gains over DOC of 4.81 months for ATEZO and 7.45 months for NIVO were obtained if better fitting gamma models were substituted for Weibull models for squamous histology (Fig. 5), while gamma models for targeted therapies yielded smaller gains than Weibull models. Survival gain from AFA over ERLO was estimated to be 2.14 months (95% CI 0.45–3.83) (the reported HR for OS was 0.82); this gain would probably have been smaller had the comparator been DOC, since the TAILOR trial [18] found superior performance for DOC over ERLO in both squamous and non-squamous histology populations (HR 1.11, 95% CI: 0.61–2.03, and 1.49, 95% CI: 1.06–2.10, respectively). However, AFA might be expected to have a superior safety profile to DOC. The mean gain in survival over DOC from targeted therapies and immunotherapies for patients with non-squamous histology (Table 4) ranged from 2.42 months (95% CI minus 0.20–5.04) and 2.84 months (95% CI minus 0.05–5.63) for RAM+DOC and NIN + DOC, to 4.72 months (95% CI minus 1.44–8.00) and 5.68 months (95% CI minus 1.61–9.75) for ATEZO and NIVO respectively.

In network analysis immunotherapies consistently ranked higher than alternatives irrespective of population histology and outcome measure (Additional file 7).

Exploratory analyses on PFS and OS relationships

PFS is often specified as a primary or co-primary outcome in trials of cancer drugs. We conducted analyses to explore whether PFS in NSCLC might be an indicator for overall survival in second-line therapies. Weibull model estimates of gains in PFS over DOC for targeted therapies and immunotherapies were modest, ranging from +1.18 months (RAM + DOC) to minus 1.33 months (ERLO) in studies recruiting patients unrestricted by histology (Additional file 4). Available data for squamous and non-squamous histologies indicated similarly small gains except in the case of CHECKMATE-017 (squamous histology), in which the estimated gain was more substantial (3.11 months). Across the included studies there was a poor relationship between modelled estimates of PFS and of OS, and between modelled PFS gains and reported PFS hazard ratios, whereas strong associations were seen between modelled OS and reported median OS, and between modelled OS gains and reported OS hazard ratios (Additional file 8). These findings suggest that PFS is unlikely to be a good indicator for subsequent OS in this case.

Discussion

In this study we estimated the mean number of months of survival benefit from therapies licensed for the treatment of advanced NSCLC. An estimation of survival in the absence of treatment can be obtained from two early RCTs in NSCLC patients, previously treated with platinum chemotherapy, and who were randomised to receive placebo or best supportive care (BSC). The reported median survivals were 4.7 [27] and 4.6 months (95% CI: 3.7–6.0) [28] respectively. By applying the methods described above using Weibull models, we estimate BSC and placebo mean survival to be 7.34 months (95% CI: 5.92–9.14) and 7.77 months (95% CI: 6.71–9.03). If patients received DOC they might expect an extension in average life expectancy to about a year depending on histology, with slightly better prospects for those with non-squamous histology.

Our results suggest that mean survival gains over DOC from RAM+DOC, NIN + DOC, and ERLO are meagre but may be marginally superior for patients with non-squamous than for those with squamous tumour histology. The analysed results indicate that average survival gains over DOC from checkpoint inhibitors are greater than those from the targeted therapies, with estimates for the former reaching between 4 and 9 months depending on tumour histology and the method of modelling beyond the observed data.

The European Society for Medical Oncology (ESMO) has recognised that a comparison of treatments based solely on hazard ratios for OS provides only indirect information about treatment benefit; they have proposed an estimator, the “Magnitude of Clinical Benefit Scale (ESMO-MCBS)”, which they believe represents “a standardised, generic, validated approach to stratify the magnitude of clinical benefit that can be anticipated from anti-cancer therapies” [29]. Davis et al. 2017 [4] used this tool to examine pharmaceutical interventions for advanced cancers approved by the EMA from 2009 to 2013. The authors expressed concern that for many of these interventions, the available evidence failed to demonstrate survival benefit or improved patient quality of life.

Patients may have difficulty in interpreting measures of relative risk (e.g. HRs) and in understanding the basis of the ESMO-MCBS tool measure. Patients often prefer information about the likely lifetime gain (e.g. life years gained) from a new treatment being offered. It has been suggested that advanced cancer patients with short life expectancy are willing to accept considerable toxicity of treatments that offer a chance of durable survival [30]; however, evidence on this is conflicting [31]. It has been claimed that a proportion of patients who receive immunotherapies may experience a durable survival response (a so-called “tail of the curve response”) so that mean survival estimates for the “whole population” may mask this possibility. However, the evidence base for such outcomes is far from clear cut.

The British Thoracic Society has provided guidance for health care professionals about sharing information with patients with lung cancer. [32] Such information could include estimates of average survival benefit that might accrue with various treatment options. Furthermore decision makers such as NICE generally require estimates of the mean gain in survival from new treatments when taking reimbursement decisions. It is therefore of interest to gain an idea of the mean survival benefit yielded from new treatments for advanced NSCLC and to see how such benefit might vary according to tumour histology.

Equally importantly, mean survival gains offer an unambiguous, informative measure of outcome, which is far less exposed to limitations and controversies surrounding the use of quality-adjusted life years (QALYs) in the evaluation of treatments for neoplasms. While QALYs facilitate decision making across areas, limitations in the way QALYs are constructed have led to criticism on various grounds, including insensitivity to changes in health states [33], especially those that are caused by adverse effects due to cancer treatments [34]. Limitations in QALYs have led researchers to conclude that ‘the measure shows important limitations in terms of its ability to accurately capture the value of the health gains deemed important by cancer patients’ [33]. Reimbursement decisions become challenging when comparing cancer, with its generally short-term survival expectation, with chronic disabling diseases with relatively extended absolute survival. Given this, we expect that estimates of survival are key information which, at the very least, should be reported and considered alongside QALYs.

Our review has several strengths. To the best of our knowledge, this is the first attempt at comparing mean survival of all drugs with a licensed indication for second/third line treatment of advanced/metastatic wild-type NSCLC. It is justified, because the growing number of licensed therapies offers a new range of treatment options for which survival information is of interest to both oncologists and patients. Multi-arm RCTs could provide the best evidence, but these have not been undertaken and our work provides a pragmatic approach.

Our review has several limitations. Although we used rigorous methods to identify all relevant literature we could only include 11 primary research studies so the inherent risk of publication bias may be of particular importance. Our survival curves and estimates have relied on reconstructing the underlying individual patient data rather than using the individual patient data itself. However for all the included studies, there was a close correspondence between our derived curves and those published. A further potential limitation is the risk of uneven performance of the common comparator, DOC, between different studies; however these differences were small relative to differences between targeted therapies and the checkpoint inhibitors. We noted some differences in baseline characteristics across studies regarding the number of prior lines of treatment and disease stage at inclusion. For these variables survival outcomes were not reported in sufficient detail to allow sensitivity analyses to test the robustness of our results.

Conclusion

Based on our review, NIVO, PEMBRO and ATEZO exhibit superior benefit compared to other licensed drugs indicated for people with non-specific late stage NSCLC. The patient survival gains over chemotherapy from these drugs appear to be fairly substantial in the context of an expected average survival with DOC of less than 1 year for people with squamous histology and a little over a year for those with non-squamous histology.