1 Introduction

Renal cell carcinoma (RCC) is the most common type of kidney cancer and accounts for 2–3% of cancers worldwide [1]. The incidence is highest among men (approximately twice the rate among women [2]) and is increasing in many countries [3]. Individuals affected by RCC are often asymptomatic, and approximately 25–30% of patients have metastatic or advanced RCC (aRCC) at the time of diagnosis [4], which usually indicates a poor prognosis [5]. Prognostic factors have been identified by both the International Metastatic Renal Cell Carcinoma Database Consortium (IMDC) [5] and the Memorial Sloan–Kettering Cancer Center (MSKCC) [6]. IMDC and MSKCC criteria overlap, and the majority of patients are categorized into the same risk group using either set of criteria [7]. The criteria are used to classify patients into three risk groups (i.e., favorable, intermediate, and poor), which can aid in establishing a prognosis and guiding treatment choice. For poor- and intermediate-risk patients, median overall survival (OS) is 7.8 months and 22.5 months, respectively, compared with 43.2 months for those with favorable risk [7]. In the real-world setting, median OS is 20.9 months for patients receiving first-line targeted therapy for metastatic RCC, and 14.7 months for those with intermediate/poor risk [8].

The introduction of targeted therapies during the past decade has improved survival outcomes for patients diagnosed with aRCC. These treatment options include agents that target the vascular endothelial growth factor (VEGF) pathway (e.g., sunitinib, bevacizumab, pazopanib, axitinib) or the mammalian target of rapamycin (mTOR) pathway (e.g., temsirolimus, everolimus [9]). According to the European Society for Medical Oncology (ESMO) Clinical Practice Guidelines, the standards of care for first-line treatment of aRCC patients with good or intermediate prognosis are bevacizumab (combined with interferon [IFN]), sunitinib, and pazopanib. The efficacy of both sunitinib and pazopanib has been confirmed by real-world evidence studies [10], and these two agents are currently the most commonly used first-line treatments for aRCC [11]. Alternatively, high-dose interleukin-2 (IL-2), sorafenib, and low-dose IFN combined with bevacizumab may be considered as first-line treatment options. For patients with poor prognosis, the standard first-line treatment choice is temsirolimus, with options to use sunitinib, sorafenib, or pazopanib [11].

Cabozantinib constitutes a newer first-line treatment option for patients with aRCC. It is an oral small-molecule inhibitor of multiple tyrosine kinases, including vascular endothelial growth factor receptor 2 (VEGFR2), MET (receptor tyrosine kinase for hepatocyte growth factor), and AXL (receptor tyrosine kinase for GAS6) [12]. Cabozantinib was previously approved for second-line treatment of patients with aRCC. Recently, an investigator-initiated, randomized phase II multicenter trial (CABOSUN) compared cabozantinib with sunitinib as first-line therapy in patients with aRCC. This trial demonstrated that in patients with intermediate- or poor-risk aRCC, first-line treatment with cabozantinib resulted in significantly longer progression-free survival (PFS; median 8.6 vs 5.3 months) than the current standard of care (sunitinib) [13]. In December 2017, following priority review, the Food and Drug Administration granted approval to cabozantinib for first-line treatment of patients with aRCC based on the CABOSUN study. In Europe, cabozantinib is currently being evaluated as a first-line treatment in aRCC by the European Medicines Agency.

Given the variety of treatment standards in various countries, there is a need to compare the efficacy and safety of all available treatment options. However, head-to-head clinical trials are usually not available as a basis for treatment decisions, and it is not feasible to conduct head-to-head evaluations of all therapeutic options. To generate indirect evidence of comparative efficacy that can be used to support treatment decisions for this patient population, we therefore carried out a systematic review of the available randomized controlled trials for first-line aRCC treatments and conducted a comprehensive network meta-analysis (NMA) comparing the efficacy of cabozantinib with that of standard-of-care treatments currently approved in Europe.

2 Methods

2.1 Search Methods

Our NMA was based on studies identified through a systematic literature review (SLR) conducted in June 2017. A comprehensive search protocol was developed (see Supplementary Information). The search strategy combined an existing published literature review with supplemental searches. The pazopanib manufacturer's submission document for the National Institute for Health and Care Excellence (NICE) comprised a systematic review on pazopanib, bevacizumab plus IFN, IFN, IL-2, sunitinib, sorafenib, and temsirolimus for the period 1980–2009. A supplemental search was conducted for these treatments to fill the gap for the period 2009–2017, and for cabozantinib and tivozanib (which were not part of the pazopanib report) for the entire period from 1980 to 2017, using the MEDLINE®, Embase, and Cochrane databases. Additionally, records were identified through a search of a clinical trial registry (clinicaltrials.gov). References of identified systematic reviews, health technology assessments (HTAs), meta-analyses, and indirect treatment comparisons (ITCs) were checked to identify any additional records.

2.2 Selection Criteria

The SLR followed the PICO (Population, Intervention, Comparator, Outcomes) framework and, in brief, utilized the following selection criteria (see supplementary material for full criteria). The population included adult patients ≥ 18 years of age with previously untreated aRCC. The treatments included the following in the first-line setting: cabozantinib (Cabometyx™), sunitinib (Sutent®), pazopanib (Votrient®), interferon alfa (Intron® A, Roferon®-A), interleukin-2 (Proleukin®), sorafenib (Nexavar®), bevacizumab (Avastin®) plus interferon alfa, temsirolimus (Torisel®), and tivozanib (Fotivda®). Comparators included any interventions from the above list, placebo, or best supportive care (BSC). Target outcomes were OS and PFS. Study designs eligible for inclusion were randomized controlled trials (RCTs) and letters reporting RCTs. We included SLRs, meta-analyses, and HTAs for screening of bibliographies. Studies published in English, French, German, Italian, and Spanish were included. Two reviewers independently screened titles/abstracts and excluded records that did not meet the selection criteria. Both reviewers then evaluated full-text articles for inclusion in the SLR and resolved discrepancies through discussion or reconciliation by a third reviewer.

2.3 Data Extraction

One reviewer extracted data from eligible studies, and a second, independent reviewer assessed all extracted data. Discrepancies between the two reviewers were resolved by consensus. Extracted data included the following: study design variables, patient characteristics, risk categories, efficacy outcomes for median PFS and OS (in months), and hazard ratios (HRs). For studies that had more than two arms with available HRs, only the treatments of interest were included. If a single study had more than one publication for the same outcome, the most recently reported data were selected. The preferred PFS endpoint was that determined by an independent review committee (IRC); if this was not available, investigator-determined PFS was used instead. Adjusted/stratified HRs for OS were preferred over non-adjusted/unstratified HRs. Finally, two independent reviewers critically appraised all eligible studies using assessment criteria based on recommendations in the NICE manufacturer’s template.
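The extraction preferences described above can be expressed as a simple selection rule. The following Python sketch is illustrative only: the record fields, the function name `pick_preferred`, and in particular the precedence of assessment quality over recency are assumptions, because the text does not state how the rules were ordered when they conflicted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HrRecord:
    """One reported hazard ratio for a study (hypothetical structure)."""
    outcome: str         # "PFS" or "OS"
    hr: float            # reported hazard ratio
    year: int            # year of the report
    irc_assessed: bool   # PFS determined by an independent review committee
    adjusted: bool       # adjusted/stratified estimate


def pick_preferred(records, outcome):
    """Return the preferred record for one outcome, or None if none exists.

    Assumed precedence: IRC-assessed PFS (or adjusted/stratified OS) first,
    then the most recently reported data.
    """
    candidates = [r for r in records if r.outcome == outcome]
    if not candidates:
        return None

    def preference(record):
        quality = record.irc_assessed if outcome == "PFS" else record.adjusted
        return (quality, record.year)

    return max(candidates, key=preference)
```

Under these assumptions, an IRC-assessed PFS estimate from an earlier report would be chosen over a later investigator-assessed estimate; a different precedence would only require changing the `preference` key.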

2.4 Statistical Analysis

The NMA was performed based on HRs of two efficacy endpoints in three patient groups: OS and PFS in the overall patient population and in the intermediate- and poor-risk subgroups (network diagrams are available in the Supplementary Information; Figs. A-1 to A-6). To assess potential differences among the study populations, baseline patient characteristics in the identified studies were compared. The NMA was executed in R 3.2.3 using the netmeta package to perform frequentist maximum likelihood estimation [14]. The model was a fixed-effect model based on logarithms of HRs, as previously described, for example, in Caldwell et al. 2005 and Rücker et al. 2012 [15, 16], yielding linear regression models in which the parameters to estimate were the log(HRs) between each treatment and the reference treatment (cabozantinib). The model can be summarized as follows:

$$ \theta^{\prime} = X\theta + \varepsilon, \quad \varepsilon \sim \mathrm{Normal}\left(0, \Sigma\right), $$

where θ′ is a vector of observed log(HRs) between pairs of treatments, θ is a vector of treatment effects to estimate, X is a design matrix for pairwise comparisons of treatments and comparators, and Σ is a diagonal matrix of inverse variances of each of the log(HR) estimates. Model outputs are point estimates and 95% confidence intervals (CIs) for HRs between pairs of treatments, herein summarized as forest plots for the PFS and OS outcomes. The HRs were calculated for cabozantinib versus other treatments.
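As an illustration of the model above, the following Python sketch estimates the treatment effects by weighted least squares, weighting each observed log(HR) by the inverse of its variance. It is a minimal sketch and not the netmeta implementation used for the analysis: the function names and input format are assumptions, and the sketch is restricted to two-arm trials (multi-arm trials would require correlated errors).

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: is every treatment "
                             "connected to the reference?")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fixed_effect_nma(contrasts, reference):
    """Fixed-effect NMA on log(HR) contrasts from two-arm trials.

    contrasts: list of (treatment, comparator, log_hr, se), where log_hr is
    the log hazard ratio of treatment versus comparator.
    Returns {treatment: (log_hr_vs_reference, se)} for non-reference arms.
    """
    names = sorted({t for c in contrasts for t in c[:2]} - {reference})
    index = {name: i for i, name in enumerate(names)}
    p = len(names)
    xtwx = [[0.0] * p for _ in range(p)]   # X' W X
    xtwy = [0.0] * p                       # X' W y
    for treatment, comparator, log_hr, se in contrasts:
        weight = 1.0 / se ** 2             # inverse-variance weight
        row = [0.0] * p                    # one row of the design matrix X
        if treatment != reference:
            row[index[treatment]] = 1.0
        if comparator != reference:
            row[index[comparator]] = -1.0
        for i in range(p):
            xtwy[i] += row[i] * weight * log_hr
            for j in range(p):
                xtwx[i][j] += row[i] * weight * row[j]
    estimates = solve(xtwx, xtwy)
    results = {}
    for name, i in index.items():
        unit = [1.0 if k == i else 0.0 for k in range(p)]
        variance = solve(xtwx, unit)[i]    # diagonal entry of (X' W X)^-1
        results[name] = (estimates[i], math.sqrt(variance))
    return results
```

With hypothetical inputs, `fixed_effect_nma([("A", "B", math.log(0.5), 0.2), ("B", "C", math.log(0.8), 0.3)], reference="A")` returns the log(HR) of B and of C versus A. Negating an estimate gives the log(HR) of the reference versus that treatment, which is the direction reported in this analysis (cabozantinib versus comparator).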

3 Results

3.1 Study Selection

Eighty-eight references were identified through the bibliography of the pazopanib manufacturer submission document for the National Institute for Health and Care Excellence (NICE) [17]. The systematic literature search in bibliographic databases yielded 5094 citations, and additional records were identified with searches of the clinicaltrials.gov registry (n = 41). After removal of duplicates, 4115 abstracts were screened and 3625 were excluded at this stage. After full-text screening, an additional 388 publications were excluded (Fig. 1). Additionally, two publications were identified through reference checking [18, 19]. One additional record was identified through reviewing the relevant evidence appraisals by NICE for the indication of first-line aRCC [20]. In total, 105 publications referring to 19 studies were identified. For the CABOSUN trial, the clinical study report (CSR) was provided by Ipsen Pharma, which comprised the final data for cabozantinib [13]. Table 1 summarizes the characteristics of the 19 identified trials.
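The publication counts in this paragraph can be reconciled with simple arithmetic. The sketch below uses only the numbers reported above; the variable names are illustrative, and it assumes that the three additional records were added after full-text screening.

```python
# Counts as reported in the study selection paragraph.
abstracts_screened = 4115
excluded_at_abstract = 3625
excluded_at_full_text = 388
from_reference_checking = 2
from_nice_appraisals = 1

# Publications that proceeded to full-text screening.
full_text_assessed = abstracts_screened - excluded_at_abstract

# Publications retained after full-text screening.
retained = full_text_assessed - excluded_at_full_text

# Total after adding the records found by other routes.
total_publications = retained + from_reference_checking + from_nice_appraisals

print(full_text_assessed, retained, total_publications)  # 490 102 105
```

The total of 105 agrees with the number of publications reported for the 19 identified studies.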

Fig. 1 PRISMA flow chart. CSR: clinical study report; HR: hazard ratio; HTA: health technology assessment; ITC: indirect treatment comparison; KM: Kaplan–Meier (curve); NICE: National Institute for Health and Care Excellence

Table 1 Overview of studies identified through the systematic literature review and critical appraisal of study quality

3.2 Study Exclusions

Of the 19 identified studies, 17 were critically appraised; two studies could not be assessed because only abstracts or posters were available (see Table 1). The assessment criteria that most of the studies fulfilled were the following: appropriate randomization (12 studies), balanced patient baseline characteristics between study arms (15 studies), no evidence of selective reporting (14 studies), and appropriate intent-to-treat (ITT) population analysis (16 studies). Nine studies had an open-label design; however, in four of these, data were assessed by an independent imaging-review committee, reducing this potential source of bias. Additionally, most of the studies (14) failed to report the method of treatment allocation concealment.

The publications were then assessed for data availability, and six studies were excluded. The CESAR [22], PISCES [24], PERCY Quattro [38], Hinotsu 2013 [26], Groupe Français d’Immunothérapie [39], and Boccardo 1998 [19] studies did not report OS/PFS HRs. Overall, 13 studies were retained for the NMA, providing data for pazopanib, bevacizumab, IFN, sunitinib, sorafenib, temsirolimus, cabozantinib, and tivozanib, while IL-2 was dropped due to lack of available data.

3.3 Characteristics of Study Populations

Across all trials, patients had similar median age (~ 60 years), and most of the patients included in the studies were male. In studies with available ethnicity data, most of the patients were Caucasian. The baseline risk characteristics of patients in the 13 included studies are summarized in Table 2. For studies with ECOG (Eastern Cooperative Oncology Group) performance data available, most of the patients were in status 0 or 1. Only the CABOSUN and TORAVA (sunitinib versus bevacizumab plus temsirolimus versus bevacizumab plus IFN) studies included more than 10% of the population in ECOG status 2, while other studies included less than 5% of the population in this performance category. The CABOSUN study used the IMDC risk category instead of MSKCC. The NCT00117637, SWITCH, TARGET, and CROSS-J-RCC studies had only 0–1% of poor prognosis patients, which was rather low compared to other studies. In contrast, study NCT00065468 had rather high proportions of poor-prognosis patients (69% and 76% in the temsirolimus and IFN arms respectively). Overall, patients included in the NCT00065468 and CABOSUN studies had the least favorable prognoses, while those in the NCT00117637, SWITCH, TARGET, and CROSS-J-RCC studies had the best prognosis profiles. These differences in risk profiles warranted separate analyses in intermediate or poor risk sub-groups. In other respects, the patient characteristics were similar between the interventions.

Table 2 Baseline patient risk characteristics of studies included in the network meta-analysis

The NMA was based on available HR data for OS and PFS as summarized in Table 3. Two studies had a cross-over design [23, 25]. Three studies enrolled pre-treated and treatment-naïve patients [27, 30, 31] but provided OS HR (n = 5) and PFS HR (n = 4) for the subgroup of patients with no prior treatment; these HRs were used for the NMA. Two of the included studies were phase II studies (CABOSUN, NCT00117637), while the rest were phase III studies.

Table 3 OS/PFS study outcomes in studies included in the network meta-analysis

3.4 Network Meta-Analysis of OS and PFS

PFS outcomes were significantly improved for cabozantinib compared to all treatments in the intermediate- and poor-risk subgroups (see Figs. 2 and 3). In intermediate-risk patients, hazard ratios (HRs) were 0.52 (95% CI: 0.33, 0.82), 0.46 (95% CI: 0.26, 0.8), 0.2 (95% CI: 0.12, 0.36), and 0.37 (95% CI: 0.2, 0.68) when cabozantinib was compared with sunitinib, sorafenib, IFN, or bevacizumab plus IFN, respectively. In poor-risk patients, the NMA also demonstrated significantly longer PFS for cabozantinib; HRs were 0.31 (95% CI: 0.11, 0.9), 0.22 (95% CI: 0.06, 0.87), 0.16 (95% CI: 0.04, 0.64), and 0.20 (95% CI: 0.05, 0.88) when cabozantinib was compared with sunitinib, temsirolimus, IFN, or bevacizumab plus IFN, respectively.
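Because the model works on the log scale, a reported HR and its 95% CI can be converted back into a log(HR) and a standard error, which also shows why an interval that excludes 1 corresponds to statistical significance at the 5% level. The following sketch assumes a symmetric normal-approximation interval on the log scale; the function name and the two-sided z-test are illustrative additions and not part of the reported analysis.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def summarize_hr(hr, lower, upper):
    """Recover log-scale quantities from a reported HR and its 95% CI."""
    log_hr = math.log(hr)
    # The CI spans 2 * Z_95 standard errors on the log scale.
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = log_hr / se
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "log_hr": log_hr,
        "se": se,
        "z": z,
        "p_value": p_value,
        "significant": upper < 1.0 or lower > 1.0,
    }


# Cabozantinib versus sunitinib, PFS in intermediate-risk patients.
result = summarize_hr(0.52, 0.33, 0.82)
```

Because published HRs and limits are rounded, quantities recovered this way are approximate and may differ slightly from the values used internally by the analysis software.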

Fig. 2 PFS network meta-analysis forest plots: intermediate-risk group. Bev: bevacizumab; HR: hazard ratio; IFN: interferon; PFS: progression-free survival

Fig. 3 PFS network meta-analysis forest plots: poor-risk group. Bev: bevacizumab; HR: hazard ratio; IFN: interferon; PFS: progression-free survival

The overall study populations in studies other than CABOSUN and ARCC/NCT00065468 included favorable-risk patients. When the HRs for the overall study populations were compared, PFS outcomes were significantly improved for cabozantinib compared to all treatments (Fig. 4). PFS HRs most strongly favored cabozantinib over the following comparators: IFN (HR = 0.24; 95% CI: 0.14, 0.38); temsirolimus (HR = 0.32; 95% CI: 0.19, 0.54); bevacizumab + IFN (HR = 0.35; 95% CI: 0.21, 0.57); and sorafenib (HR = 0.36; 95% CI: 0.23, 0.58). The HR for PFS also significantly favored cabozantinib over both pazopanib and sunitinib (HR = 0.48; 95% CI: 0.3, 0.75; and HR = 0.48; 95% CI: 0.31, 0.74, respectively). The HRs for OS consistently favored cabozantinib over comparators in subgroup and overall study population analyses, although these findings were not statistically significant (Figs. 5, 6, and 7). Results for placebo were calculated, because placebo provided a link in the network between sorafenib and pazopanib. However, placebo results are not a focus in this NMA, because BSC is not currently recommended as a treatment strategy for first-line treatment of aRCC [11].
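The role of placebo as a connecting node can be illustrated with an adjusted indirect comparison (the Bucher method), which coincides with a fixed-effect NMA when two treatments are linked by a single path. The sketch below is illustrative only: the function name and arguments are assumptions, and no trial data from this analysis are used.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def indirect_hr(hr_a_vs_link, ci_a, hr_b_vs_link, ci_b):
    """Indirect HR of A versus B through a common comparator (the link).

    Each CI is a (lower, upper) tuple for the corresponding direct HR.
    Returns (hr, (lower, upper)) for A versus B.
    """
    def log_se(ci):
        return (math.log(ci[1]) - math.log(ci[0])) / (2 * Z_95)

    # Log hazard ratios subtract along the path A -> link -> B.
    log_hr = math.log(hr_a_vs_link) - math.log(hr_b_vs_link)
    # Variances of independent estimates add.
    se = math.sqrt(log_se(ci_a) ** 2 + log_se(ci_b) ** 2)
    lower = math.exp(log_hr - Z_95 * se)
    upper = math.exp(log_hr + Z_95 * se)
    return math.exp(log_hr), (lower, upper)
```

Since the variances add, an indirect estimate is always less precise than either of the direct estimates it is built from, which is one reason intervals widen for comparisons that rely on long paths through the network.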

Fig. 4 PFS network meta-analysis forest plots: overall-risk group. Bev: bevacizumab; HR: hazard ratio; IFN: interferon; PFS: progression-free survival

Fig. 5 OS network meta-analysis forest plots: intermediate-risk group. Bev: bevacizumab; HR: hazard ratio; IFN: interferon; OS: overall survival

Fig. 6 OS network meta-analysis forest plots: poor-risk group. Bev: bevacizumab; HR: hazard ratio; IFN: interferon; OS: overall survival

Fig. 7 OS network meta-analysis forest plots: overall-risk group. Bev: bevacizumab; HR: hazard ratio; IFN: interferon; OS: overall survival

4 Discussion

We herein report results from a systematic review and NMA conducted to indirectly compare cabozantinib with standard-of-care treatments used in the first-line treatment of aRCC patients. Because the risk profiles of patients varied across the trials, the NMA was conducted separately in intermediate- and poor-risk subgroups. We found that cabozantinib improves PFS significantly in intermediate- and poor-risk patients versus all comparators. While the OS results are not statistically significant, they nominally favor cabozantinib in both intermediate- and poor-risk populations. Most of the studies included in the network were phase III, but two were phase II: CABOSUN (cabozantinib versus sunitinib) and NCT00117637 (sorafenib versus IFN). Sorafenib and IFN comparisons are additionally informed by phase III studies. Although CABOSUN is a phase II study, its design is similar to that of other contemporary trials in aRCC: it is a randomized, multicenter, open-label, active-controlled trial comparing the efficacy and safety of cabozantinib versus sunitinib. In the overall study populations, most of which include favorable-risk patients, the results for cabozantinib are consistent with the results of the more homogeneous analyses of intermediate- and poor-risk patients. While the subgroup analyses thus allow comparisons in less heterogeneous study populations, the overall analyses allow more studies to be included in the network. In summary, the conclusions remain the same whether intermediate-risk, poor-risk, or overall populations are compared: PFS results are significantly better for cabozantinib, and OS results favor it but are not statistically significant. Prognostic risk in the CABOSUN study was based on the IMDC model, whereas in other studies it was determined using the MSKCC criteria. However, compared with other models [7], the IMDC model is more widely used and has improved prognostic value, as acknowledged in the most recent ESMO guidelines [11]. It is valid both for previously untreated and treated patients and for non-clear cell RCC [47]. The various risk models are based on a similar set of prognostic criteria, and for the purposes of this analysis can be considered to be interchangeable.

Previous NMAs have also compared first-line treatments for patients with aRCC. Leung et al. (2014) conducted an NMA of data available prior to August 2013 using methodology similar to that of the current study (i.e., comparing the logarithm of the HRs [15]) [48]. Their results suggest that sunitinib and axitinib improve PFS over sorafenib, pazopanib and temsirolimus, although not all comparisons are statistically significant. Unlike the current study, the Leung et al. analysis includes patients with an ECOG performance status of 0 or 1, and some patients after nephrectomy or prior cytokine therapy. While our analysis does not include axitinib, their finding that sunitinib improves PFS compared to sorafenib, pazopanib, and temsirolimus is consistent with our analysis. More recent NMA studies have evaluated the available data for first-line antiangiogenic therapies in aRCC. An analysis conducted by Rousseau et al. (2016) evaluated the benefit of first-line treatments in aRCC using a direct weighted-average meta-analysis and an NMA using Bayesian hierarchical models with random effects [49]. Results of the direct meta-analysis demonstrate significant improvement in PFS for patients treated with sunitinib, pazopanib, axitinib, and bevacizumab plus IFN compared to placebo or IFN. Their NMA results show no significant differences among antiangiogenic drugs for 6-month PFS or 1-year OS. Chang et al. (2016) conducted an NMA using Bayesian hierarchical random effects models to compare the efficacy and safety of 12 different treatment arms among 7597 patients with aRCC [50]. In contrast to our study, this study does not include cabozantinib and considers different comparators. Chang et al. reported sunitinib to be the best treatment modality in terms of PFS (rank probability value = 2.36) and safety (rank probability value = 7.43). The Chang et al. study did not select patients based on prognostic risk category, but the current study suggests that cabozantinib may be more efficacious than sunitinib in the first-line treatment of both intermediate- and poor-risk patient groups. Real-world studies of first-line treatment of aRCC have also been published [51, 52]. In our analysis, OS results for all groups (overall, intermediate and poor risk) were consistent with the findings in the study by Lalani et al. (2017) [51]; i.e., sunitinib was more effective than pazopanib in our analysis. Basappa et al. (2017) [52] found that sunitinib given according to the product monograph showed no difference in OS when compared to pazopanib given according to the product monograph. An alternative regimen (individualized sunitinib dosing) was the most effective of the three treatment approaches compared. Lalani et al. and Basappa et al. also assessed time to treatment failure (TTF), whereas we analyzed PFS. Nevertheless, neither of the real-world studies showed a difference in TTF, consistent with our analysis, in which no difference in PFS was observed between sunitinib and pazopanib.

Several limitations of our study should be noted. As with all systematic reviews, there is potential for bias in the study selection. To minimize this risk, the literature review was conducted according to the Cochrane Handbook for Systematic Reviews of Interventions [53]. Inclusion/exclusion criteria were pre-defined, and two independent reviewers conducted the work. Another weakness is that data for treatment-naïve patients were not available from all studies. Therefore, findings from the TARGET study were excluded from the OS network. To identify treatment-naïve OS results for the TIVO-1 study (Motzer 2013), additional manual searches of HTA agency websites were required [42]. Not all studies reported intermediate- and poor-risk group HRs. Four comparisons were possible in the intermediate-risk group (OS, PFS), and five and four comparisons were possible in the poor-risk group with OS and PFS endpoints respectively.

A limitation of the overall-population analysis is the difference between the studies regarding the proportions of patients with favorable-, intermediate-, and poor-risk factors. The CABOSUN study, by design, only included patients with intermediate and poor risk, whereas most other studies included patient populations with favorable-risk profiles. For this reason, in addition to the overall population analysis, we have performed comparisons in intermediate- and poor-risk subgroups. The results of all analyses were consistent. The current study also has several strengths. The quality of the selected studies was systematically appraised using the NICE checklist, and studies were mostly considered to be of good quality. A frequent source of potential bias was open-label design, which was reduced in some of the studies by involving an independent imaging-review committee. Our study also employed well-established HR NMA methodology, an approach that balances analysis complexity with ease of understanding and interpretation.

5 Conclusion

The current study suggests that cabozantinib is a promising first-line treatment for aRCC compared to available standard-of-care options. The results of this NMA may have clinical implications for the optimal approach to treat patients with aRCC, especially in light of the CABOSUN study, which demonstrated a clinical benefit versus sunitinib in patients with intermediate- or poor-risk aRCC. As the treatment landscape for aRCC evolves, future head-to-head clinical trials are also needed to ensure that robust clinical data are available for patients with different risk profiles across all treatment settings.