Study selection and heterogeneity assessment
The results of the three rounds of SR are summarized in Fig. 1. The initial SR covered searches from 1 January 1998 to 2 July 2013. After deduplication, 3822 records were identified for screening, from which five RCTs were identified for analysis: EMILIA [10]; GBG 26 [32]; EGF100151 [38]; Martin et al. [39]; and CEREBEL [17]. The first update of the SR encompassed searches from 1 October 2012 to 30 June 2016. After deduplication, 3401 records were identified for screening, from which updates to the original five RCTs and one new RCT, PHEREXA [26], were identified. The second update to the SR covered searches from 1 January 2016 to 3 January 2018. After deduplication, 2923 records were identified for screening, from which updates and one further RCT, ELTOP [40], were identified.
Overall, seven RCTs met the criteria for inclusion in the NMA: EMILIA [10]; GBG 26 [32]; EGF100151 [38]; a phase 2 trial of neratinib versus lapatinib plus capecitabine (Martin et al. 2013) [39]; PHEREXA [26]; ELTOP [40]; and a prior trastuzumab treatment subgroup in CEREBEL [17]. Although CEREBEL as a whole did not meet the inclusion criteria, it included a subgroup defined as “patients who received prior trastuzumab either in adjuvant or metastatic setting”, and this subgroup was therefore included in the NMA. The phase 3 TH3RESA study [41], which evaluated T-DM1, was excluded from the NMA because only patients on certain treatment regimens in the “Physician’s choice” comparator arm were relevant to the NMA, and disaggregating data on particular treatments from this arm would have broken randomization. Selecting a particular treatment from “Physician’s choice” would therefore have introduced bias and violated the fundamental principle of NMA, which relies on randomized evidence.
Eligibility criteria for inclusion in the NMA are summarized in Online Resource 2, and trial information and baseline patient characteristics for the trials included in the analysis are summarized in Table 1. Although there were differences in population size and treatment line among studies, heterogeneity assessment indicated that all trials were comparable in terms of randomization, allocation concealment, demographic and baseline characteristics, outcome selection and reporting, patient withdrawal from the studies, and statistical analyses undertaken (Online Resource 3). In total, there were five phase 3 studies and two phase 2 studies. All were open-label, but EMILIA and EGF100151 used independent review committees to assess outcomes and, therefore, the outcome assessors were blinded to study treatment. Patients from all studies had been treated previously with trastuzumab; however, only results from a “prior trastuzumab treatment” subgroup were included from the CEREBEL study. Results of critical appraisal of trials are presented in Online Resource 3.
Table 1 Trial methodologies and baseline characteristics
Treatment networks
Seven studies reported data for OS, for OS adjusted for treatment crossover, and for PFS (Fig. 2a); six studies reported data for ORR, as ORR data were not reported in CEREBEL (Fig. 2b). Various treatment network plots were generated for the safety endpoints (Online Resource 6). Six studies were linked to network plots of treatment discontinuation (due to AEs), diarrhea, neutropenia, and ALT; five studies were linked to plots for fatigue, nausea, and vomiting; four studies were included for AEs (grade 3 and above) and AST; and two studies were linked for serious AEs (Online Resource 6).
Model selection
The Bayesian random-effects model was the base-case analysis, and was preferred over the fixed-effects model for all endpoints to account for heterogeneity among the included studies. Between-study variance could not be estimated reliably from the data owing to the small number of available studies, and assuming homogeneity was considered implausible. Hence, informative priors based on the best empirical evidence were used instead [42]. Convergence statistics for the random-effects model are shown in Online Resource 8, and convergence plots are presented in Online Resource 10. For completeness, results obtained with the fixed-effects model are shown in Online Resource 9.
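For orientation, a random-effects NMA of this kind is commonly written in the following contrast-based form for two-arm trials. This is a generic sketch and not necessarily the exact model fitted here. The symbols ($y_i$, $s_i$, $\delta_i$, $d_k$, $\tau$) and the log-normal form of the prior are assumptions, and the hyperparameters $\mu_\tau$ and $\sigma_\tau$ are placeholders for the empirically derived values from [42], which are not stated in this section.

```latex
\begin{align}
  y_i      &\sim \mathcal{N}\!\left(\delta_i,\; s_i^2\right), \\
  \delta_i &\sim \mathcal{N}\!\left(d_{t_{i2}} - d_{t_{i1}},\; \tau^2\right), \\
  \tau^2   &\sim \operatorname{Log-Normal}\!\left(\mu_\tau,\; \sigma_\tau^2\right).
\end{align}
```

Here $y_i$ is the observed relative effect in trial $i$ (for example, a log HR) with standard error $s_i$, $\delta_i$ is the trial-specific true effect, $d_k$ is the effect of treatment $k$ relative to a common reference, $t_{i1}$ and $t_{i2}$ index the two arms of trial $i$, and $\tau^2$ is the between-study variance. Setting $\tau = 0$ recovers the fixed-effects model.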
Overall survival
The HR data for cross-comparison of treatments are summarized for OS (the primary analysis) and for OS adjusted for treatment crossover (sensitivity analysis) in Table 2 and Fig. 3.
Table 2 Cross tabulation of treatment HR (95% CrI) for OS, OSX, and PFS
Primary analysis
T-DM1 was associated with a trend towards greater OS benefit than all other approved treatments, although the wide CrIs reflect uncertainty around the comparisons. The SUCRA ranking was consistent with this finding, with T-DM1 ranked first ahead of the other approved treatments: (1) T-DM1, (2) pertuzumab plus trastuzumab plus capecitabine (unapproved combination), (3) trastuzumab plus capecitabine, (4) lapatinib plus capecitabine, (5) capecitabine, and (6) neratinib.
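As background on how a SUCRA value is obtained, the sketch below computes it from a treatment's posterior rank probabilities: the cumulative probabilities of being among the best 1, 2, ..., a-1 of a treatments are averaged. The function name `sucra` and the rank probabilities are hypothetical illustrations and are not taken from this analysis.

```python
from typing import Sequence


def sucra(rank_probs: Sequence[float]) -> float:
    """Return the SUCRA value for one treatment.

    rank_probs[k] is the posterior probability that the treatment
    has rank k + 1, where rank 1 is the best.
    """
    n_treatments = len(rank_probs)
    if n_treatments < 2:
        raise ValueError("SUCRA needs at least two treatments")
    if abs(sum(rank_probs) - 1.0) > 1e-9:
        raise ValueError("rank probabilities must sum to 1")

    cumulative = 0.0   # P(rank <= k)
    total = 0.0        # sum of cumulative probabilities for k = 1..a-1
    for prob in rank_probs[:-1]:
        cumulative += prob
        total += cumulative
    return total / (n_treatments - 1)


# Hypothetical rank probabilities for one treatment in a network of three.
example = sucra([0.5, 0.3, 0.2])
```

A treatment that is certain to be best has a SUCRA of 1, and one that is certain to be worst has a SUCRA of 0, so ordering treatments by SUCRA gives a ranking of the kind listed above.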
Sensitivity analysis
In the sensitivity analysis, adjusted estimates of OS were available for both EMILIA and EGF100151 (Online Resource 5) [10, 36]. For EMILIA, the treatment crossover-adjusted HR for OS was 0.69 (95% confidence interval [CI] 0.59, 0.82) using RPSFTM. In EGF100151, the adjusted HR for OS in which treatment crossover was used as a time-dependent covariate was 0.80 (95% CI 0.64, 0.99). Intention-to-treat estimates of OS were used for the other five studies, as in the primary analysis [17, 32, 38, 39, 40]. Sensitivity analysis results were generally similar to the base-case analysis, with a numerically greater OS benefit for T-DM1 than for the other treatments (Table 2 and Fig. 3).
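Published HRs of this kind usually enter an NMA on the log scale together with a standard error. The sketch below shows one common way to recover both from a reported HR and its 95% CI, assuming the interval is symmetric on the log scale (a Wald-type interval). The function name `log_hr_and_se` is hypothetical, and whether the authors derived their inputs in this way is not stated in this section.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_hr_and_se(hr: float, lower: float, upper: float) -> Tuple[float, float]:
    """Convert an HR and its 95% CI to a log HR and standard error.

    Assumes the CI is symmetric around the log HR.
    """
    if not (0 < lower <= hr <= upper):
        raise ValueError("expected 0 < lower <= hr <= upper")
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return log_hr, se


# Crossover-adjusted OS estimates quoted in the text above.
emilia_log_hr, emilia_se = log_hr_and_se(0.69, 0.59, 0.82)
egf_log_hr, egf_se = log_hr_and_se(0.80, 0.64, 0.99)
```

If the reported interval is not symmetric on the log scale, for example because it was obtained by bootstrapping, the recovered standard error is only an approximation.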
Progression-free survival
Cross-comparison, between-treatment HRs for PFS are also summarized in Table 2 and Fig. 3. The analysis indicated that the likelihood of PFS benefit was greater with T-DM1 than with any of the other comparator treatments. The SUCRA ranking was also greater for T-DM1 (first) than for the other approved treatments: (1) T-DM1, (2) pertuzumab plus trastuzumab plus capecitabine, (3) lapatinib plus capecitabine, (4) trastuzumab plus capecitabine, (5) neratinib, and (6) capecitabine.
Overall response rates
Comparisons of ORR with T-DM1 and with other treatments showed that T-DM1 was associated with a more favorable ORR than all comparator treatments, and was more efficacious than capecitabine, lapatinib plus capecitabine, and neratinib (Fig. 4). Consistent with this finding, the SUCRA ranking was greatest for T-DM1 compared with the other approved treatments: (1) T-DM1, (2) pertuzumab plus trastuzumab plus capecitabine, (3) trastuzumab plus capecitabine, (4) lapatinib plus capecitabine, (5) neratinib, and (6) capecitabine.
Adverse events (grade 3 and above)
The ORs for the likelihood of various AEs occurring with T-DM1 compared with the different comparator treatments are summarized in Fig. 5. Treatment discontinuation due to an AE of grade 3 and above was less likely with T-DM1 than with other treatments that could be compared (there was no link between neratinib and T-DM1 in the network, and these therapies could not be compared), and discontinuation due to any AE was less likely with T-DM1 than with all other treatments except for neratinib. The SUCRA rankings for discontinuation due to an AE of grade 3 and above for approved treatments were: (1) T-DM1, (2) pertuzumab plus trastuzumab plus capecitabine, (3) trastuzumab plus capecitabine, (4) lapatinib plus capecitabine, and (5) capecitabine. The likelihood of serious AEs was lower with T-DM1 than with neratinib, or lapatinib plus capecitabine; no comparison with other treatments was possible (Fig. 5a).
The ORs indicated a substantially lower risk of diarrhea associated with T-DM1 than with other treatments, and this difference was reflected by the SUCRA rankings: (1) T-DM1, (2) capecitabine, (3) trastuzumab plus capecitabine, (4) lapatinib plus capecitabine, (5) pertuzumab plus trastuzumab plus capecitabine, and (6) neratinib. Most ORs for fatigue, nausea, vomiting, and neutropenia favored T-DM1 over other treatments; however, the lower risk of vomiting with T-DM1 relative to neratinib or lapatinib plus capecitabine was subject to greater uncertainty (Fig. 5b).
ORs indicated that increased AST was more likely with T-DM1 than with other treatments, and was least likely with capecitabine. The SUCRA rankings were: (1) capecitabine, (2) lapatinib plus capecitabine, (3) neratinib, (4) trastuzumab plus capecitabine, and (5) T-DM1. The risk of increased ALT with T-DM1 was higher than with trastuzumab plus capecitabine, lapatinib plus capecitabine, or capecitabine.
Effect size could not be quantified for mucosal inflammation. T-DM1 was consistently better than the comparators, but no mucosal inflammation events were reported in one of the cohorts in study EGF100151; removal of this study disconnected the network, preventing further analysis. Similarly, the absence of events in one arm of study GBG 26 prevented further analysis of thrombocytopenia; T-DM1 was consistently worse than the comparators, but this could not be quantified, even when GBG 26 was removed from the network. T-DM1 was also consistently worse than comparators in terms of anemia events. Because no anemia events were reported in one arm of GBG 26, the analysis was repeated with that study excluded; OR estimates for T-DM1 are shown in Online Resource 7. Finally, no OR could be estimated for PPE because PPE was a rare event in EMILIA.
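The difficulty caused by an arm with no events can be illustrated with the sample odds ratio from a single two-arm trial: when any cell of the 2x2 table is zero, the log OR is infinite or undefined. The sketch below is illustrative only, with a hypothetical function name and hypothetical counts. It uses the simple sample OR, not the Bayesian model used in this analysis.

```python
import math
from typing import Optional


def log_odds_ratio(events_a: int, total_a: int,
                   events_b: int, total_b: int) -> Optional[float]:
    """Return the sample log OR of arm A versus arm B.

    Returns None when any cell of the 2x2 table is zero,
    because the log OR is then not finite.
    """
    non_events_a = total_a - events_a
    non_events_b = total_b - events_b
    if 0 in (events_a, non_events_a, events_b, non_events_b):
        return None
    return math.log((events_a * non_events_b) / (non_events_a * events_b))


# Hypothetical counts: no events in arm A, so no finite estimate exists.
zero_arm = log_odds_ratio(0, 50, 5, 50)
# Hypothetical counts with events in both arms.
both_arms = log_odds_ratio(10, 100, 20, 100)
```

Dropping such a trial removes the problem only if the remaining trials still connect all treatments of interest, which is why excluding a study can leave a comparison unquantifiable.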