Introduction

Globally, breast cancer (BC) is the most common cancer and the leading cause of cancer-related death among women. It accounts for about 30% of female cancer cases and 15% of all cancer deaths [1,2,3]. The incidence and mortality of BC have been increasing steadily, by about 0.5% and 0.7 per 100,000 per year, respectively [3, 4]. Moreover, BC is a heterogeneous disease with four major molecular subtypes defined by gene expression [5]. Among them, human epidermal growth factor receptor 2 (HER2) overexpression occurs in about 20–30% of BC cases and is associated with poor prognosis [6].

HER2 is a 185 kDa transmembrane glycoprotein with tyrosine kinase activity; it consists of 1255 amino acids and is encoded by a proto-oncogene on chromosome 17q21 [7]. Amplification of the HER2 gene is one of the most important factors affecting breast cancer growth and metastasis. HER2 activation drives tumor progression by inhibiting apoptosis, promoting proliferation, increasing invasiveness, and promoting angiogenesis as well as lymphangiogenesis [8]. HER2 is therefore an independent and powerful prognostic indicator for clinical monitoring of breast cancer therapy and an important target for tumor-targeted drug selection. Breast cancer patients with HER2 overexpression are characterized by rapid disease progression, short remission after chemotherapy, poor response to endocrine therapy, and low disease-free survival (DFS) and overall survival (OS) rates [9]. Consequently, HER2 has recently become the focus of targeted therapy for breast cancer [10].

Trastuzumab (T), the gold standard for HER2-positive BC treatment, was the first HER2-targeted drug approved by the Food and Drug Administration (FDA) and the first humanized monoclonal antibody approved for HER2-positive BC [11, 12]. Trastuzumab binds the juxtamembrane extracellular subdomain IV of HER2 and exerts antitumor activity through several mechanisms, including inhibition of signal transduction and antibody-dependent cell-mediated cytotoxicity. Moreover, trastuzumab induces internalization and degradation of the HER2 receptor, attracts cytotoxic immune cells into the tumor microenvironment, inhibits cell growth and proliferation signaling, and ultimately kills tumor cells [13, 14]. In the phase III HERA trial involving 5102 women with HER2-positive early-stage breast cancer, trastuzumab-treated patients had a significantly lower risk of a disease-free survival event (HR=0.76) [15]. Although trastuzumab has changed the paradigm for HER2-positive breast cancer treatment and significantly improved patients’ prognosis, about 35% of patients have natural resistance, and about 70% of patients who initially respond to trastuzumab progress to metastatic disease and develop resistance within 1 year [12, 16]. Moreover, trastuzumab-associated cardiac toxicity limits its clinical application. Therefore, additional treatments are needed to provide these patients with further clinical benefit.

Lapatinib (L) is a tyrosine kinase inhibitor (TKI) that exerts its anti-tumor effects by competing with intracellular ATP to block HER2 signaling, thereby preventing phosphorylation and downstream pathway activation [17]. Because its mechanism of action differs from that of monoclonal antibodies, it may have some advantages in overcoming drug resistance [18]. In the phase III ALTERNATIVE study, patients treated with lapatinib + trastuzumab + aromatase inhibitors (AIs) had a significantly longer median progression-free survival (PFS) than patients treated with trastuzumab + AI (11 months vs. 5.6 months). Moreover, lapatinib + AI-treated patients had a longer median PFS than those treated with trastuzumab + AI (8.3 months vs. 5.6 months) [19]. However, in the ALTTO trial [20], the efficacy of lapatinib was inferior to that of trastuzumab. The combination of trastuzumab with lapatinib has also been reported to be more efficacious than trastuzumab alone. The CHER-Lob and TRIO-US B07 trials showed that trastuzumab plus lapatinib achieved a better pathologic complete response (pCR) outcome [21, 22]. However, ALTTO showed no marked differences in disease-free survival (DFS) among the trastuzumab plus lapatinib, trastuzumab, and lapatinib groups, with the combination group exhibiting higher toxicity [20]. Therefore, it has not been conclusively determined whether trastuzumab plus lapatinib or lapatinib alone is non-inferior to trastuzumab in efficacy.

Therefore, we assessed whether trastuzumab plus lapatinib or lapatinib therapy is non-inferior to trastuzumab therapy in HER2-positive breast cancer.

Materials and methods

Study design

This study was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (PRISMA) [23] and registered in PROSPERO (CRD42021285865).

Search strategy

Two researchers (YY and LXM) searched for relevant studies in the PubMed, Embase, Cochrane Library, CNKI, Wan Fang, and Sinomed databases. The Chinese search terms were “ruxianai,” “ruxianzhongliu,” “ruai,” “lapatini,” and “qutuozhudankang.” The English search terms are shown in Table 1.

Table 1 PubMed search strategy of lapatinib vs. trastuzumab therapy for HER2-positive breast cancer

Inclusion criteria

The inclusion criteria in this study were as follows: (i) patients with HER2-positive (3+ staining with immunohistochemistry or/and fluorescent in situ hybridization (FISH) positive) breast cancer based on clinical, histological, or pathological diagnosis; (ii) treatment of T, L, or T + L arms with chemotherapy combined with trastuzumab, lapatinib, or trastuzumab combined with or followed by lapatinib; (iii) primary outcomes were OS, DFS/event-free survival (EFS), and PFS while secondary outcomes were pCR (ypT0/is ypN0), pCR (ypT0/is ypN0/+), overall response rate (ORR), disease control rate (DCR), rate of breast-conserving surgery (BCS), recurrence-free survival (RFS), cardiac toxicities, and other toxicities; and (iv) randomized controlled trials (RCTs).

Exclusion criteria

The exclusion criteria were as follows: (i) studies with different chemotherapies among different arms; (ii) conference abstracts and letters among others; and (iii) studies without available outcomes.

Data extraction and quality assessment

Two researchers extracted the relevant information using a predefined data extraction table, containing literature basic information (trial name, title, author, registration number, publication year), demographic information (number of participants in L + T arm, L arm, and T arm, percentage and number of hormone receptor-positive and hormone receptor-negative participants, tumor stage and diagnosis of patients, inclusion and exclusion criteria), intervention feature information (duration and dose of chemotherapy and anti-HER 2 therapy), and methodological elements (random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective reporting, and other bias). The quality of trials was assessed by two researchers using the risk of bias tool of The Cochrane Collaboration [24]. Any disagreements were resolved by discussions with a third researcher.

Statistical analysis and evidence quality assessment

We used RevMan 5.3 and Stata 14 for all data analyses. Separate meta-analyses were performed for each anti-HER2 comparison (L + T versus T and L versus T). Pooled hazard ratios (HRs) were estimated for survival outcomes, including OS, DFS/EFS, RFS, and PFS, while risk ratios (RRs) were estimated for dichotomous outcomes, including pCR, ORR, rate of BCS, cardiac toxicities, and other toxicities, each with 95% confidence intervals (CIs), using the inverse variance or Mantel–Haenszel methods [24]. Heterogeneity was assessed by the χ2 test and the I2 statistic. A fixed-effects model was used to pool all effect estimates in this study.

Subgroup analysis was performed using a random-effects model based on the following conditions: tumor stage (I–III or metastatic breast cancer (MBC)), hormone receptor (HR) status, or treatment type (neoadjuvant, adjuvant, or palliative treatment). Sensitivity analysis using the leave-one-out procedure was performed to identify the sources of heterogeneity in the main outcomes. In addition, a sensitivity analysis was performed to check whether any of the results were affected by a change of model. After removing studies that were obvious sources of heterogeneity, a fixed-effects model was used to pool the effect estimates. Publication bias was assessed with Egger’s test and considered present when p ≤ 0.05 [25]. GRADE profiler 3.6 was used to assess the quality of evidence across five domains: risk of bias, imprecision, indirectness, publication bias, and inconsistency. Evidence quality was rated as high, moderate, low, or very low.
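The leave-one-out procedure can be sketched as follows: each trial is dropped in turn, the remaining trials are re-pooled with the fixed-effects model, and the trial whose removal makes I2 collapse is flagged as a likely source of heterogeneity. This is a minimal illustration under stated assumptions; the trial labels, log risk ratios, and standard errors are hypothetical, and the helper names are not part of any statistical package.

```python
import math


def pool(ys, ses):
    """Fixed-effect inverse-variance estimate and I2 for log effect sizes."""
    ws = [1.0 / se ** 2 for se in ses]
    est = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    q = sum(w * (y - est) ** 2 for w, y in zip(ws, ys))
    df = len(ys) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return est, i2


def leave_one_out(names, ys, ses):
    """Re-pool after omitting each study; returns (omitted, pooled RR, I2)."""
    rows = []
    for i, name in enumerate(names):
        est, i2 = pool(ys[:i] + ys[i + 1:], ses[:i] + ses[i + 1:])
        rows.append((name, math.exp(est), i2))
    return rows


# Hypothetical data: trial D is deliberately an outlier
names = ["A", "B", "C", "D"]
ys = [math.log(1.00), math.log(1.05), math.log(0.95), math.log(2.00)]
ses = [0.10, 0.10, 0.10, 0.10]

_, i2_all = pool(ys, ses)
rows = leave_one_out(names, ys, ses)
print(f"all trials: I2 = {i2_all:.0f}%")
for omitted, rr, i2 in rows:
    print(f"without {omitted}: RR = {rr:.2f}, I2 = {i2:.0f}%")
```

In this toy set, heterogeneity stays high unless trial D is removed, which is the pattern used in a leave-one-out analysis to single out a study for closer inspection.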

Results

Study selection and characteristics

A total of 4093 entries were downloaded from Chinese and English databases. After removing duplicate records and those that were not eligible by reading titles, abstracts, and full-text articles, 21 studies [21,22,23, 26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43], including 13 RCTs, were identified (Fig. 1).

Fig. 1 PRISMA diagram of literature searching and screening process

The included studies had been published between 2012 and 2021. Overall, the studies involved 12,024 eligible participants (T+L: 4817, L: 3570, T: 3637) whose median follow-up time varied from 21.5 months to 9 years. Ten RCTs assessed I–III stage breast cancer [21,22,23, 26,27,28,29,30,31,32,33,34,35,36, 40,41,42,43], while 3 RCTs assessed metastatic breast cancer [37,38,39]. Nine RCTs, including 7 RCTs with dual HER2 blockade [22, 23, 27,28,29,30,31,32,33,34, 40, 42, 43] and 2 RCTs with single HER2 blockade [35, 36, 41] assessed the role of the anti-HER2 therapy in a neoadjuvant setting [22, 23, 27,28,29,30,31,32,33,34,35,36, 40,41,42,43], 1 RCT assessed the dual HER2 blockade in an adjuvant setting [21, 26], and 3 RCTs assessed the single HER2 blockade in a palliative setting [37,38,39]. Characteristics of the included studies are shown in Table 2. More details are shown in Additional file 3: Table S1.

Table 2 Main characteristics of the selected studies

Quality assessment of the included studies

Two RCTs (CALGB 40601 and WJOG6110B/ELTOP) did not describe random sequence generation, while 7 RCTs (ALTTO, CALGB 40601, LPT109096, NCIC CTG MA.31, TRIO-US B07, WJOG6110B/ELTOP, and CEREBEL) did not describe allocation concealment in detail. Blinding of participants and personnel was not adopted in any of the RCTs. Two RCTs (NeoALTTO and GeparQuinto) adopted blinding of outcome assessment. One RCT (LPT109096) did not report complete outcome data, while all RCTs were free from reporting bias, and 2 RCTs (ALTTO and CHER-Lob) had an unclear risk of other bias. Details of risk of bias are shown in Figs. 2 and 3.

Fig. 2 Risk of bias graph

Fig. 3 Risk of bias summary

Primary outcomes

Overall survival

Eight trials [21, 22, 26, 29, 31, 35, 38, 39] reported data on OS (calculated from randomization to death from any cause or last follow-up) for pooling in the meta-analysis. Data from WJOG6110B/ELTOP [37], in which all participants had progressed on previous trastuzumab treatment, were excluded. No heterogeneity was found for OS (T+L vs. T; p=0.46, I2=0), and heterogeneity was low for OS (L vs. T; p=0.22, I2=29%). The T+L arm showed a significant improvement in OS compared to the T arm (HR: 0.84, 95% CI: 0.73–0.97, p=0.02; Fig. 4). The L arm showed significantly worse OS than the T arm (HR: 1.26, 95% CI: 1.08–1.46, p=0.003; Fig. 4).

Fig. 4 Meta-analysis of OS. ALTTOa: trastuzumab followed by lapatinib group

Subgroup analysis

In patients receiving neoadjuvant therapy, OS did not differ significantly between the L and T arms (HR: 0.85, 95% CI: 0.60–1.20, p=0.36; Fig. 5). In patients receiving palliative therapy, OS was significantly worse in the L arm than in the T arm (HR: 1.40, 95% CI: 1.10–1.80, p=0.007; Fig. 5). The difference between subgroups was significant (interaction test, p=0.02).
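As a worked check of the interaction test, the two subgroup estimates reported for OS (L vs. T) can be compared directly on the log scale. The sketch below assumes the standard errors can be recovered from the reported 95% CIs and uses a two-sided z-test for the difference between two independent log HRs, which for two subgroups is equivalent to a 1-df chi-square test for subgroup differences. The function names are illustrative.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def log_hr_se(hr, ci_low, ci_high):
    """Log HR and its standard error recovered from a 95% CI."""
    return math.log(hr), (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)


def interaction_test(sub_a, sub_b):
    """Two-sided z-test for the difference between two subgroup log HRs."""
    ya, sa = log_hr_se(*sub_a)
    yb, sb = log_hr_se(*sub_b)
    z = (yb - ya) / math.sqrt(sa ** 2 + sb ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


# Subgroup estimates for OS (L vs. T) as reported in the text
neoadjuvant = (0.85, 0.60, 1.20)
palliative = (1.40, 1.10, 1.80)

z, p = interaction_test(neoadjuvant, palliative)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With the reported values, this gives a p-value of about 0.02, in line with the interaction test reported for the therapy-setting subgroups.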

Fig. 5 Subgroup analysis of OS in accordance with therapy setting (L vs. T)

Progression-free survival

A total of 2 trials [38, 39] provided data on PFS (defined as time from randomization to disease progression) for pooling in the meta-analysis. Data from WJOG6110B/ELTOP [37], in which participants were previously treated with trastuzumab with progression, were excluded. The heterogeneity test of p=0.58, I2=0 in PFS (L vs. T) did not reveal heterogeneity. Compared to the T arm, the L arm showed a lower efficacy with regard to PFS (HR: 1.35, 95% CI: 1.11–1.64, p=0.002; Fig. 6).

Fig. 6 Meta-analysis of PFS

Disease-free survival/event-free survival

Four trials [21, 26, 31, 35] involving stage I–III patients provided data on DFS/EFS (defined as the time from randomization to recurrence of invasive breast cancer at local, regional, or distant sites; contralateral invasive breast cancer; second non-breast malignancy; or death as a result of any cause, whichever occurred first) for pooling in the meta-analysis. As reported by CHER-Lob [22], RFS was defined as the time from randomization to breast cancer recurrence (loco regional or distant; contralateral BC excluded) or death from any cause, whichever occurred first, which is similar to the definition of DFS/EFS. Therefore, RFS in CHER-Lob [22] was also included for pooling in the meta-analysis. Heterogeneity test of p=0.35, I2=9% in DFS/EFS (T+L vs. T) revealed low heterogeneity. The T+L arm showed significant improvements in DFS/EFS, compared to the T arm (HR: 0.89, 95% CI: 0.80–0.98, p=0.02; Fig. 7). Heterogeneity test of p=0.20, I2=36% in DFS/EFS (L vs. T) showed low heterogeneity. Compared to the T arm, the L arm showed a markedly low efficacy with regard to DFS/EFS (HR: 1.22, 95% CI: 1.05–1.41, p =0.008; Fig. 7).

Fig. 7 Meta-analysis of DFS/EFS. ALTTOa: trastuzumab followed by lapatinib group

Secondary outcomes

pCR (ypT0/is ypN0)

Nine trials [22, 23, 30, 34, 36, 40,41,42,43] with neoadjuvant therapy provided data on pCR (ypT0/is ypN0) (defined as the absence of residual invasive tumor in breast and axillary nodes) for pooling in the meta-analysis. Heterogeneity test of p=0.32, I2=14% in pCR (T+L vs. T, ypT0/is ypN0) revealed a low heterogeneity. The T+L arm showed significant improvements in pCR (ypT0/is ypN0), compared to the T arm (RR: 1.27, 95% CI: 1.13–1.43, p <0.0001; Fig. 8). Heterogeneity test of p=0.24, I2=22% in pCR (L vs. T, ypT0/is ypN0) showed a low heterogeneity. The T arm showed significant improvements in pCR (ypT0/is ypN0), compared to the L arm (RR: 0.73, 95% CI: 0.65–0.83, p <0.00001; Fig. 8).
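For dichotomous outcomes such as pCR, the Mantel–Haenszel method named in the statistical analysis pools risk ratios from per-trial event counts. The following is a minimal sketch that assumes each trial supplies (events, total) for both arms and uses the Greenland–Robins variance for the log RR. The counts are hypothetical and do not come from the included trials.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def mantel_haenszel_rr(tables):
    """Pooled Mantel-Haenszel risk ratio with a 95% CI.

    tables: iterable of (events_trt, n_trt, events_ctl, n_ctl).
    """
    r = s = p = 0.0
    for a, n1, c, n2 in tables:
        n = n1 + n2
        r += a * n2 / n  # numerator contribution of this trial
        s += c * n1 / n  # denominator contribution of this trial
        p += (n1 * n2 * (a + c) - a * c * n) / n ** 2  # variance term
    rr = r / s
    se_log = math.sqrt(p / (r * s))  # Greenland-Robins SE of log(RR)
    return rr, (rr * math.exp(-Z95 * se_log), rr * math.exp(Z95 * se_log))


# Hypothetical pCR counts: (pCR in T+L, n in T+L, pCR in T, n in T)
toy_tables = [(30, 100, 20, 100), (45, 150, 38, 150), (25, 60, 18, 62)]
rr, (lo, hi) = mantel_haenszel_rr(toy_tables)
print(f"RR = {rr:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

With a single trial, the estimate and its variance reduce to the usual risk ratio and the formula 1/a − 1/n1 + 1/c − 1/n2, which is a convenient sanity check.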

Fig. 8 Meta-analysis of pCR (ypT0/is ypN0)

pCR (ypT0/is ypN0/+)

Six trials [29, 34, 36, 40,41,42] with neoadjuvant therapy provided data on pCR (ypT0/is ypN0/+) (defined as the absence of residual invasive tumor in the breast) for pooling in the meta-analysis. Heterogeneity test of p=0.14, I2=45%, in pCR (T+L vs. T, ypT0/is ypN0/+) showed a low heterogeneity. The T+L arm had significant improvements in pCR (ypT0/is ypN0/+), compared to the T arm (RR: 1.31, 95% CI: 1.16–1.49, p<0.0001; Fig. 9). Heterogeneity test of p=0.05, I2=54%, in pCR (L vs. T, ypT0/is ypN0/+) showed a high heterogeneity. The T arm showed significant improvements in pCR (ypT0/is ypN0/+), compared to the L arm (RR: 0.79, 95% CI: 0.70–0.89, p<0.0001; Fig. 9).

Fig. 9 Meta-analysis of pCR (ypT0/is ypN0/+)

Recurrence-free survival

Two trials [21, 29] provided data on RFS (defined as the interval from surgery to ipsilateral invasive breast tumor recurrence, regional recurrence, distant recurrence, or death from any cause, whichever occurred first) for pooling in the meta-analysis. Data from CHER-Lob [22] and GeparQuinto [35] were not pooled in the meta-analysis because RFS was defined as the time from randomization. Heterogeneity test of p=0.02, I2=82%, in RFS (T+L vs. T) showed a high heterogeneity. The T+L arm showed significant improvements in RFS, compared to the T arm (HR: 0.83, 95% CI: 0.72–0.96, p =0.01; Fig. 10). CALGB 40601 (30) did not find significant differences between L and T arms with regard to RFS (HR: 1.50, 95% CI: 0.82–2.77, p =0.19; Fig. 10).

Fig. 10 Meta-analysis of RFS

Overall response rate

Eight trials [34, 36,37,38, 40,41,42,43] provided data on ORR (based on the World Health Organization (WHO) criteria or the Response Evaluation Criteria in Solid Tumors (RECIST)) for pooling in the meta-analysis. Data from NCIC CTG MA.31 [39] were not pooled in the meta-analysis because ORR was assessed in HER2-positive and HER2-negative patients together. Data from WJOG6110B/ELTOP [37], in which all participants had progressed on previous trastuzumab treatment, were excluded. Heterogeneity was high for ORR (T+L vs. T; p=0.08, I2=56%). ORR did not differ significantly between the T+L and T arms (RR: 1.02, 95% CI: 0.96–1.09, p=0.53; Fig. 11). No heterogeneity was found for ORR (L vs. T; p=0.66, I2=0%). ORR did not differ significantly between the L and T arms (RR: 0.98, 95% CI: 0.93–1.03, p=0.41; Fig. 11).

Fig. 11 Meta-analysis of ORR

Disease control rate

Four trials [35, 37, 38, 40] provided data on DCR (based on the World Health Organization (WHO) criteria or the Response Evaluation Criteria in Solid Tumors (RECIST)) for pooling in the meta-analysis. Data from NCIC CTG MA.31 [39] were not pooled in the meta-analysis because HER2-positive and HER2-negative patients were assessed in DCR together. Data from WJOG6110B/ELTOP [37], in which participants were all previously treated with trastuzumab with progression, were excluded. EORTC 10054 [40] reported that T+L and T arms had comparable DCR rates (Fig. 12). Heterogeneity test of p=0.44, I2=0%, in DCR (L vs. T) did not reveal any heterogeneity. Differences in DCR between the L and T arms were insignificant (RR: 0.96, 95% CI: 0.90–1.01, p=0.13; Fig. 12).

Fig. 12 Meta-analysis of DCR

Rate of breast-conserving surgery

Six trials [27, 33, 36, 40,41,42] with neoadjuvant therapy reported data on BCS rates for pooling in the meta-analysis. Heterogeneity test of p=0.69, I2=0%, in BCS (T+L vs. T) did not reveal any heterogeneity. Differences in BCS rates between T and T+L arms were not significant (RR: 1.01, 95% CI: 0.88–1.15, p=0.94; Fig. 13). Heterogeneity test of p=0.32, I2=14%, in BCS (L vs. T) showed a low heterogeneity. Differences in BCS rates between the L and T arms were insignificant (RR: 0.94, 95% CI: 0.86–1.04, p=0.24; Fig. 13).

Fig. 13 Meta-analysis of rate of BCS

Cardiac toxicities

Nine trials [23, 26, 27, 30, 34, 36, 37, 40, 42] provided data on cardiac toxicities (congestive heart failure (CHF) and decline of left ventricular ejection fraction (LVEF)). CHF was defined as cardiac dysfunction by New York Heart Association class, severe CHF, symptomatic CHF, or confirmed CHF. LVEF decline was defined as reported by the authors of the included studies because different thresholds were used. Data from NCIC CTG MA.31 [39] and GEICAM/2006-14 [41] were not pooled in the meta-analysis because HER2-positive and HER2-negative patients were assessed together. Heterogeneity was high for CHF (T+L vs. T; p=0.04, I2=65%). The incidence of CHF did not differ significantly between the T+L and T arms (RR: 0.95, 95% CI: 0.73–1.23, p=0.71; Fig. 14). Heterogeneity was high for LVEF decline (T+L vs. T; p=0.08, I2=52%). The incidence of LVEF decline did not differ significantly between the T+L and T arms (RR: 0.82, 95% CI: 0.67–1.01, p=0.06; Fig. 14). Heterogeneity was low for CHF (L vs. T; p=0.12, I2=45%). The incidence of CHF did not differ significantly between the L and T arms (RR: 0.89, 95% CI: 0.62–1.28, p=0.54; Fig. 14). No heterogeneity was found for LVEF decline (L vs. T; p=0.55, I2=0%). Compared to the T arm, the L arm had a lower incidence of LVEF decline (RR: 0.67, 95% CI: 0.50–0.90, p=0.008; Fig. 14).

Fig. 14 Meta-analysis of cardiac toxicities. ALTTOa: trastuzumab followed by lapatinib group

Other toxicities

Data on grade III/IV toxicities reported in more than half of the trials were obtained [44]. Eleven trials [23, 26, 28, 30, 32, 36,37,38, 40, 42, 43] provided data on other toxicities (chemotherapy adverse effects were graded according to the National Cancer Institute Common Terminology Criteria for Adverse Events) for pooling in the meta-analysis. Data from NCIC CTG MA.31 [39] and GEICAM/2006-14 [41] were not pooled in the meta-analysis because HER2-positive and HER2-negative patients were assessed together.

Diarrhea

Eleven trials [23, 26, 28, 30, 32, 36,37,38, 40, 42, 43] provided data on grade III/IV diarrhea for pooling in the meta-analysis. Heterogeneity test of p=0.02, I2=57%, in diarrhea (T+L vs. T) showed high heterogeneity. Compared to the T arm, the T+L arm showed a higher incidence of grade III/IV diarrhea (RR: 8.32, 95% CI: 6.49–10.68, p<0.00001; Fig. 15). Heterogeneity test of p<0.00001, I2=81%, in diarrhea (L vs. T) showed high heterogeneity. The L arm showed a higher incidence of grade III/IV diarrhea, compared to the T arm (RR: 5.62, 95% CI: 4.41–7.17, p<0.00001; Fig. 15).

Fig. 15 Meta-analysis of diarrhea. ALTTOa: trastuzumab followed by lapatinib group

Subgroup analysis

Subgroups were defined by tumor stage: stage I–III [23, 26, 28, 30, 32, 36, 40, 42, 43] or MBC [37, 38]. In stage I–III patients, the L arm had a higher incidence of grade III/IV diarrhea than the T arm (RR: 7.90, 95% CI: 5.88–10.62, p<0.00001; Fig. 16). In MBC patients, the incidence of grade III/IV diarrhea did not differ significantly between the L and T arms (RR: 0.99, 95% CI: 0.46–2.15, p=0.99; Fig. 16). The difference between subgroups was significant (interaction test, p<0.00001).

Fig. 16 Subgroup analysis of diarrhea in accordance with tumor stage (L vs. T)

Subgroups were also defined by treatment type: neoadjuvant therapy [23, 28, 30, 32, 36, 40, 42, 43] or palliative therapy [37, 38]. In patients receiving neoadjuvant therapy, the L arm had a higher incidence of grade III/IV diarrhea than the T arm (RR: 6.97, 95% CI: 4.46–10.91, p<0.00001; Fig. 17). In patients receiving palliative therapy, the incidence of grade III/IV diarrhea did not differ significantly between the L and T arms (RR: 0.99, 95% CI: 0.46–2.15, p=0.99; Fig. 17). The difference between subgroups was significant (interaction test, p<0.00001).

Fig. 17 Subgroup analysis of diarrhea in accordance with therapy setting (L vs. T)

Neutropenia

Eight trials [23, 32, 36,37,38, 40, 42, 43] provided data on grade III/IV neutropenia for pooling in the meta-analysis. Heterogeneity test of p=0.42, I2=0%, in neutropenia (T+L vs. T) showed no heterogeneity. Differences in grade III/IV neutropenia between the T and T+L arms were insignificant (RR: 1.16, 95% CI: 0.86–1.56, p=0.33; Fig. 18). Heterogeneity test of p=0.02, I2=59%, in neutropenia (L vs. T) showed high heterogeneity. Differences in grade III/IV neutropenia between the T and L arms were insignificant (RR: 0.99, 95% CI: 0.89–1.09, p=0.82; Fig. 18).

Fig. 18 Meta-analysis of neutropenia

Fatigue

Six trials [23, 28, 36, 38, 40, 42] provided data on grade III/IV fatigue for pooling in the meta-analysis. Heterogeneity test of p=0.71, I2=0%, in fatigue (T+L vs. T) showed no heterogeneity. Differences in grade III/IV fatigue between the T and T+L arms were insignificant (RR: 0.84, 95% CI: 0.42–1.67, p=0.62; Fig. 19). Heterogeneity test of p=1.00, I2=0%, in fatigue (L vs. T) did not reveal any heterogeneity. Differences in grade III/IV fatigue between the T and L arms were insignificant (RR: 1.44, 95% CI: 0.97–2.11, p=0.07; Fig. 19).

Fig. 19 Meta-analysis of fatigue

Rash/skin toxicity

Nine trials [26, 28, 30, 32, 36,37,38, 40, 42] provided data on grade III/IV rash or skin toxicity for pooling in the meta-analysis. Heterogeneity test of p=0.47, I2=0%, in rash/skin toxicity (T+L vs. T) showed no heterogeneity. The T+L arm had a higher incidence of grade III/IV rash or skin toxicity, when compared to the T arm (RR: 6.75, 95% CI: 4.66–9.77, p<0.00001; Fig. 20). Heterogeneity test of p=0.54, I2=0%, in rash/skin toxicity (L vs. T) showed no heterogeneity. The L arm had a higher incidence of grade III/IV rash or skin toxicity, when compared to the T arm (RR: 8.71, 95% CI: 5.64–13.45, p<0.00001; Fig. 20).

Fig. 20 Meta-analysis of rash/skin toxicity

Vomiting

Six trials [28, 36, 38, 40, 42, 43] provided data on grade III/IV vomiting for pooling in the meta-analysis. Heterogeneity test of p=0.99, I2=0%, in vomiting (T+L vs. T) did not reveal any heterogeneity. Differences in grade III/IV vomiting between the T+L and T arms were not significant (RR: 2.17, 95% CI: 0.91–5.19, p=0.08; Fig. 21). Heterogeneity test of p=0.82, I2=0%, in vomiting (L vs. T) did not reveal any heterogeneity. Differences in grade III/IV vomiting between the L and T arms were insignificant (RR: 1.29, 95% CI: 0.69–2.43, p=0.42; Fig. 21).

Fig. 21 Meta-analysis of vomiting

Nausea

Six trials [28, 36, 38, 40, 42, 43] provided data on grade III/IV nausea for pooling in the meta-analysis. Heterogeneity test of p=0.52, I2=0%, in nausea (T+L vs. T) did not reveal any heterogeneity. Differences in grade III/IV nausea between the T+L and T arms were insignificant (RR: 1.61, 95% CI: 0.64–4.06, p=0.31; Fig. 22). Heterogeneity test of p=0.81, I2=0%, in nausea (L vs. T) did not reveal any heterogeneity. Differences in grade III/IV nausea between the T and L arms were insignificant (RR: 1.02, 95% CI: 0.58–1.80, p=0.94; Fig. 22).

Fig. 22 Meta-analysis of nausea

In this manuscript, we only reported subgroup analysis with a high possibility of subgroup effects. Findings from the meta-analysis and subgroup analyses are summarized in Additional file 4: Table S2. All descriptions and forest plots from subgroup analyses, including from unreported subgroups, are shown in Additional file 1 (Figure S1–S20).

Publication bias

Publication bias was assessed for the primary outcomes. Egger’s test did not reveal publication bias for OS (T+L vs. T; t=−1.43, p=0.248) or OS (L vs. T; t=−1.71, p=0.163). Because only 2 trials were included for PFS (L vs. T), publication bias was not assessed for this outcome. Egger’s test did not reveal publication bias for DFS/EFS (T+L vs. T; t=−2.56, p=0.051); however, it indicated publication bias for DFS/EFS (L vs. T; t=−10.88, p=0.008). These findings are shown in Additional file 2 (Figure S1–S4).
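The t statistics above come from Egger’s regression, in which each trial’s standardized effect (log effect divided by its standard error) is regressed on its precision (the reciprocal of the standard error); an intercept far from zero suggests funnel-plot asymmetry. The sketch below is a plain least-squares version under those assumptions, with hypothetical inputs. It returns the intercept, its t statistic, and the degrees of freedom; the p-value would then be read from a t distribution with k − 2 degrees of freedom, which is omitted here to keep the example within the standard library.

```python
import math


def egger_test(ys, ses):
    """Egger's regression: intercept, its t statistic, and degrees of freedom.

    ys: log effect sizes; ses: their standard errors (at least 3 studies).
    """
    k = len(ys)
    if k < 3:
        raise ValueError("Egger's regression needs at least 3 studies")
    x = [1.0 / se for se in ses]            # precision
    z = [y / se for y, se in zip(ys, ses)]  # standardized effect
    mx, mz = sum(x) / k, sum(z) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mz - slope * mx
    resid_ss = sum((zi - (intercept + slope * xi)) ** 2 for xi, zi in zip(x, z))
    s2 = resid_ss / (k - 2)  # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / k + mx ** 2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, t, k - 2


# Hypothetical log effects that grow as trials get smaller (larger SE)
toy_ys = [0.10, 0.20, 0.35, 0.50, 0.70]
toy_ses = [0.05, 0.10, 0.15, 0.20, 0.30]
intercept, t_stat, df = egger_test(toy_ys, toy_ses)
print(f"intercept = {intercept:.2f}, t = {t_stat:.2f}, df = {df}")
```

With only a handful of trials per outcome, as in this study, the test has low power, which is why it was not applied to the two-trial PFS comparison.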

Sensitivity analysis

Changing from the fixed-effects model to the random-effects model (REM) altered two results: DFS/EFS (L vs. T) was no longer significant [HR = 1.13, 95% CI: 0.91 to 1.42, p = 0.27, REM], nor was RFS (T+L vs. T) [HR = 0.57, 95% CI: 0.22 to 1.48, p = 0.25, REM].

For pCR (L vs. T, ypT0/is ypN0/+), heterogeneity was high (p = 0.05, I2 = 54%). After excluding data from the NSABP B-41 trial (AC followed by Doc chemotherapy), there was no heterogeneity (p = 0.52, I2 = 0), which identified this study as the source of heterogeneity. After its removal, the fixed-effects result was not significantly different from the previous result [RR = 0.72, 95% CI: 0.62 to 0.83, p < 0.00001].

For ORR (T+L vs. T), heterogeneity was high (p = 0.08, I2 = 56%). After excluding data from the NeoALTTO trial, which used wP chemotherapy as the neoadjuvant therapy and FEC chemotherapy as the adjuvant therapy, there was no heterogeneity (p = 0.41, I2 = 0), which identified this study as the source of heterogeneity. After its removal, the fixed-effects result was not significantly different from the previous result [RR = 0.96, 95% CI: 0.90 to 1.03, p = 0.28].

For CHF (T+L vs. T), heterogeneity was high (p = 0.04, I2 = 65%). After excluding data from the ALTTO trial, which used anti-HER2 therapy as the adjuvant therapy, heterogeneity was low (p = 0.23, I2 = 31%), which identified this study as the source of heterogeneity. After its removal, the fixed-effects result was not significantly different from the previous result [RR = 0.65, 95% CI: 0.44 to 0.97, p = 0.04].

For LVEF decline (T+L vs. T), heterogeneity was high (p = 0.08, I2 = 52%). After excluding data from the ALTTO trial (ALTTO and ALTTOa), which used anti-HER2 therapy as the adjuvant therapy, there was no heterogeneity (p = 0.83, I2 = 0%), which identified this study as the source of heterogeneity. After its removal, the fixed-effects result was not significantly different from the previous result [RR = 0.55, 95% CI: 0.11 to 2.73, p = 0.46].

For diarrhea (T+L vs. T), heterogeneity was high (p = 0.02, I2 = 57%). After excluding data from ALTTOa, which used trastuzumab followed by lapatinib as the anti-HER2 therapy, there was no heterogeneity (p = 1.00, I2 = 0%), which identified this arm as the source of heterogeneity. After its removal, the fixed-effects result was not significantly different from the previous result [RR = 11.39, 95% CI: 8.30 to 15.63, p < 0.00001].

For diarrhea (L vs. T), heterogeneity was high (p < 0.00001, I2 = 81%). After excluding data from the CEREBEL trial, which had low methodological quality, there was little heterogeneity (p = 0.34, I2 = 11%), which identified this study as the source of heterogeneity. After its removal, the fixed-effects result was not significantly different from the previous result [RR = 7.62, 95% CI: 5.73 to 10.11, p < 0.00001].

For neutropenia (L vs. T), heterogeneity was high (p = 0.02, I2 = 59%). After excluding data from the NeoALTTO trial, which used wP chemotherapy as the neoadjuvant therapy, there was no heterogeneity (p = 0.71, I2 = 0%), which identified this study as the source of heterogeneity. After its removal, the fixed-effects result was not significantly different from the previous result [RR = 0.92, 95% CI: 0.83 to 1.02, p = 0.10].

The remaining results were stable. Findings from the sensitivity analysis are shown in Table 3.
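The effect of changing the model can be illustrated by pooling the same trials under both the fixed-effects model and a random-effects model. The sketch below assumes the DerSimonian–Laird estimator for the between-trial variance, which is the random-effects method implemented in RevMan; the two trials and their standard errors are hypothetical and chosen only to show how a result that is significant under fixed effects can lose significance once between-trial variance is added.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def pool_fixed_and_random(ys, ses):
    """Fixed-effect and DerSimonian-Laird random-effects pooling of log HRs."""
    ws = [1.0 / se ** 2 for se in ses]
    w_sum = sum(ws)
    fe = sum(w * y for w, y in zip(ws, ys)) / w_sum
    q = sum(w * (y - fe) ** 2 for w, y in zip(ws, ys))
    df = len(ys) - 1
    c = w_sum - sum(w ** 2 for w in ws) / w_sum
    tau2 = max(0.0, (q - df) / c)  # between-trial variance
    ws_re = [1.0 / (se ** 2 + tau2) for se in ses]
    re = sum(w * y for w, y in zip(ws_re, ys)) / sum(ws_re)

    def summary(est, total_weight):
        se = math.sqrt(1.0 / total_weight)
        return (math.exp(est),
                math.exp(est - Z95 * se),
                math.exp(est + Z95 * se))

    return summary(fe, w_sum), summary(re, sum(ws_re)), tau2


# Hypothetical two-trial comparison with discordant effects
toy_log_hrs = [math.log(0.85), math.log(0.40)]
toy_ses = [0.08, 0.30]
fixed, random_effects, tau2 = pool_fixed_and_random(toy_log_hrs, toy_ses)
print("fixed  : HR = %.2f (%.2f-%.2f)" % fixed)
print("random : HR = %.2f (%.2f-%.2f)" % random_effects)
print(f"tau^2 = {tau2:.3f}")
```

Because the random-effects weights are more even across trials and the variance includes tau², the interval widens and the small discordant trial gains influence, which is the behavior seen for outcomes with high I2.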

Table 3 Sensitivity analysis

Evidence quality assessment

Thirty-seven outcomes were assessed by GRADE. Risk of bias: Almost all outcomes were considered to be at serious risk because of unclear allocation concealment in the included studies, except the BCS rate outcomes for T+L vs. T and L vs. T. Inconsistency: High heterogeneity was found for pCR (L vs. T, ypT0/is ypN0/+), RFS (T+L vs. T), ORR (T+L vs. T), CHF (T+L vs. T), LVEF decline (T+L vs. T), diarrhea (T+L vs. T), diarrhea (L vs. T), and neutropenia (L vs. T). Thus, these outcomes were considered to be at serious risk of inconsistency. Indirectness: No outcome had significant indirectness, because all trials were direct comparisons. Imprecision: OS (L vs. T, neoadjuvant therapy), DFS/EFS (L vs. T), RFS (T+L vs. T), ORR (T+L vs. T), ORR (L vs. T), DCR (L vs. T), rate of BCS (T+L vs. T), rate of BCS (L vs. T), CHF (T+L vs. T), CHF (L vs. T), LVEF decline (T+L vs. T), diarrhea (L vs. T, MBC), diarrhea (L vs. T, palliative therapy), neutropenia (T+L vs. T), neutropenia (L vs. T), fatigue (T+L vs. T), fatigue (L vs. T), vomiting (T+L vs. T), vomiting (L vs. T), nausea (T+L vs. T), and nausea (L vs. T) were considered to be at serious risk of imprecision because of insufficient sample size. Publication bias: DFS/EFS (L vs. T), CHF (L vs. T), and nausea (L vs. T) exhibited publication bias. Overall: No outcome had high-quality evidence, 15 outcomes had moderate-quality evidence, 14 outcomes had low-quality evidence, and 8 outcomes had very low-quality evidence (Table 4).

Table 4 GRADE evidence profile of outcomes

Discussion

This updated systematic review and meta-analysis evaluated whether the efficacy of trastuzumab plus lapatinib, or of lapatinib alone, is non-inferior to that of trastuzumab therapy. Among previous studies, Yu et al. and Clavarezza et al. [44, 45] did not compare lapatinib therapy with standard trastuzumab therapy, and the most recent studies they included were published in 2017. Xu et al. [46] reported meta-analysis results for three arms (T+L, T, and L). However, that study [46], which included RCTs from 2012 to 2015, did not cover all relevant studies. Moreover, unlike the studies of Ma et al. [47] and Guarneri et al. [48], which included only RCTs of neoadjuvant therapy, this study included all treatment types (neoadjuvant, adjuvant, and palliative treatment). Thus, our study included all relevant RCTs, enlarged the sample size, and analyzed almost all important outcomes, which may yield more rigorous and comprehensive meta-analysis results. Our findings did not differ significantly from those of previous studies, and most results were highly similar.

This meta-analysis shows that the efficacy of trastuzumab combined with lapatinib therapy is superior to that of standard trastuzumab therapy alone, with significant improvements in OS, DFS/EFS, pCR (ypT0/is ypN0), pCR (ypT0/is ypN0/+), and RFS, but that the combination carries more safety risks, with a higher incidence of diarrhea and rash/skin toxicity. In addition, standard trastuzumab therapy alone was superior to lapatinib therapy in efficacy, with significant improvements in OS, PFS, DFS/EFS, pCR (ypT0/is ypN0), and pCR (ypT0/is ypN0/+). With regard to safety, standard trastuzumab therapy alone had a higher incidence of LVEF decline but a lower incidence of grade III or IV diarrhea and rash/skin toxicity, compared to lapatinib therapy alone. Among previous studies, Clavarezza et al. [45] reported that trastuzumab combined with lapatinib therapy significantly increased the pCR rate compared to trastuzumab therapy alone, consistent with our findings. Xu et al. [46] reported that trastuzumab combined with lapatinib therapy significantly improved pCR, EFS, and OS, but showed a higher rate of grade III/IV diarrhea, rash or erythema, and neutropenia, compared to lapatinib or trastuzumab therapy alone, which is also consistent with our findings. Ma et al. [47] reported that standard trastuzumab therapy plus chemotherapy was superior to lapatinib therapy plus chemotherapy in pCR (ypT0/is ypN0/+) (RR=0.82, 95% CI: 0.72–0.93) and pCR (ypT0/is ypN0) (RR=0.77, 95% CI: 0.67–0.88). In the same study, lapatinib plus trastuzumab therapy and lapatinib therapy showed no significant difference in the rate of BCS compared with chemotherapy plus trastuzumab therapy, but both showed a higher incidence of diarrhea and skin rash. Guarneri et al. [48] reported that trastuzumab combined with lapatinib therapy significantly improved RFS and OS, compared to standard trastuzumab therapy.

OS, PFS, and DFS/EFS are important clinical outcomes. We found that trastuzumab combined with lapatinib therapy significantly improved OS and DFS/EFS compared to standard trastuzumab therapy alone, whereas lapatinib plus chemotherapy had lower efficacy in OS, PFS, and DFS/EFS compared to standard trastuzumab therapy. Based on GRADE, evidence quality for these outcomes was generally moderate, except DFS/EFS (L vs. T), which was considered low quality. Therefore, trastuzumab plus lapatinib is a better option for prolonging patient survival. Lapatinib plus trastuzumab therapy had higher pCR rates than standard trastuzumab therapy, and trastuzumab was superior to lapatinib, but differences in the breast-conserving rate among the three anti-HER2 therapies were not significant. Thus, a treatment with better pCR efficacy may offer no obvious clinical benefit to patients who prioritize breast conservation. However, for patients with early breast cancer who prioritize short-term efficacy, trastuzumab combined with lapatinib may be a better choice. We also observed that the therapies with better pCR efficacy had better long-term survival outcomes. However, whether pCR is associated with long-term survival outcomes has yet to be established.

Trastuzumab-associated cardiac toxicities have been evaluated. Some studies reported that trastuzumab-induced cardiotoxicity might result from its negative regulation of murine double minute 2 (MDM2) and p53. Meanwhile, trastuzumab-induced cardiomyocyte apoptosis has been associated with inflammatory infiltration. TLR4-mediated expression of TNFα, MCP-1, and ICAM-1 contributes to the inflammatory responses. Lapatinib preserved cell energy and inhibited TNFα-induced cardiomyocyte apoptosis by activating the AMPK pathway [49, 50]. In this study, lapatinib showed a lower incidence of LVEF decline compared to trastuzumab therapy, and evidence quality was moderate. Therefore, for patients with poor cardiac function, the efficacy of a combination of lapatinib with trastuzumab should be evaluated. With regard to other toxicities, trastuzumab had a lower incidence of grade III/IV diarrhea and rash/skin toxicity, compared to lapatinib therapy and lapatinib plus trastuzumab therapy. Mayo et al. reported that lapatinib can reduce gut microbial diversity, which may explain the high incidence of diarrhea [51]. However, a high incidence of rash during treatment with lapatinib and combination therapy is not necessarily unfavorable. Sonnenblick et al. reported that patients with early development of rash derive superior benefits from lapatinib-based therapies [52]. The reasons for the rash remain unclear. Researchers inferred that lapatinib pharmacokinetics or pharmacodynamics influenced rash development. Normal epidermal growth depends on EGFR, which is expressed on the proliferating skin [52].

In previous studies [44,45,46,47,48], no evidence quality assessment was performed. To determine the reliability of the meta-analysis results, GRADE was used to assess the evidence quality in this study. Among long-term survival outcomes (excluding subgroup analysis results), almost all outcomes were rated as moderate-quality evidence because of unclear allocation concealment in the included studies, except DFS/EFS (L vs. T) and RFS (T+L vs. T), which were rated as very low-quality evidence. Thus, the finding that lapatinib plus trastuzumab therapy had better long-term efficacy than standard trastuzumab therapy, and that trastuzumab was superior to lapatinib, is generally credible. Among short-term efficacy outcomes, more than half were rated as low- or very low-quality evidence because of unclear allocation concealment in the included studies, high heterogeneity, or insufficient sample size. Although this study showed that lapatinib plus trastuzumab therapy had better short-term efficacy than standard trastuzumab therapy, and that trastuzumab was superior to lapatinib, it remains difficult to conclude which therapy has better short-term efficacy. Outcomes for the rate of BCS, by contrast, were rated as moderate-quality evidence, downgraded because of insufficient sample size. This study found no significant difference in the rate of BCS among the three therapies, a finding we consider credible. Among cardiac toxicities and other toxicities (excluding subgroup analysis results), almost all outcomes were rated as low- or very low-quality evidence because of unclear allocation concealment in the included studies, high heterogeneity, insufficient sample size, or publication bias. Thus, further verification is needed to determine which therapy is safer. Overall, no outcome had high-quality evidence, 15 outcomes had moderate-quality evidence, 14 outcomes had low-quality evidence, and 8 outcomes had very low-quality evidence.

More than half of the outcomes were rated as low- or very low-quality evidence. We infer that this was probably due to the following reasons. First, most included studies were not well designed, which caused serious risk of bias. Second, insufficient sample size led to non-significant differences in some results, which caused serious imprecision. Third, publication bias and high heterogeneity downgraded the level of evidence. To improve the evidence quality, more well-designed, long-term, large-sample RCTs are needed. In addition, stricter inclusion and exclusion criteria should be applied in future studies, so that more studies with low heterogeneity can be included.

Subgroup analyses, performed to identify sources of heterogeneity, revealed subgroup effects. Patients receiving neoadjuvant therapy had longer OS than patients with MBC, while patients with stage I–III breast cancer or receiving neoadjuvant therapy had a higher incidence of diarrhea than patients with MBC or receiving palliative therapy during lapatinib treatment. Outcomes from subgroup analyses may have been affected by the instability caused by small sample sizes. We also assessed the quality of evidence and found moderate-quality evidence for OS (L vs. T, palliative therapy), diarrhea (L vs. T, stage I–III), and diarrhea (L vs. T, neoadjuvant therapy), and low-quality evidence for OS (L vs. T, neoadjuvant therapy), diarrhea (L vs. T, MBC), and diarrhea (L vs. T, palliative therapy), which may inform clinicians and patients when selecting treatment options.

In recent years, studies have increasingly evaluated the efficacy of dual-targeted therapy versus single-targeted therapy. Evaluating the efficacy and safety of dual- and single-targeted therapy is important for clinicians and patients when selecting treatment. This study has several limitations. First, most of the included studies did not clearly report allocation concealment, which reduces their reliability. Second, the different chemotherapies used in the included studies may have led to clinical heterogeneity. Third, most of the trials, apart from ALTTO, had small sample sizes. Finally, the low incidence of safety events in the studies may have led to overestimation of treatment effects. In the future, relevant, well-designed, long-term, large-sample RCTs are needed, and more studies should assess the mechanisms of cardiac and non-cardiac toxicities of lapatinib and trastuzumab. In addition, it is important to determine whether pCR affects long-term survival when lapatinib and trastuzumab are used, and whether combinations of lapatinib and trastuzumab can reduce the incidence of cardiac toxicities.

Conclusions

Trastuzumab combined with lapatinib therapy is more efficacious than standard trastuzumab therapy alone but is associated with more non-cardiac grade III/IV toxicities. The efficacy of lapatinib therapy is inferior to that of standard trastuzumab therapy alone. However, the cardiac safety of lapatinib therapy is superior to that of standard trastuzumab therapy.