Introduction

Trastuzumab plus chemotherapy is approved for use in patients with HER2-positive [HER2 immunohistochemistry (IHC) 3+ or IHC 2+ and in situ hybridization (ISH)-positive] metastatic adenocarcinoma of the stomach/gastroesophageal junction (GEJ) who have not received prior anticancer treatment for their metastatic disease [1]. This is now considered the standard of care globally [2, 3]. Approval was based on pivotal data from the Trastuzumab for Gastric Cancer (ToGA) study, which demonstrated that adding trastuzumab to chemotherapy (capecitabine or fluorouracil with cisplatin) significantly improved overall survival (OS) in patients with HER2-positive advanced gastric cancer (AGC) or GEJ cancer versus chemotherapy alone [median OS 13.8 (95% CI 12–16) versus 11.1 months (95% CI 10–13)] [4]. The overall tumor response rate in ToGA was 47.1% in a population where 77.6% of patients had HER2 IHC 3+ or IHC 2+/ISH-positive tumors, suggesting a substantial proportion did not respond to first-line trastuzumab treatment despite HER2 overexpression [4]. The mediators of primary or acquired resistance to HER2-targeted therapy in gastric cancer are not yet fully understood, although preclinical investigations are ongoing [5]. Currently, no global clinical outcomes data have established a HER2-targeted second-line regimen for patients with HER2-positive AGC.

Trastuzumab emtansine (T-DM1) is an antibody–drug conjugate of trastuzumab linked to the tubulin-binding agent DM1 via a stable thioether linker [6]. T-DM1 retains trastuzumab’s HER2-targeting mechanism of action [7]. T-DM1 internalization then leads to intracellular release of DM1-containing catabolites that induce mitotic arrest and cell death [6, 7]. T-DM1 improved OS and progression-free survival (PFS) in patients with metastatic breast cancer (MBC) previously treated with HER2-targeted therapies [8,9,10], compared with the control. These survival improvements were observed across patient subgroups in exploratory biomarker analyses [11, 12].

GATSBY was an international, randomized, open-label, adaptive phase II/III study to evaluate T-DM1 versus a taxane in patients with previously treated HER2-positive locally advanced or metastatic gastric/GEJ adenocarcinoma [13]. GATSBY did not meet its primary endpoint of improved OS in patients treated with T-DM1 2.4 mg/kg every week (qw), compared with taxane therapy [13].

Evaluation of HER2 expression and downstream signaling will improve our understanding of HER2-positive AGC disease biology, and why T-DM1 did not demonstrate improved OS versus taxane therapy in GATSBY. Exploratory biomarker analyses in patients with MBC suggest that subgroups with higher HER2 mRNA levels (> median) experienced a larger survival benefit from T-DM1 [11, 12]. In this biomarker analysis of GATSBY, we assessed HER2 expression and additional biomarkers: HER3—an important heterodimerization partner enabling HER2-mediated activation of the PI3K/Akt pathway [14, 15]; PTEN expression and PIK3CA mutations—associated with constitutive activation of the PI3K/Akt pathway [16, 17] even in the presence of HER2 inhibition [18,19,20]; and cMET expression—linked with resistance to HER2-targeted therapies [21, 22]. Notably, PIK3CA mutation has been linked with reduced efficacy of HER2-targeted therapies [23, 24], but not T-DM1 [11]. Fcγ receptor (FcγR) IIa/IIIa polymorphisms were assessed as trastuzumab binds FcγR on immune effector cells [7]. Herein we report the results of the preplanned biomarker analysis of GATSBY.

Materials and methods

Patient eligibility and study design

The design of GATSBY (NCT01641939), inclusion/exclusion criteria, and treatment schedule have been published [13]. Patients aged ≥ 18 years with centrally tested HER2-positive (IHC 3+, independent of ISH status, or IHC 2+ and ISH-positive) AGC, who progressed during or after first-line therapy were eligible [13]. Patients were randomized 2:2:1 to receive 3.6 mg/kg T-DM1 every 3 weeks (q3w), 2.4 mg/kg T-DM1 qw, or taxane (docetaxel or paclitaxel; physician’s choice) in stage 1 of the study and, following independent data monitoring committee (IDMC) review, randomized 2:1 to receive the IDMC-selected dose regimen of 2.4 mg/kg T-DM1 qw or taxane in stage 2 (dose selection based on pharmacokinetic, safety, and efficacy data) [13]. The primary endpoint was OS (time from randomization to death, regardless of the cause of death); PFS [time from randomization to first occurrence of progressive disease or death from any cause (whichever occurred first)] was a secondary endpoint. GATSBY was conducted in accordance with Good Clinical Practice and the Declaration of Helsinki. The protocol was approved at each site by the local ethics committee/institutional review board. All patients provided written informed consent.

Specimen characteristics

Primary and/or metastatic tumor samples were used for biomarker analyses. Samples were supplied as formalin-fixed paraffin-embedded tumor blocks if available, or freshly cut unstained slides (shipped within 48 h of cutting). After HER2 assessment, additional biomarker assays were performed according to slide availability and priority. Whole-blood samples for cytogenetic analysis were collected per study protocol and shipped on dry ice to the sponsor’s Clinical Sample Operations (within 48 h if stored at − 20 °C; monthly if stored at − 70 °C).

Assay methods

Tissue processing and assays were performed centrally by Targos Molecular Pathology prior to unblinding of the study data. IHC analysis was used to assess the expression of HER2, PTEN, and cMET. IHC staining was performed according to the manufacturers’ instructions; the antibodies and validation tests used during IHC analysis are listed in Supplementary Table 1.

HER2 protein subgroups were defined by IHC staining intensity (per scoring criteria listed in Supplementary Table 2) and H score. HER2 IHC 2+ and IHC 3+ subgroups (separately and combined) were further characterized as focal, heterogenous, or homogenous according to the proportion of positively stained tumor cells (< 30%, 30–79%, or ≥ 80%, respectively).
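The staining-pattern categories above can be illustrated with a minimal Python sketch. The function name `staining_pattern` is hypothetical and is not part of the study's scoring software. The text gives the cutoffs as < 30%, 30–79%, and ≥ 80%; the sketch assumes that any value below 80% that is not focal counts as heterogenous.

```python
def staining_pattern(percent_positive: float) -> str:
    """Classify a HER2 IHC staining pattern from the percentage of
    positively stained tumor cells (illustrative helper, not study code)."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive < 30:
        return "focal"
    if percent_positive < 80:
        # Assumption: values between 79% and 80% are treated as heterogenous.
        return "heterogenous"
    return "homogenous"
```

For example, a tumor with 45% positively stained cells would fall into the heterogenous category under these cutoffs.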

PTEN expression subgroups were categorized according to ≤ or > median and expression levels in the tumor versus the surrounding nontumor tissue (categorized as “none”, “decreased”, “slightly decreased”, “equivalent”, or “increased” as described previously) [25]. PTEN was also reported as an H score for the cytoplasm and nucleus, determined by the percentage of viable tumor cells in each staining category: ≥ 50% no staining (0), ≥ 50% weak (1), ≥ 50% moderate (2), and ≥ 50% strong (3) intensity levels; and defined as H score = [1 × (% cells staining at 1)] + [2 × (% cells staining at 2)] + [3 × (% cells staining at 3)] [26]. cMET subgroups were classified by IHC staining intensity per an internally generated algorithm (Ventana/Genentech; Ventana Medical Systems, Inc., Tucson, AZ, USA): no staining or < 50% tumor cells with any intensity (IHC 0); ≥ 50% tumor cells with weak or higher staining but < 50% with moderate or higher intensity (IHC 1+); ≥ 50% tumor cells with moderate or higher staining but < 50% with strong intensity (IHC 2+); ≥ 50% staining with strong intensity (IHC 3+) [27].
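As an illustration of the H score formula and the cMET scoring rules described above, the following Python sketch may be helpful. The function names `h_score` and `cmet_ihc` and their parameters are hypothetical; the proprietary Ventana/Genentech algorithm is not reproduced here, only the thresholds stated in the text.

```python
def h_score(pct_weak: float, pct_moderate: float, pct_strong: float) -> float:
    """H score = 1 x (% cells at 1) + 2 x (% cells at 2) + 3 x (% cells at 3).

    The result ranges from 0 (no staining) to 300 (all cells strong).
    """
    if min(pct_weak, pct_moderate, pct_strong) < 0:
        raise ValueError("percentages must be non-negative")
    if pct_weak + pct_moderate + pct_strong > 100:
        raise ValueError("percentages must not sum to more than 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


def cmet_ihc(pct_weak: float, pct_moderate: float, pct_strong: float) -> str:
    """Assign a cMET IHC category using the 50% thresholds stated in the text."""
    any_staining = pct_weak + pct_moderate + pct_strong
    moderate_or_higher = pct_moderate + pct_strong
    if pct_strong >= 50:
        return "IHC 3+"
    if moderate_or_higher >= 50:
        return "IHC 2+"
    if any_staining >= 50:
        return "IHC 1+"
    return "IHC 0"
```

The categories are checked from strongest to weakest so that each sample receives the highest category whose threshold it meets, which matches the "but < 50% with ... higher intensity" exclusions in the description.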

HER2 gene amplification was measured by ISH (INFORM HER2 Dual ISH DNA Probe Cocktail Assay; Ventana Medical Systems, Inc.) and defined as positive (HER2 gene copy number: chromosome 17 centromere ratio ≥ 2.0) or negative. Additional subgroups were defined by gene copy number (< 4, ≥ 4 and < 6, or ≥ 6) and gene ratio (≥ 2 and < 4 or ≥ 4). HER2 and HER3 mRNA levels were measured using quantitative reverse transcription polymerase chain reaction (PCR; cobas® z480 Analyzer; Roche Molecular Diagnostics, Pleasanton, CA), expressed as concentration ratios relative to the housekeeping gene G6PD, and categorized as ≤ median or > median. PIK3CA mutation analyses were performed using allele-specific PCR (cobas® z480 Analyzer) with coverage of 17 mutations in exons 1 (R88Q), 4 (N345K), 7 (C420R), 9 (E542K, E545A, E545D, E545G, E545K, Q546E, Q546K, Q546L, and Q546R), and 20 (M1043I, H1047L, H1047R, H1047Y, and G1049R), and defined as mutated or non-mutated type per exon. A validation study in gastric cancer (GC) comparing the assay to Sanger sequencing was performed before samples were analyzed. All assays were performed according to manufacturer’s instructions. FcγR polymorphisms (rs1801274, rs396991) were assessed on whole blood-extracted DNA using TaqMan-based real-time PCR assays according to internal procedures and classified by allele expression.
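The subgroup definitions for HER2 gene amplification, gene copy number, and median-based mRNA categories can be sketched as follows. All function names are hypothetical, and the inputs (for example, mean HER2 and chromosome 17 centromere signal counts per cell) are assumptions about how the raw readouts might be represented; only the cutoffs are taken from the text.

```python
from statistics import median


def ish_status(her2_copies: float, cep17_copies: float) -> str:
    """ISH-positive if the HER2 : chromosome 17 centromere ratio is >= 2.0."""
    ratio = her2_copies / cep17_copies
    return "positive" if ratio >= 2.0 else "negative"


def copy_number_group(gene_copy_number: float) -> str:
    """Gene copy number subgroups: < 4, >= 4 and < 6, or >= 6."""
    if gene_copy_number < 4:
        return "< 4"
    if gene_copy_number < 6:
        return ">= 4 and < 6"
    return ">= 6"


def median_split(values: list[float]) -> list[str]:
    """Label each value (e.g., an mRNA ratio relative to G6PD) by the cohort median."""
    cut = median(values)
    return ["> median" if v > cut else "<= median" for v in values]
```

Note that a median split is defined relative to the analyzed cohort, so the same mRNA value could fall into a different category in a different study population.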

Statistical analyses

The clinical cutoff point for this analysis was June 30, 2015. Exploratory OS and PFS subgroup analyses were preplanned, without preplanned sample sizes or power calculations. Median OS and PFS were calculated using the Kaplan–Meier method. Unstratified Cox proportional hazards regression was used to estimate hazard ratios (HRs) and 95% CIs for T-DM1 versus taxane in biomarker subgroups. Multivariate analyses using Cox proportional hazards modeling were performed to explore the potential influence of HER2 IHC 3+ staining and randomized treatment interaction on survival. Additional variables included were: treatment arms, stratification factors (world region, prior HER2-targeted therapy, prior gastrectomy), sex, age, Eastern Cooperative Oncology Group (ECOG) performance status, visceral disease (lung or liver), primary site of disease, baseline disease measurability, and extent of disease. A second model including the same variables, plus an interaction term, was used to test for multiplicative interaction between treatment effect and HER2 IHC 3+.
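For readers unfamiliar with the Kaplan–Meier method, the following pure-Python sketch shows how a median survival time can be derived from follow-up times and event indicators. It is an illustration only: the study analyses were run in SAS, the function names are hypothetical, and confidence intervals and Cox regression are not covered.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if the event (e.g., death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


def median_survival(times, events):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None  # median not reached
```

Censored patients contribute to the risk set until their last follow-up but never lower the survival estimate, which is why a median may be "not evaluable" in small subgroups with few events.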

Subgroup treatment effect pattern plot (STEPP) analyses were performed post hoc to evaluate associations between OS and HER2 gene copy number and mRNA as continuous variables [28].

Statistical analyses were performed using SAS software (version 9.2; SAS Institute, Cary, NC, USA). STEPP plots were created with R software. All analyses were exploratory and not hypothesis-testing.

Results

Patients

Patients were enrolled between September 2012 and February 2015, and were randomized to receive T-DM1 2.4 mg/kg qw (n = 228), T-DM1 3.6 mg/kg q3w (n = 70), or taxane (n = 117; Supplementary Figure 1). Biomarker analyses were conducted in the intent-to-treat (ITT) population (T-DM1 2.4 mg/kg qw and taxane arms; see Supplementary Table 3 for demographics information). Patient biomarker levels (Table 1) and median follow-up {17.5 [interquartile range (IQR) 12.1–23.0] and 15.4 months [IQR 9.2–18.1] in T-DM1 2.4 mg/kg qw and taxane arms, respectively} were similar across treatment arms [13].

Table 1 Patient biomarker levels (ITT population)

HER2 expression/characterization at baseline

HER2 assessments were performed on primary tumors for 312/345 (90.4%) patients (remaining assessments performed using samples from metastases or of unknown origin). HER2 expression subgroups were equally distributed across treatment arms (Table 1); 67.2% (231/344) of patients had HER2 IHC 3+ status. Focal staining was observed in 15.4% of patients (53/344; Table 1); homogenous staining was observed in 56.7% of patients (195/344), predominantly in IHC 3+ tumors [IHC 3+: 51.1% (118/231); IHC 2+: 16.8% (19/113)].

HER2 mRNA expression was similar between treatment arms (Table 1). The relationship between HER2 IHC staining and HER2 mRNA expression is demonstrated in Fig. 1. HER2 mRNA expression and gene ratio were numerically higher in tumors with more homogenous IHC staining. Tumors with homogenous IHC 3+ staining had the highest median HER2 mRNA expression and gene ratio (Fig. 1).

Fig. 1

HER2 expression levels in HER2 IHC 2+/ISH-positive versus IHC 3+ subgroups (all patients): a HER2 mRNA and b HER2 gene ratio. The solid white line indicates the median, the solid blue box represents Q1–Q3, vertical blue lines extend to 1.5 × IQR outside the box, and the blue dots represent individual outliers. Figures generated using Spotfire (TIBCO Software Inc., Palo Alto, CA, USA). HER2 human epidermal growth factor receptor 2, IHC immunohistochemistry, IQR interquartile range, ISH in situ hybridization

Association between HER2 expression and OS

In the T-DM1 arm, patients with tumors exhibiting higher HER2 expression levels had longer median OS, compared with those in lower HER2 expression subgroups (Table 2). This pattern was consistently observed across high versus low HER2 expression subgroups defined by protein level, gene copy number, gene amplification, H score, and staining pattern. In the taxane arm, OS did not appear as strongly linked with HER2 expression or amplification except for HER2 staining patterns in IHC 3+ tumors, where homogenous staining was associated with longer median OS versus heterogenous and focal staining (Table 2).

Table 2 Median OS in high versus low HER2 expression biomarker subgroups

Association between HER2 expression and comparative treatment efficacy

Median OS was comparable between treatment arms in subgroups with IHC 3+ tumors, > median HER2 mRNA, ≥ 6 HER2 gene copy number, or ≥ 4 HER2 gene ratio (Fig. 2a), and in patients with IHC 3+ or IHC 2+/ISH-positive tumors exhibiting homogenous or nonfocal HER2 IHC staining patterns. However, 95% CIs for HRs were overlapping for all subgroups (Fig. 2a). Kaplan–Meier curves support a link between HER2 expression and comparative treatment efficacy as measured by OS according to HER2 protein and mRNA expression (Fig. 2b, c).

Fig. 2

Comparative efficacy of T-DM1 and taxane therapies in HER2 expression subgroups: a OS in individual subgroups and Kaplan–Meier curves showing treatment and biomarker effect on OS according to b HER2 expression and c HER2 mRNA levels; d PFS in individual subgroups. Dotted line on forest plots represents the hazard ratio for the overall ITT population. AVAL elapsed time to the event of interest, CI confidence interval, HER2 human epidermal growth factor receptor 2, IHC immunohistochemistry, ISH in situ hybridization, ITT intent-to-treat, NE not evaluable, OS overall survival, PFS progression-free survival, qw every week, T-DM1 trastuzumab emtansine

Median PFS in patients with high HER2 expression was generally similar between T-DM1 and taxane treatment arms, whereas median PFS in patients with low HER2 expression (e.g. IHC 2+/ISH-positive, or low mRNA HER2 expression) indicated significantly worse survival outcomes with T-DM1, compared with taxane treatment in this population (Fig. 2d).

Post hoc multivariate Cox regression modeling suggested that HER2 IHC 3+ staining was an independent predictor of OS (HR 0.58; p = 0.0001; Table 3). The addition of an IHC 3+ × T-DM1 treatment interaction term (model 2) confirmed that the longer OS in HER2 IHC 3+ versus IHC 2+ subgroups specifically occurred in association with T-DM1 (Table 3). Other variables identified as independent prognostic factors for improved OS were prior gastrectomy (p = 0.0002), visceral disease (lung or liver) at baseline (p = 0.0314), and an ECOG performance status of 0 at baseline (p = 0.0297; Table 3). Prior HER2-targeted therapy also appeared to have an effect [HR 0.75, 95% CI 0.54–1.05 (model 1) and HR 0.75, 95% CI 0.55–1.03 (model 2)]. STEPP analysis revealed no linear pattern for HER2 mRNA or HER2 gene copy number when correlated with the treatment effect (HR) on OS (Supplementary Figure 2). For PFS, the STEPP analysis showed a more consistent linear pattern for HER2 mRNA and HER2 gene copy number than observed with OS (Supplementary Figure 3).

Table 3 Multivariate analysis of OS

Association between HER3 mRNA, PTEN, cMET expression, PIK3CA mutation status, and FcγR polymorphisms, and comparative treatment efficacy

A reduced treatment effect was observed with T-DM1 versus taxane therapy in patients with > median HER3 mRNA levels (Fig. 3). No clear differences were identified within the PTEN, cMET, and FcγR subgroups, although patients with high cMET protein expression (IHC 3+) showed a numerically better treatment effect with T-DM1, compared with taxane therapy (Fig. 3; Supplementary Figure 4). However, no firm conclusions can be drawn given the small subgroup sizes. The patient subgroup with PIK3CA mutations was also too small (n = 17) to allow interpretation.

Fig. 3

OS in HER3 mRNA, PTEN, cMET, and PIK3CA mutation status subgroups. Dotted line represents the hazard ratio for the overall ITT population. CI confidence interval, cMET cellular mesenchymal-epithelial transition, HER3 human epidermal growth factor receptor 3, ITT intent-to-treat, NE not evaluable, OS overall survival, PI3K phosphoinositide 3-kinase, PIK3CA phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit-alpha, PTEN phosphatase and tensin homolog, qw every week, T-DM1 trastuzumab emtansine

Discussion

GATSBY did not show superior OS with T-DM1 versus taxane therapy in previously treated AGC [13]. The results from this prespecified biomarker analysis support the hypothesis that tumors with evidence of high HER2 expression, as defined by IHC 3+, HER2 mRNA expression > median, gene ratio ≥ 4, or homogenous or nonfocal staining, may be more sensitive to T-DM1 therapy, with survival outcomes comparable to taxane therapy.

Comparable OS and PFS in the T-DM1 and taxane arms were observed in patients with HER2 IHC 3+ and > median HER2 mRNA, while taxane therapy was associated with longer survival than T-DM1 in patients with lower HER2 expression. This differential treatment benefit between high and low HER2 expression was more pronounced for PFS than OS. This may result from PFS being evaluated during the GATSBY study treatment period, whereas any putative relationship between biomarkers and final OS could be confounded by post-progression treatments. However, per-protocol investigator assessment of PFS within an open-label study could also potentially affect the data. Interpretation of the cMET analysis was limited by the small patient numbers in the cMET IHC 0 and IHC 3+ groups. The observed 5.2% mutation rate for PIK3CA limited the analysis of its impact on treatment outcomes in this study, and HER2-positive GC biology (mutation rates in BC are substantially higher, e.g., 30.5% in EMILIA [11], but can be as low as 3% in some GC subtypes [29]).

Our findings support previous observations that heterogenous HER2 staining patterns seem more common in GC than BC [30,31,32]. The proportion of patients in GATSBY with HER2 IHC 2+/ISH-positive tumors was similar to that reported in ToGA (32.8% and 27%, respectively) [4], and is markedly higher than the ~ 10% rate observed in BC [33]. Median HER2 mRNA expression levels in this study were also lower than those measured in BC studies [median ranges 9–13 in BC (Roche, Data on file)].

Biomarker analyses from EMILIA and TH3RESA in patients with HER2-positive advanced/MBC reported that the statistically significant improvements in PFS and OS in patients receiving T-DM1 versus control were maintained across all subgroups examined (HER2 and HER3 mRNA, PTEN protein and PIK3CA mutation status [11, 12], and EGFR mRNA [11]). These analyses also reported greater improvements in survival endpoints among patient subgroups with > median HER2 mRNA expression [11, 12]. A similar pattern was identified in this biomarker analysis of GATSBY (although to a lesser extent), with increased PFS and OS in patients with higher versus lower levels of HER2 expression in the T-DM1 arm. However, in BC, even patients with lower HER2-expressing tumors experienced a PFS and OS benefit with T-DM1, compared with the control arm [11, 12], whereas in GATSBY, T-DM1 was not associated with superior treatment outcomes versus taxane therapy in the ITT population or any biomarker subgroup.

The reasons for the lack of T-DM1 effect on OS and PFS in GATSBY are unclear, but suggest that HER2 expression could drive disease progression differently in GC versus BC, which could have important implications for gastric tumor biology. Specifically, the malignant phenotype may be driven more by HER2 signaling in BC than in GC, leading to greater sensitivity to HER2-targeted therapies in BC. This concept is further supported by the lack of benefit of other HER2 inhibitors in GC, such as lapatinib [34, 35]. Furthermore, IHC scoring in GC allows strong incomplete membrane staining to be classified as HER2-positive [31]. This could mean HER2 receptor density is lower in GC than BC even in homogenously stained tumors. Comparison of HER2 mRNA levels between GC and BC tumors with homogenous HER2 staining does not support this hypothesis (Roche, Data on file); however, more accurate methods to measure exact HER2 receptor density per cell may be needed to test this assumption. Discordance in HER2 expression between primary and metastatic GC has been reported previously, suggesting that HER2 status can be altered or lost during disease progression [36, 37]; HER2 status was confirmed using a primary tumor sample for approximately 90% of patients in GATSBY (all had status confirmed using tissue collected prior to first-line therapy). HER2 expression could also have changed before versus after first-line HER2-targeted therapy [38], and in primary versus metastatic tumors during disease progression. However, our multivariate analysis of OS and the previously published subgroup analysis for the GATSBY study do not indicate a worse outcome in patients in the T-DM1 arm who received prior HER2-targeted therapy [13]. There could also be non-HER2-mediated, as yet unknown, resistance mechanisms in GC that influenced the treatment outcomes in this study.

The TRIO-013/LOGiC study (lapatinib plus chemotherapy as first-line therapy for patients with upper gastrointestinal adenocarcinomas) reported that lapatinib was associated with improved PFS in those with metastatic cancers with high-level HER2 amplification [34]. Subgroup analyses of the TyTAN study (lapatinib plus paclitaxel as second-line therapy in patients with AGC) also reported that, although HER2-targeted therapy failed to improve OS versus paclitaxel alone in the ITT population, HER2 IHC 3+ patients who received lapatinib plus paclitaxel had significantly improved OS and PFS [35]. However, there are important differences between the study populations of TyTAN and GATSBY: TyTAN enrolled only Asian patients, whereas patients from Asia-Pacific comprised 46% of the GATSBY population; only 6% in TyTAN had previously received trastuzumab, compared with 77.4% in GATSBY [13, 35].

This analysis provides the first large-scale formal evaluation of predefined biomarkers in HER2-positive AGC. The population was representative of patients with HER2-positive AGC who had relapsed or progressed during or after first-line therapy. However, the study was powered for the primary efficacy endpoint and not for biomarker analyses, and only a limited panel of biomarkers was assessed. The PFS data could be confounded by the open-label study design and investigator assessment. Additionally, the primary tumor tissue analyzed for most patients might not reflect the tumor biology, including HER2 status, at study entry.

In this exploratory biomarker analysis of GATSBY, patients with high HER2 expression (as reflected by several HER2 markers) had comparable OS and PFS with T-DM1 and with taxanes, although T-DM1 2.4 mg/kg weekly did not show superiority over taxanes for OS and PFS in any of the prespecified biomarker subgroups of patients with previously treated HER2-positive AGC. The analyses presented advance the understanding of HER2 biology in gastric tumors; however, to improve future studies of HER2-targeted agents in GC, it may be useful to implement mandatory collection of tumor tissue at study entry and focus enrollment on patients with tumors that show high HER2 expression levels and/or a more homogenous HER2 IHC staining pattern.