Introduction

Adjuvant treatment decisions in hormone receptor (HR)+/HER2-negative early breast cancer (EBC) have traditionally relied on clinicopathological characteristics (nodal status, tumour size/grade, age, and co-morbidities). A recent meta-analysis of standard chemotherapy (CT) regimens demonstrated that proportional risk reductions were little affected by age, nodal status, tumour diameter/differentiation, ER status, and tamoxifen use [1].

Increasing evidence suggests that HR+/HER2-negative EBC is a heterogeneous disease [2]. Specifically, patients with molecularly classified [2] luminal-A or immunohistochemically (IHC) classified (Ki-67, progesterone receptor (PR)) luminal-A-like tumours have excellent prognosis [3,4,5,6]. However, clinical utility of Ki-67 is limited by inter-observer variability [7, 8], despite improvement by quality assurance [9, 10].

Since the advent of genomic signatures, efforts have been undertaken to integrate them into clinical routine for pN0-1 HR+ patients. Until 2015, most clinical evidence for genomic signatures derived from retrospective analyses of prospective trials conducted in the early-CT era [11,12,13,14,15] or used archived samples from untreated patients [16]. Considerable retrospective evidence exists for reproducible prognostic impact of Oncotype DX®/Recurrence Score® (RS), MammaPrint®, EndoPredict®, and Prosigna®, but their value beyond centrally measured IHC markers or derived scores like IHC4 remains uncertain [4, 5, 17]; the RS is the only test shown to predict CT benefit [12, 18].

While many national/international EBC guidelines incorporate genomic signatures [19], prospective trials exploring CT overtreatment/under-treatment are needed to assess benefits of augmenting conventional pathology with genomic signatures.

Recent prospective evidence has generally confirmed retrospective results: for MammaPrint®, MINDACT [20]; for RS, the registry part of TAILORx (in N0 disease) [21]; and the WSG-PlanB three-year analysis in clinically high-risk EBC [22]. Nevertheless, the utility of genomic testing in high-risk patients and the optimal selection for these tests remain unclear.

PlanB is a prospective, randomised multicentre phase 3 CT trial in HER2-negative EBC. RS was prospectively assessed in HR+ patients; following an early amendment, RS ≤ 11 participants with ≤3 involved lymph nodes (LN) could omit CT; RS ≤ 11 patients had excellent three-year survival [22]. The planned translational analyses included independent central pathology review of grade/IHC markers and their utilisation with RS for prognosis in nodal and Ki-67 subgroups, based on the “five-year” (median 55-month) PlanB follow-up.

Methods

Study participants

The trial included female patients, 18–75 years, with histologically confirmed, unilateral primary invasive BC, adequate surgical treatment (free margins, sentinel-node biopsy in node-negative, or axillary dissection in node-positive patients), without evidence of metastasis. Key inclusion criteria: HER2-negativity; pT1-T4c; pN+ [or pN0 with a risk factor (≥pT2, grade 2/3, high uPA/PAI-1, <35 years, or HR-negative)]; ECOG performance status <2 or Karnofsky Index ≥ 80%; signed informed consent; and (if ≥4 positive LN, RS > 11, or HR-negative) willingness to participate in the adjuvant CT PlanB trial.

Study design

WSG-PlanB was approved by German ethics boards and conducted in accordance with the Declaration of Helsinki. Clinicaltrials.gov identifier: NCT01049425.

PlanB (CONSORT diagram: Fig. 1) began in 2009 as a CT trial comparing anthracycline-containing (four cycles of epirubicin/cyclophosphamide followed by four cycles of docetaxel q3w) and anthracycline-free (six cycles of docetaxel/cyclophosphamide q3w) CT. After including 274 patients, the trial was amended (08/2009) to recommend endocrine therapy (ET) alone for pN0/pN1, locally HR+ patients with RS ≤ 11 (based on an initial RS validation study [11]).

Fig. 1

PlanB CONSORT diagram. CT chemotherapy, ET endocrine therapy, HR hormone receptor, RS recurrence score

Follow-up was performed at three-month intervals for the first three years and every six months thereafter. Data were obtained from electronic case record forms and verified by monitoring visits to the study sites.

RS was assessed on surgically removed primary tumour tissue at the central laboratory of Genomic Health Inc. (Redwood City, CA).

Slide review, IHC, and fluorescence in situ hybridisation were analysed in an independent central laboratory (Institute of Pathology, Hannover Medical School) [22]. Tumours were considered centrally ER+ (antibody SP-1) or PR+ (antibody PgR636, both Zytomed, Berlin, Germany) if immunostaining was present in ≥1% of tumour nuclei. Ki-67 was assessed centrally using rabbit monoclonal Ki-67 antibody 30–9 (Ventana Inc., Tucson, USA) on ≥100 invasive tumour cells; the semi-quantitative procedure for Ki-67 produces values in 5% increments. Hence, the analysed semi-quantitative ranges Ki-67 ≤ 10% and ≥40% correspond to quantitative Ki-67 ≤ 13.25% [3] and Ki-67 > 35% [23], respectively; analyses were also performed for semi-quantitative Ki-67 ranges <20% and ≥20% [6]. IHC4 was computed as previously described [4, 17].
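The semi-quantitative Ki-67 ranges described above can be illustrated with a minimal Python sketch. The function names and the labels "low", "intermediate", and "high" are chosen here for illustration (they mirror the subgroup names used in the Results); the code is not part of the trial's analysis pipeline.

```python
def ki67_group(ki67_percent: float) -> str:
    """Assign a semi-quantitative Ki-67 value (in %) to a three-level group.

    Cut-offs follow the ranges in the text: <=10% (low), >10% and <40%
    (intermediate), >=40% (high). The labels themselves are illustrative.
    """
    if not 0 <= ki67_percent <= 100:
        raise ValueError("Ki-67 must be a percentage between 0 and 100")
    if ki67_percent <= 10:
        return "low"
    if ki67_percent < 40:
        return "intermediate"
    return "high"


def ki67_at_least_20(ki67_percent: float) -> bool:
    """Alternative dichotomy used in the text: True for Ki-67 >= 20%."""
    return ki67_percent >= 20
```

Because the semi-quantitative readings come in 5% steps, a value of 10 falls into the low group and a value of 15 into the intermediate group under these cut-offs.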

Endpoints

The endpoints included prospective evaluation of RS prognostic impact at a follow-up target of five years: clinical outcomes (disease-free survival [DFS], overall survival [OS]) in RS ≤ 11 patients treated with ET alone, and prospective evaluation of the prognostic value of other parameters (Ki-67, IHC4 and histological grade [Elston-Ellis] by local/central assessment). The study was performed according to the reporting recommendations for tumour marker prognostic studies (REMARK) guidelines [24].

Statistical analysis

For DFS analysis, an event was defined as any invasive cancer event or death (with/without recurrence). Estimates of five-year DFS or OS with approximate 95% confidence intervals [given in brackets] were obtained by the Kaplan–Meier method. Comparisons of DFS or OS among subgroups used pairwise log-rank tests (reported as significant for p < 0.05).
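As an illustration of the Kaplan–Meier method mentioned above, the following is a minimal pure-Python sketch of the product-limit estimator with a plain Greenwood-type 95% interval. The source does not state which interval construction was used, so the linear interval, the function name `kaplan_meier`, and the toy data are assumptions for demonstration only.

```python
import math
from collections import Counter


def kaplan_meier(times, events, z=1.96):
    """Return [(time, survival, ci_low, ci_high)] at each distinct event time.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if an event (e.g. DFS event) occurred at that time, 0 if censored
    """
    n_events = Counter(t for t, e in zip(times, events) if e)
    n_leaving = Counter(times)  # events plus censorings at each time
    at_risk = len(times)
    surv, greenwood = 1.0, 0.0
    curve = []
    for t in sorted(n_leaving):
        d = n_events.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk
            if at_risk > d:
                # Greenwood's formula accumulates d / (n * (n - d))
                greenwood += d / (at_risk * (at_risk - d))
            se = surv * math.sqrt(greenwood)
            curve.append((t, surv, max(0.0, surv - z * se), min(1.0, surv + z * se)))
        at_risk -= n_leaving[t]  # patients censored at t were at risk at t
    return curve


# Toy data (hypothetical, not trial data): months of follow-up and event flags
toy_times = [6, 12, 12, 24, 36, 48, 60, 60]
toy_events = [1, 0, 1, 0, 1, 0, 0, 0]
for row in kaplan_meier(toy_times, toy_events):
    print(row)
```

A five-year estimate would be read off as the survival value at the last event time not exceeding 60 months.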

Subgroup analyses were performed in RS ≤ 11, RS 12–25, RS > 25, and in Ki-67 subgroups (see below). Univariate and multivariate (forward elimination) Cox proportional hazard models for DFS were estimated; RS, Ki-67, ER, PR, and IHC4 were coded as continuous variables using fractional ranks. For a realistic measure of effect sizes, hazard ratios of fractionally ranked variables are reported for 75th versus 25th percentile. Factors with significant impact on DFS or OS are referred to as “prognostic”. Nodal status was coded as pN1-3 vs node-negative, pN2-3 versus pN0-1, and pN3 versus pN0-2; tumour stage was coded as pT2-4 versus pT1; local and central grades were coded as grade 3 versus grade 1–2. Statistical analyses were performed using SPSS v.23 (IBM Corp., Armonk, NY).
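The fractional-rank coding and the 75th-versus-25th-percentile hazard ratio can be sketched as follows. This assumes that a fractional rank is the average rank divided by the number of observations, and that the Cox coefficient is expressed per unit of fractional rank (i.e. across the full 0 to 1 range); both are assumptions about the computation, which the text does not spell out, and the function names are hypothetical.

```python
import math


def fractional_ranks(values):
    """Map values to (0, 1] as average rank divided by n; ties share the mean rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j to the end of the block of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank / n
        i = j + 1
    return ranks


def hr_75_vs_25(beta_per_unit_rank):
    """Hazard ratio comparing the 75th with the 25th percentile of a ranked covariate."""
    return math.exp(beta_per_unit_rank * (0.75 - 0.25))
```

Under these assumptions, the interquartile hazard ratio is the square root of the hazard ratio across the full rank range, which is why it gives a more conservative picture of effect size than a top-versus-bottom comparison.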

Role of the funding source

The industry supporters in this trial had no role in study design, data collection, analysis/interpretation, writing, or decision to submit the manuscript. The authors (OG, UN, NH, RK) had full data access and hold final responsibility for manuscript submission.

Results

Study participants

From 4/2009 to 12/2011, 3198 patients were recruited from 93 centres; 2449 were randomised for CT. As previously reported [22], the central tumour bank population included 2642 locally HR+ cases (97.4% centrally confirmed). The analyses presented here focus on locally HR+ patients (all pN) unless otherwise stated. Their median age was 56 years; median tumour size was 19 mm; 61.9% had central grade 2; 58.8% were N0; 35.2% were pN1; 6.0% were pN2-3; 2553 (96.6%) had available RS: 17.4% RS ≤ 11, 58.4% RS 12–25, and 20.8% RS > 25 (Table 1).

Table 1 Patient characteristics in ER and/or PR positive population (by local assessment)

Within the 2274 locally HR+ pN0-1 post-amendment patients, 404 (17.8%) had RS ≤ 11; CT was omitted in 348 (86.1%) of these patients: 238 (68.4%) pN0 and 110 (31.6%) pN1.

435 tumours were HR-negative/HER2− by central pathology (local HR-negative status centrally confirmed in 93.5%).

Compliance with treatment recommendations within the trial was 95.2% for pN0 and 75.2% for pN1.

Median follow-up was 55 months (range 3–72 months).

DFS in RS groups, impact of nodal status, and Ki-67 within these RS groups

In locally HR+ patients with follow-up (including all pN), DFS was higher in RS ≤ 11 and RS 12–25 compared to RS > 25 (both p < 0.001); five-year DFS was 93.6% [90.8–96.4%] in low-RS (about three-fourths receiving no CT), 94.3% [92.8–95.8%] in intermediate RS (all CT-treated), and 84.2% [80.6–87.8%] in high-RS patients (all CT–treated) (Fig. 2a). Five-year DFS in those RS ≤ 11 patients treated with ET alone was 94.2% [91.2–97.3%] and was similar in pN0 (n = 238) (94.2%; [90.4–98.0%]) and pN1 (n = 110) subgroups (94.4%; [89.5–99.3%]).

Fig. 2

DFS for patients with RS ≤ 11, 12–25, and > 25 overall (a), node-negative patients (b), for patients with pN1 disease (c), and for patients with pN2-3 disease (d)

Fig. 3

OS for patients with RS ≤ 11, 12–25, and > 25 overall (a), for node-negative patients (b), for patients with pN1 disease (c), and for patients with pN2-3 disease (d)

Fig. 4

DFS in HR+/Ki-67 0–10% group (a) and in HR+/Ki-67 > 10% and <40% (b)

Higher DFS in RS ≤ 11 and RS 12–25 compared to RS > 25 patients held separately in pN0, pN1, and pN2-3 disease (Fig. 2b–d). In pN2-3 patients with RS > 25, five-year DFS was 61.7% [46.6–76.8%].

Within RS subgroups, a significant impact of Ki-67 on DFS was seen only for RS > 25. In terms of fractionally ranked Ki-67, the hazard ratio (75th versus 25th percentile) was 2.62 [1.39–4.91] (p = 0.003), i.e. higher Ki-67 was associated with poorer DFS. The corresponding CIs in the low and intermediate RS groups were [0.39–3.01] and [0.87–2.82], respectively, both of which include 1. Similar results were seen for grouped Ki-67.
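The relation between a reported hazard ratio, its 95% confidence interval, and the p-value can be illustrated with a short sketch. It assumes a Wald-type interval that is symmetric on the log scale, which is a common convention for Cox models but is not stated in the text; the function name is hypothetical.

```python
import math


def wald_p_from_hr_ci(hr, ci_low, ci_high, z_crit=1.96):
    """Approximate two-sided p-value from a hazard ratio and its 95% CI.

    Assumes the CI was built as exp(log(HR) +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability


# Values reported above for Ki-67 within the RS > 25 subgroup
p = wald_p_from_hr_ci(2.62, 1.39, 4.91)
print(f"p = {p:.3f}")
```

Under this assumption the reported interval corresponds to a p-value of roughly 0.003, in line with the value given above. An interval that includes 1, as in the low and intermediate RS groups, corresponds to p > 0.05.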

DFS in Ki-67 subgroups and impact of RS within these subgroups

DFS was assessed (Appendix Fig. 5) in (“luminal”) patients (all pN) with central HR+ and HER2-negative status and follow-up for the Ki-67 subgroups “low” (0–10%, n = 810), “intermediate” (>10%, <40%, n = 988), and “high” (≥40%, n = 74); and for comparison in triple-negative (TN) (n = 405) patients (central HR-negative and HER2-negative); CT was administered in 79.9%, 86.7%, 97.3%, and 100%, respectively. Five-year DFS rates were 94.7% [92.9–96.6%], 91.0% [89.0–93.0%], 73.4% [61.8–84.9%], and 80.3% [76.2–84.4%], respectively (p < 0.003 for “intermediate” versus “low” Ki-67, p = 0.5 for “high” versus “TN”, p < 0.001 for the other four comparisons).

Fig. 5

DFS by central Ki-67 expression levels in centrally HR+ patients. DFS in the triple-negative (centrally ER/PR/HER2-negative) subgroup is included for comparison

Figure 4 illustrates the impact of RS groups on DFS within the low-Ki-67 and intermediate Ki-67 subgroups defined above (there were only 62 patients with follow-up and measured RS in the entire high-Ki-67 subgroup, almost all with high RS, Table 2). Within the intermediate Ki-67 subgroup, RS > 25 was associated with poorer DFS than either RS ≤ 11 or RS 12–25 (both p < 0.001, log-rank); five-year DFS was 94.4% [89.9–98.9%], 93.8% [91.5–96.1%], and 83.6% [78.7–88.6%] in RS ≤ 11, RS 12–25, and RS > 25 patients, respectively; the corresponding hazard ratios for RS > 25 were 3.61 [1.61–8.13] versus RS ≤ 11 and 2.82 [1.72–4.61] versus RS 12–25. Within the low-Ki-67 subgroup, an impact of RS was not seen, but only 33 events occurred in the whole low-Ki-67 subgroup (94.7% five-year DFS, see above), and only 8.4% had high RS.

Table 2 Joint distribution of recurrence score (RS) and semi-quantitative Ki-67 in the analysed population

In pN0-1 patients, five-year DFS rates for Ki-67 low (0-10%), intermediate (>10%, <40%), and high (≥40%) were 95.0% [93.1–96.8%], 92.0% [90.0–94.0%], and 75.4% [63.4–87.3%], respectively. Within intermediate Ki-67 (but restricting to pN0-1), the relative impact of RS groups on DFS was as above: RS > 25 was associated with poorer DFS than either RS ≤ 11 or RS12-25 (both p < 0.001, log-rank). Five-year DFS was 94.5% [89.8–99.3%], 93.8% [91.5–96.2%], and 87.1% [82.3–91.9%] in RS ≤ 11, RS 12–25, and RS > 25 patients, respectively (pN0-1, intermediate Ki-67).

Separately for two subgroups defined by Ki-67 ≥ 20 and Ki-67 < 20 (St. Gallen-inspired cutoff), high–RS (>25) patients had poorer DFS than low- or intermediate RS patients (all pN): The hazard ratios for RS > 25 versus RS 0–25 within Ki-67 ≥ 20% and Ki-67 < 20% subgroups were 2.69 [1.65–4.40] and 2.14 [1.07–4.26], respectively. Again, RS ≤ 11 and RS12-25 subgroups had similar five-year DFS: 92.2 versus 92.3% for Ki-67 ≥ 20% and 94.9 versus 95.1% for Ki-67 < 20%, respectively.

Univariate and multivariate analyses of DFS

In univariate analyses of locally HR+ patients with available RS (Table 3), nodal status (pN2-3 versus pN0-1, pN3 versus pN0-2), central and local grade 3 (vs. grade 1 or 2), tumour size >2 cm, as well as continuous, fractionally ranked RS, Ki-67, PR, and IHC4, were all significant factors for DFS; higher levels of all these factors were unfavourable except for PR.

Table 3 also shows the results of a multivariate analysis including all the markers identified by univariate analysis (and fractionally ranked ER). Nodal status (pN2-3 vs. pN0-1, pN3 vs. pN0-2), grade 3 (independently by central and local assessment), tumour size >20 mm, and fractionally ranked RS, but not IHC4 or Ki-67, were independent factors for poorer DFS. The same results were obtained when the model was restricted to chemotherapy-treated patients. When RS was excluded from the multivariate model, IHC4 or both Ki-67 and PR became independent predictors for DFS. When local grade was excluded from the multivariate model, Ki-67 became an independent prognostic factor for DFS.

Table 3 Univariate and multivariate analysis (DFS) for all locally HR+ tumours with available RS

OS analysis by RS and nodal status

Consistent with DFS, among all locally HR+ patients with available RS, better OS was observed in RS ≤ 11 or RS 12–25 patients than in RS > 25 (p < 0.001 for both comparisons). Five-year OS (Fig. 3a) was 99.1% [98.5–100%] in RS ≤ 11, 97.2% [96.0–98.5%] in RS 12–25, and 93.3% [90.8–95.8%] in high-RS (>25) patients, despite only about one-fourth of these RS ≤ 11 patients receiving CT, compared to all RS > 11 patients; moreover, five-year OS in those RS ≤ 11 patients receiving ET alone was also 99.1%. The differences correspond to quite substantial hazard ratios of 6.46 [2.27–18.42] for RS > 25 versus RS ≤ 11 and 3.26 [1.87–5.70] for RS > 25 versus RS 12–25.

In node-negative patients, five-year OS was a remarkable 99.2% [98.0–100%] in RS ≤ 11 compared to 98.3% [97.0–99.5%] in RS 12–25 and 96.7% [94.4–99.0%] in RS > 25. OS in RS 12-25 was significantly higher than in RS > 25 patients within all nodal subgroups. OS in low-RS patients was also significantly higher than in high-RS patients within nodal subgroups 1–3 and 4–9 involved nodes (Fig. 3b–d).

Discussion

Our data provide the first prospective evidence in clinically high-risk EBC demonstrating strong impact of the RS on DFS and OS, not only in the collective as a whole, but also within key subgroups for clinical decision making. The independent impact of RS (as a continuous variable) on DFS was seen in a multivariate analysis including clinicopathological factors and IHC measurements. The observed survival impacts are particularly remarkable considering the PlanB trial design (CT omitted in a large fraction of HR+, pN0-1 patients with RS ≤ 11); similarity of DFS in RS ≤ 11 and RS 12–25 subgroups might be attributable to mitigating effects of CT in all RS > 11 (but only some RS ≤ 11) patients.

Based on their excellent five-year DFS (>94%) and OS (>99%), the omission of CT in 348 clinically high-risk (up to three involved LN), genomically low-risk (RS ≤ 11) patients seems justified. These favourable survival rates extend those of the explorative PlanB analysis after three-year follow-up [22] and are consistent with the reported five-year invasive DFS of 93.8% and OS of 98% in the RS < 11 subgroup treated with ET alone from TAILORx (pN0 pT1-2 EBC) [21]. PlanB and TAILORx, the first prospective trials using RS for adjuvant decision making, also confirm previous retrospective data suggesting very low relapse rates in RS < 18 pN0–N1 EBC patients without CT [11,12,13, 18].

MammaPrint® is the only other genomic signature supported by a prospective trial (MINDACT) [20]. MINDACT met its primary objective, five-year distant DFS > 92% without adjuvant CT in patients with high clinical risk (AdjuvantOnline! 9.0) and low genomic risk, suggesting that 46% of the clinically high-risk population (including 46% pN1 and/or 28.6% grade 3 tumours) may not require CT [20].

These prospective trials show that CT can be safely spared in a clinically meaningful fraction of HR+/HER2-negative patients with 0–3 affected LN and low genomic risk. High tumour burden remained a strong unfavourable prognostic factor in PlanB; five-year DFS in pN2+ patients (a small group within PlanB) ranged from 100% (RS ≤ 11) to 61% (RS > 25).

Our analyses add prospective evidence to the large body of consistent, yet retrospective/observational data for Oncotype DX including the SEER (>45,000 patients) and Clalit registry (n = 2028) analyses. These studies demonstrated excellent five-year breast cancer-specific survival (BCSS) of 99–100% in RS < 18 pN0 patients (2–7% CT use) and 95–99% in RS < 18 pN1 disease (7-41% CT use, increasing from one to three positive LN) [25,26,27]. All studies observed poor outcome in RS > 25 patients, indicating a need for further targeted therapies in this population.

PlanB is the first prospective study comparing the prognostic value of histological grade and IHC markers (ER, PR, Ki-67, and IHC4) determined by independent central pathology with that of a genomic signature in EBC. Our univariate/multivariate analyses confirm that these markers analysed by an experienced laboratory have prognostic value, if RS is excluded. However, RS eliminated all IHC markers (and IHC4) in multivariate analysis, consistent with most [4, 5], although not all [17] retrospective studies.

Consistent with Denkert et al. [23], HR+/HER2-negative patients with Ki-67 levels ≥40% had poor survival, similar to that of triple-negative patients. However, an unfavourable impact of higher Ki-67 on DFS was seen only in the RS > 25 subgroup of locally HR+ patients, whereas no impact of Ki-67 was seen in corresponding RS ≤ 11 and RS 12–25 subgroups.

Despite Ki-67 inter-laboratory/inter-observer variability [8], cutoff uncertainty (13.25 versus 20%) [3, 5, 28], and conflicting results regarding its predictive value concerning adjuvant CT benefit [28, 29], the St. Gallen Consensus currently includes Ki-67 for identifying luminal-A-like patients, who should not receive CT. Furthermore, several studies indicate that a substantial proportion of patients would be re-classified from luminal-A to luminal-B if genomic signatures were added to an IHC-based allocation [4, 30]. In PlanB, 10% of luminal-A-like tumours (i.e. Ki-67 < 20%) had RS > 25 [22], a high-risk group for whom CT is recommended.

CT indication based on IHC-defined luminal subtypes or grade is associated with substantial inter-observer variability, especially outside a central pathology setting [22]. It should thus be carefully re-evaluated given our prospective data demonstrating that the highly reproducible RS result outperforms IHC markers as a prognostic factor in EBC. However, the multivariate analysis suggests that genomic assays are best used in the context of established factors such as grade, nodal status, and tumour size. Ki-67, as determined by an experienced laboratory, seems to be useful for selecting patients for the more expensive genomic testing when financial resources are limited. This is because concordance between RS and Ki-67 risk assessment is relatively high in the Ki-67 ≤ 10% and ≥40% groups, and the greatest prognostic value of RS was seen in the intermediate Ki-67 subgroup (>10 to <40%). Hence, using genomic signatures may be particularly useful for intermediate/high-risk pN0-1 disease according to clinical and/or IHC markers. In clinically low-risk disease, using MammaPrint® is controversial due to the missing CT predictive effect within this group [20]. Using more than one signature per patient is currently not recommended.

Our results have some important limitations. First, this five-year follow-up, translational research analysis of the PlanB phase 3 trial is exploratory; the primary endpoint, comparing CT arms, will be reported subsequently. Second, so far, clinical consequences for CT omission can only be drawn for the relatively small group of RS ≤ 11, pN0-1 patients. Yet, approximately 60% of patients have RS 12–25, where CT benefit is uncertain [31]. Two large prospective trials (TAILORx, RxPONDER) randomise patients to chemo-endocrine or ET alone within this RS range. The ongoing WSG–ADAPT trial uses dynamic proliferation response to preoperative ET as a selection tool for CT allocation in pN0-1 patients with intermediate RS (data availability: 2020).

In conclusion, the present findings regarding the PlanB five-year follow-up may be helpful in guiding/refining CT decisions in HR+/HER2-negative EBC, based on genomic signature and clinicopathological factors, particularly in pN0-1 patients otherwise considered as intermediate to high risk. The PlanB results support sparing adjuvant CT in pN0-1 patients with RS ≤ 11 (though possibly higher “clinical” risk). However, whereas five-year data seem sufficient for assessing early risk reduction mitigated by CT [1], longer follow-up is needed to explore questions such as the duration of adjuvant ET. The additional predictive impact of RS (or other genomic signatures) for late recurrence beyond that of clinicopathological factors is a complex and still controversial issue. Finally, additional targeted therapies (e.g. CDK 4/6 inhibitors) need to be evaluated in pN0-1 luminal-B patients with high genomic risk who have significant residual risk despite adjuvant CT.