Introduction

Advances in medical treatment have significantly improved the prognosis of patients with human epidermal growth factor receptor 2 (HER2)-positive early breast cancer (BC) over time and have established chemotherapy combined with 1 year of trastuzumab as the standard adjuvant treatment [1].

The impact of prognostic and predictive biomarkers on the outcome of patients receiving appropriate standard systemic treatment was considered by the American Joint Committee on Cancer (AJCC) Staging System panel when updating breast cancer staging. By incorporating biologic factors (histologic grade, estrogen receptor, progesterone receptor, HER2, and multigene panels) into the classic anatomic stage, the 8th edition of the AJCC breast cancer staging system introduced the prognostic stage, which was developed using data from patients identified in the National Cancer Database (2010–2011) and then validated in large cohorts of patients from the MD Anderson Cancer Center and the California Cancer Registry [2,3,4,5,6,7]. These studies confirmed the improved prognostic performance of the prognostic stage as compared to the anatomic stage in the general breast cancer patient population. The most recently updated version of the prognostic stage was released after the validation study showed that a proportion of patients could not be assigned a specific prognostic stage [7]. The prognostic staging system was therefore refined to include all possible combinations of anatomic stages and biomarkers [8]. As declared by the AJCC staging panel, the current prognostic stage will undergo frequent updates, based on future validation studies in large databases of patients treated with state-of-the-art therapies [4, 6].

Several studies, all conducted in retrospective patient cohorts, have been reported in the last couple of years, overall corroborating the prognostic stage as a more accurate discriminator of breast cancer patients’ outcome than the anatomic stage. However, many of these studies used data from the National Cancer Database or the SEER (Surveillance, Epidemiology, and End Results) registry covering a period that includes the years 2010 and 2011. Considering the overlap between the National Cancer Database and SEER, these studies included data that had previously been used by the AJCC panel to develop the prognostic stage. Moreover, most of these studies, including the main validation studies by the AJCC panel, did not report a detailed analysis of distinct breast cancer subtypes, and none focused specifically on HER2-positive disease. Furthermore, even in the most robust cohorts, exposure to trastuzumab was either not reported or not homogeneous among HER2-positive patients (literature review in Additional file 1) [7, 9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]. This is a relevant caveat, since the adoption of the prognostic stage rests on the assumption that patients are offered adequate systemic treatment based on biologic characterization [2, 4, 6].

One of the major clinical needs for HER2-positive BC patients is accurate risk stratification to guide treatment escalation and de-escalation, ensuring the most effective treatment along with a more rational allocation of resources [24]. One of the most important goals of staging is to help clinicians define a treatment plan [5]; therefore, evaluating outcome prediction by the prognostic stage in cohorts of HER2-positive patients treated with standard therapy is a key step in defining its potential role as a tool to guide de-escalated therapeutic choices. For this kind of investigation, a randomized trial testing de-escalated against standard treatment represents the ideal setting.

In this study, we aimed to validate the prognostic stage in HER2-positive BC patients treated with adjuvant chemotherapy combined with 1 year or 9 weeks of trastuzumab in the randomized ShortHER trial [25].

Methods

Patients

The ShortHER trial (NCT00629278) is a phase 3 trial of adjuvant therapy that randomized 1253 patients with HER2-positive early BC to anthracycline and taxane-based chemotherapy combined with 1 year (long) or 9 weeks (short) trastuzumab. Study characteristics and results are reported elsewhere [25].

Staging

In the present analysis, patients were classified according to the anatomic stage, based on tumor size (T) and nodal status (N), and to the prognostic stage, which takes into account T, N, estrogen receptor, progesterone receptor, histologic grade, and HER2 status. The most recent version of the 8th AJCC edition was used as reference [8]. Histologic grade, hormone receptor expression, and HER2 status were based on local pathology. According to the AJCC staging manual, for the present analysis, estrogen receptor and progesterone receptor expression was classified as positive in case of staining in ≥ 1% of tumor cells.

Once the anatomic and prognostic stages were applied, patients with discordant stage assignment were defined as follows:

  • Those patients moved to a more favorable stage category with the prognostic stage as compared to the anatomic stage were defined as downstaged;

  • Those patients moved to a less favorable stage category with the prognostic stage as compared to the anatomic stage were defined as upstaged.
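
The definitions above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the names `STAGE_ORDER`, `classify_discordance`, and `stages_moved` are hypothetical, and the ordering of stage groups from most to least favorable is assumed to follow the usual AJCC sequence.

```python
# Assumed ordinal ranking of AJCC stage groups, most to least favorable.
STAGE_ORDER = ["IA", "IB", "IIA", "IIB", "IIIA", "IIIB", "IIIC"]
STAGE_RANK = {stage: rank for rank, stage in enumerate(STAGE_ORDER)}


def stages_moved(anatomic: str, prognostic: str) -> int:
    """Stage groups moved; negative means toward a more favorable stage."""
    return STAGE_RANK[prognostic] - STAGE_RANK[anatomic]


def classify_discordance(anatomic: str, prognostic: str) -> str:
    """Return 'concordant', 'downstaged', or 'upstaged' for one patient."""
    shift = stages_moved(anatomic, prognostic)
    if shift < 0:
        return "downstaged"  # prognostic stage is more favorable
    if shift > 0:
        return "upstaged"    # prognostic stage is less favorable
    return "concordant"
```

For example, under this assumed ordering a patient with anatomic stage IIB and prognostic stage IA would be counted as downstaged by three stage groups.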

Statistical analysis

Statistical analyses were performed using IBM SPSS v.24 and the R Project for Statistical Computing [26]. Distant disease-free survival (DDFS) was calculated from randomization until relapse at a distant site or death, whichever occurred first.
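
As a minimal sketch of the DDFS definition, the function below derives a follow-up time and an event flag for one patient. It is written in Python for illustration and is not the analysis code used in the study; the function name and arguments are hypothetical, and censoring event-free patients at the last follow-up date is an assumption that the text does not spell out.

```python
from datetime import date
from typing import Optional, Tuple


def ddfs(randomized: date,
         distant_relapse: Optional[date],
         death: Optional[date],
         last_follow_up: date) -> Tuple[float, bool]:
    """Return (time in years, event flag) for distant disease-free survival."""
    observed = [d for d in (distant_relapse, death) if d is not None]
    if observed:
        # Event: distant relapse or death, whichever occurred first.
        end, event = min(observed), True
    else:
        # Assumption: patients without an event are censored at last contact.
        end, event = last_follow_up, False
    return (end - randomized).days / 365.25, event
```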

The Kaplan-Meier method was used to estimate survival curves. The log-rank test was used to compare stage categories. The Harrell concordance index (C index) was calculated for each of the two staging systems. The difference between the C index of the anatomic and prognostic stage models was tested using the “compareC” package in R [26]. Cox proportional hazards regression models were used to calculate hazard ratios (HRs) and 95% confidence intervals (CIs). The log-rank test χ2 statistic and its P value were used to explore the discrimination between groups. The significance level was P < 0.05. All tests were two-sided.
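
To make the C index concrete, the following is a simplified pure-Python sketch of Harrell's concordance index. It is not the R implementation used in the study: the function name is hypothetical, pairs with tied follow-up times are simply skipped, and tied predictions (frequent when the predictor is a stage category) are counted as half-concordant. The stage would be passed as a numeric risk score, with higher values meaning a worse stage.

```python
from typing import Sequence


def harrell_c(times: Sequence[float],
              events: Sequence[bool],
              risk: Sequence[float]) -> float:
    """Share of usable pairs in which the higher-risk patient fails earlier."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier time in a usable pair must be an event
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking. Testing whether two C indices differ, as done with “compareC”, requires an additional variance estimate that this sketch does not attempt.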

Results

Stage classification

Complete data for classification according to the anatomic and the prognostic stage were available for 1244 patients. Patients’ characteristics are reported in Table 1. The comparison of anatomic and prognostic stage classifications is summarized in Table 2.

Table 1 Patients’ characteristics
Table 2 Comparison of anatomic and prognostic stage classifications in patients enrolled in the ShortHER trial

The rate of concordance was 58.4% (n = 727 patients), whereas 517 patients (41.6%) had a discordant stage category assignment. All discordant cases were downstaged by the prognostic stage:

  • 100% of anatomic stage IB patients (n = 40) were re-classified as IA;

  • 61.6% of anatomic stage IIA patients (n = 246) were re-classified as IB (6.0%) or IA (55.6%);

  • 83.0% of anatomic stage IIB patients (n = 94) were re-classified as IA (1.3%) or IB (81.7%);

  • 58.7% of anatomic stage IIIA patients were re-classified as IB (19.0%) or IIA (39.7%);

  • 100% of anatomic stage IIIC patients (n = 66) were re-classified as IIIA (13.6%) or IIIB (86.4%).

Among downstaged patients, the change was by one stage down for 23.4% (n = 121), by two stages down for 71.8% (n = 371), and by three stages down for 4.8% (n = 25) of cases.

Survival analysis

Median follow-up was 6.1 years. Five-year DDFS rates and their 95% confidence intervals for stage categories according to the anatomic and prognostic stage classifications are reported in Table 3; survival curves are shown in Fig. 1.

Table 3 Five-year DDFS rates by stage category according to the anatomic and prognostic stage classifications
Fig. 1 Kaplan-Meier DDFS curves by anatomic stage (a) and prognostic stage (b)

Both models were able to stratify patients with different outcomes (log-rank P < 0.001). The C index was 0.69209 for the anatomic stage and 0.69249 for the prognostic stage, with no significant difference (P = 0.975). With the prognostic stage, 58.9% of patients were classified as stage IA and showed an excellent outcome after adjuvant chemotherapy and trastuzumab (5-year DDFS 95.7%, 95%CI 94.2–97.3%). However, within each stage category, the outcome was numerically inferior for the prognostic stage groups compared with the matching anatomic stage groups (Table 3).

Table 4 shows the Cox regression analysis for DDFS according to anatomic and prognostic stage, with stage IA as the reference category. With the anatomic stage, the prognosis of stage IB and IIA patients was not statistically different from that of stage IA patients. With the prognostic stage, all stage categories showed a significantly worse outcome as compared to stage IA. We further explored the stage discrimination by focusing on stages I–II and looking at the log-rank χ2 and its P value in paired comparisons. For IB vs IA, the χ2 statistic was 0.014 (P = 0.906) for the anatomic stage and 5.930 (P = 0.015) for the prognostic stage. For IIA vs IB, the χ2 statistic was 0.579 (P = 0.447) for the anatomic stage and 0.263 (P = 0.608) for the prognostic stage. For IIB vs IIA, the χ2 statistic was 5.322 (P = 0.021) for the anatomic stage and 0.165 (P = 0.686) for the prognostic stage. A higher χ2 statistic indicates a greater separation between groups. The results of the Cox regression analysis and those of the paired log-rank tests indicate that, for patients with stage I–II disease, the largest prognostic discrimination is between stage IIB and the lower stages for the anatomic stage and between stage IB and IA for the prognostic stage.
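
As a rough check on the paired comparisons, a log-rank χ2 statistic comparing two groups can be converted to a P value using the chi-square distribution with 1 degree of freedom. The Python sketch below is illustrative and is not the authors' SPSS or R code; the function name is hypothetical. Because the χ2 values in the text are rounded, the recomputed P values should agree with the reported ones only to within rounding.

```python
import math


def logrank_p_value(chi2: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    # For 1 df, P(X > chi2) equals the two-sided normal tail at sqrt(chi2).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Paired comparisons from the text: (label, anatomic chi2, prognostic chi2).
comparisons = [
    ("IB vs IA", 0.014, 5.930),
    ("IIA vs IB", 0.579, 0.263),
    ("IIB vs IIA", 5.322, 0.165),
]
for label, chi2_anatomic, chi2_prognostic in comparisons:
    print(f"{label}: anatomic P = {logrank_p_value(chi2_anatomic):.3f}, "
          f"prognostic P = {logrank_p_value(chi2_prognostic):.3f}")
```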

Table 4 Cox regression DDFS analysis

Short vs long trastuzumab in stage I patients

Analyses comparing DDFS of stage I patients treated with 9 weeks vs 1 year of trastuzumab were conducted (Fig. 2). The outcome of anatomic stage I patients (n = 509) was excellent irrespective of trastuzumab duration (5-year DDFS 96.2%, 95%CI 93.8–98.7% in the short arm and 96.6%, 95%CI 94.4–99.0% in the long arm). Among prognostic stage I patients (n = 872), those who received 9 weeks of trastuzumab had a numerically inferior DDFS that did not reach statistical significance (5-year DDFS 93.7%, 95%CI 91.4–96.2% vs 96.3%, 95%CI 94.5–98.2%, log-rank P = 0.080; HR 1.60, 95%CI 0.94–2.73, P = 0.083). When limiting the analysis to patients with prognostic stage IA, the absolute difference in 5-year DDFS was reduced to 1.5% (95.0%, 95%CI 92.7–97.3% in the short arm vs 96.5%, 95%CI 94.5–98.5% in the long arm, log-rank P = 0.408; HR 1.29, 95%CI 0.70–2.38, P = 0.409).
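
The relationship between a hazard ratio, its 95% CI, and the accompanying P value can be illustrated with a standard Wald approximation on the log scale. The Python sketch below is an assumption-laden back-calculation from the rounded figures in the text, not the Cox model output itself; the function name is hypothetical, and the results should match the reported P values only approximately.

```python
import math


def wald_p_from_ci(hr: float, lower: float, upper: float) -> float:
    """Approximate two-sided Wald P value from an HR and its 95% CI."""
    z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval
    # The CI is symmetric on the log scale: log(HR) +/- z_95 * SE.
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_95)
    z = math.log(hr) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# Short vs long trastuzumab, values as reported in the text.
p_prognostic_stage_i = wald_p_from_ci(1.60, 0.94, 2.73)
p_prognostic_stage_ia = wald_p_from_ci(1.29, 0.70, 2.38)
print(round(p_prognostic_stage_i, 3), round(p_prognostic_stage_ia, 3))
```

Both intervals include 1, which is why neither comparison reaches the P < 0.05 threshold.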

Fig. 2 Kaplan-Meier DDFS curves for patients treated in the short (9 weeks trastuzumab) vs the long (1 year trastuzumab) arm according to stage categories: anatomic stage I patients (a), prognostic stage I patients (b), prognostic stage IA patients (c)

Discussion

This is the first study (i) evaluating the performance of the AJCC prognostic stage specifically in early HER2-positive BC patients treated with adjuvant chemotherapy and trastuzumab, (ii) evaluating the performance of the AJCC prognostic stage in a prospective randomized trial, and (iii) validating the AJCC prognostic stage in a European patient cohort. Our findings show a similar prognostic performance for the prognostic and anatomic stages, although the prognostic stage reallocated a substantial proportion of patients (41.6%) to a more favorable stage category. Previous studies in general BC patient populations have described a reallocation rate with the prognostic stage most frequently around 40–60% (range 18–74%; Additional file 1) [7, 9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]. In the California Cancer Registry, including 54,727 patients in anatomic stages I to IV, 31.0% and 20.6% of patients were assigned to a more favorable and a less favorable stage category with the prognostic stage, respectively [7]. Only a few studies reported the discrepancy between the two stage models specifically for HER2-positive BC patients, with inconsistent results (Additional file 1) [9, 14, 21, 23]. A large cohort from the National Cancer Database including 60,155 HER2-positive BC patients showed 29.4% and 0% rates of downstaging and upstaging, respectively [9]. Another study showed that 35.8% and 40.7% of HER2-positive patients (n = 1982) were classified as stage I by the anatomic stage and the prognostic stage, respectively [15]. The rate of downstaging (41.6%) was higher in our study, and consequently, the enrichment in stage I patients with the prognostic stage was also more evident. The high prevalence of hormone receptor-positive patients in the ShortHER population (68%) might have contributed to the substantial downstaging. It should be highlighted that the ShortHER population reflects the characteristics of HER2-positive patients commonly treated in contemporary clinical practice [25, 27].

Our data show that the substantial downstaging of patients with the prognostic stage did not affect the performance of the model, which remained similar to that of the anatomic stage (P = 0.975 for C index comparison). In this context, available literature data focused on HER2-positive patients are scant. Moreover, their interpretation is severely limited by the lack of homogeneous treatment or of information about it (Additional file 1) [18, 23]. The largest cohort of HER2-positive patients analyzed for survival outcome according to prognostic stage included 562 cases (mostly not treated with trastuzumab) and showed a good 10-year disease-specific survival for prognostic stage I patients (> 96%), but did not report overall model performance [18].

As previously discussed, the prognostic stage led to an enrichment in stage I (70% vs 40.9% with the anatomic stage) and more specifically in stage IA patients (58.9% vs 37.7%). The pairwise comparisons conducted in stage I–IIA patients suggest that the prognostic stage better separated the group of patients with the best prognosis (IA) from the others, whereas with the anatomic stage there was no significant difference in outcome among patients in stages IA, IB, and IIA. However, in terms of absolute survival rates, prognostic stage IA patients had a slightly numerically inferior outcome compared with anatomic stage IA patients. A numerically worse outcome for prognostic stage categories relative to the matched anatomic stage categories was evident for all stage groups. In summary, the prognostic stage, by recognizing the prognostic effect of biomarkers, shifts a large number of patients from a worse to a better stage category (mostly to stage IA or stage IB) as compared to the anatomic stage. One consequence of this shift is a better separation of stage groups in terms of DDFS, especially in stage I–IIA patients. However, in absolute terms, the outcome of prognostic stage IA and IB patients, whose groups are enriched with patients from a worse anatomic stage category, is somewhat diluted and is numerically inferior to that of the corresponding anatomic stage. Moreover, prognostic stage > IIA categories are depleted of patients with a better prognosis relative to the same anatomic stage categories; as a consequence, the outcome of prognostic stage > IIA groups is again numerically inferior to that of the corresponding anatomic stage. The main implication is that the prognostic stage is more valuable than the anatomic stage as a tool to counsel patients about their prognosis: by applying the prognostic stage, more patients would be grouped into more favorable stage categories and would be informed about a good outcome as compared to the anatomic stage. On the other hand, the prognostic stage identifies a more restricted number of patients with far poorer outcomes. However, clinicians should keep in mind when counseling patients that, in absolute terms, the estimated outcome for a given prognostic stage category might not correspond to the estimate for the same anatomic stage category.
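
The dilution and depletion argument can be made concrete with a small worked example. The counts below are entirely hypothetical and are not trial data; they only show how moving favorable-biology patients from anatomic stage IIA into stage IA can lower the event-free rate of both resulting prognostic groups relative to the matching anatomic groups. All names in this Python sketch are illustrative.

```python
def rate(event_free: int, total: int) -> float:
    """Percentage of patients event-free at 5 years."""
    return 100.0 * event_free / total


# HYPOTHETICAL (event-free, total) counts, chosen only for illustration.
anatomic_ia = (97, 100)      # anatomic IA patients, all remain IA
iia_favorable = (47, 50)     # anatomic IIA, downstaged to prognostic IA
iia_unfavorable = (44, 50)   # anatomic IIA, remain prognostic IIA

# Anatomic grouping.
anatomic_ia_rate = rate(*anatomic_ia)
anatomic_iia_rate = rate(iia_favorable[0] + iia_unfavorable[0],
                         iia_favorable[1] + iia_unfavorable[1])

# Prognostic grouping: IA gains the downstaged patients, IIA loses them.
prognostic_ia_rate = rate(anatomic_ia[0] + iia_favorable[0],
                          anatomic_ia[1] + iia_favorable[1])
prognostic_iia_rate = rate(*iia_unfavorable)

print(anatomic_ia_rate, prognostic_ia_rate)    # IA: anatomic vs prognostic
print(anatomic_iia_rate, prognostic_iia_rate)  # IIA: anatomic vs prognostic
```

In this toy setting the gap between stage IA and IIA widens under the prognostic grouping, while each prognostic group has a lower absolute rate than the anatomic group with the same label.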

Appropriate identification of patients with an excellent outcome on standard adjuvant treatment is key to selecting those who may be offered de-escalated treatment strategies. Treatment de-escalation for HER2-positive patients with anthracycline-free regimens such as the paclitaxel-trastuzumab schedule is already used in clinical practice based on anatomic stage, mostly for patients with stage I disease [28, 29]. We explored whether prognostic stage I may be of value in identifying patients for de-escalated therapies. Our results suggest that, while anatomic stage I seems a good parameter to guide de-escalated therapeutic choices, this may not be the case for prognostic stage I. Indeed, prognostic stage I patients treated with short trastuzumab had an absolute 2.6% worse DDFS rate at 5 years (93.7% vs 96.3%) as compared to patients enrolled in the long trastuzumab arm. When restricting the analysis to prognostic stage IA, there was still an absolute 1.5% difference in 5-year DDFS favoring the long arm. However, this result was not statistically significant and was based on a difference of just six events between the two arms. Although these were exploratory, unplanned, and unpowered analyses that should be interpreted with caution, the results can be considered hypothesis-generating and require further testing in similar trials. Of note, the acceptable absolute difference in outcome for a de-escalated treatment to be considered safe is currently debated [30]. If our results are confirmed in further studies, the two staging systems will be recognized as providing divergent information in the context of patient selection for treatment de-escalation, possibly posing a challenge to the implementation of the prognostic stage in clinical practice.

Our study has several strengths: it is the first to evaluate the prognostic performance of the prognostic stage in a cohort of HER2-positive patients all receiving chemotherapy and trastuzumab; the patient population is derived from a prospective trial; 99% of patients had sufficient data for the present analysis; and the study design allowed exploration of short vs long trastuzumab in stage-defined groups.

The main limitations of this study include the choice of the survival endpoint (DDFS), which differs from the one used to develop and validate the prognostic stage (BC-specific survival) [7]. In the ShortHER trial, the current median follow-up does not allow for a mature evaluation of BC-specific survival in this population. Therefore, we opted to use DDFS as a surrogate for BC-specific survival, considering the lethal nature of DDFS events. Another limitation is the reduced sample size in stage-defined groups, which limits the power of direct comparisons.

Conclusions

In conclusion, the AJCC prognostic stage is valuable in counseling patients regarding their prognosis and may serve as a reference for clinical trial design and sample size estimation. Our data do not support the assumption that the prognostic stage may also guide treatment de-escalation; more information from other randomized trials is therefore needed. These findings help fill the current gap in the appraisal of the clinical validity and utility of prognostic staging in HER2-positive patients. Research into integrated risk stratification models, tailored to the need for clinically useful tools to guide de-escalated therapeutic choices, is highly encouraged.