Introduction

Hormone receptor (HR)-positive, human epidermal growth factor receptor 2 (HER2)-negative breast cancer is the most common subtype of breast cancer, accounting for approximately 70% of all cases [1, 2]. Recently, the efficacy of cyclin-dependent kinase 4/6 (CDK 4/6) inhibitors as adjuvant therapy was demonstrated in patients with HR-positive, HER2-negative, node-positive high-risk early breast cancer (EBC) in the monarchE trial [3]. In the monarchE trial, high risk was defined as ≥ 4 positive axillary lymph nodes (ALNs), or 1–3 positive ALNs with either histological grade (HG) 3 or a tumor size ≥ 5 cm [3]. As this criterion indicates, HG is a strong prognostic factor for patients with breast cancer [4]. The Nottingham combined HG, which consists of three components, the tubule formation (TF) score, nuclear atypia (NA) score, and mitotic count (MC) score, is currently the most commonly used grading system internationally. In this system, total scores of 3–5, 6 or 7, and 8 or 9 correspond to HG1, HG2, and HG3, respectively.

In Japan, most pathologists report the nuclear grade (NG), which is based on the sum of the NA and MC scores, alongside the HG [5]. This grading system is a modification of the Black grading system [6] and was initially developed to identify patients at high risk of recurrence within the cohort of node-negative breast cancer patients [5]. In the NG system, total scores of 2 or 3, 4, and 5 or 6 correspond to NG1, NG2, and NG3, respectively [5]. Consequently, under both grading systems, a patient with HG3 has a total score of either 8 (e.g., TF score 2, NA score 3, and MC score 3) or 9 (TF score 3, NA score 3, and MC score 3) and is still classified as NG3 (e.g., NA score 3 and MC score 3, total score = 6).

The Japanese Breast Cancer Society Clinical Practice Guideline for the pathological diagnosis of breast cancer recommends the use of a “histological/nuclear” grading system in daily clinical practice [7]. Several studies have reported that HG3 and NG3 are significantly associated with a worse prognosis than the other grades [5, 8,9,10]. In particular, in patients with estrogen receptor (ER)-positive, HER2-negative breast cancer, both HG3 and NG3 were associated with significantly worse outcomes [11,12,13]. Another report showed that the clinical outcomes of patients with NG3 tumors were significantly, or nearly significantly, worse than those of patients with NG2 tumors [14]. Among patients with HG2, there are cases with a high TF score and low NA and MC scores (e.g., TF score 3, NA score 2, and MC score 1, total score = 6), and cases with a low TF score and high NA and MC scores (e.g., TF score 1, NA score 3, and MC score 3, total score = 7). These two cases receive different nuclear grades: the former is classified as NG1 (NA score 2, MC score 1, total score = 3) and the latter as NG3 (NA score 3, MC score 3, total score = 6). Thus, when the NG is used instead of the HG for grading, some patients with HG2 are classified as NG3 (i.e., patients with HG2/NG3). In fact, a report directly comparing the HG and NG systems showed that the overall concordance rate was more than 70% and that some patients with HG2 were classified as NG3, whereas none with NG2 were classified as HG3 [11].
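The score-to-grade mappings described above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the study's methods; the function names `histological_grade` and `nuclear_grade` are hypothetical, and each component score is assumed to take the values 1, 2, or 3.

```python
def _check(*scores: int) -> None:
    # Assumption: every component score is an integer from 1 to 3.
    for score in scores:
        if score not in (1, 2, 3):
            raise ValueError("each component score must be 1, 2, or 3")


def histological_grade(tf: int, na: int, mc: int) -> int:
    """Nottingham HG: total 3-5 -> HG1, 6-7 -> HG2, 8-9 -> HG3."""
    _check(tf, na, mc)
    total = tf + na + mc
    if total <= 5:
        return 1
    if total <= 7:
        return 2
    return 3


def nuclear_grade(na: int, mc: int) -> int:
    """NG: total 2-3 -> NG1, 4 -> NG2, 5-6 -> NG3."""
    _check(na, mc)
    total = na + mc
    if total <= 3:
        return 1
    if total == 4:
        return 2
    return 3


# The two HG2 examples from the text receive different nuclear grades.
print(histological_grade(3, 2, 1), nuclear_grade(2, 1))  # HG2 / NG1
print(histological_grade(1, 3, 3), nuclear_grade(3, 3))  # HG2 / NG3
```

Enumerating all 27 score combinations with these functions shows that every HG3 combination is also NG3 and that no NG2 combination reaches HG3, which follows directly from the score thresholds.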

Given the high concordance between the grading systems and the usefulness of HG3 and NG3 as prognostic factors, replacing the HG with the NG in the monarchE selection criteria for patients at a high risk of recurrence may provide adequate risk stratification. However, such a shift might alter the risk classification of patients with HG2/NG3. For example, a patient with HG2/NG3, two involved nodes, and a tumor size of 2 cm would not be classified as high-risk according to the HG but would be classified as high-risk according to the NG. These aspects have not yet been fully evaluated. Therefore, this study aimed to evaluate whether the risk stratification achieved with HG in the monarchE study could also be achieved with NG. The study also examined the prognosis of the patients whose risk cohort was altered by the NG system.

Patients and methods

Overall, 647 HR-positive, HER2-negative, node-positive patients who were diagnosed with primary breast cancer at the National Cancer Center Hospital between January 2011 and December 2019 were identified. The exclusion criteria were as follows: (1) ductal carcinoma in situ as the primary breast cancer, (2) stage IV disease at initial diagnosis, (3) no ALN involvement in the primary breast cancer, and (4) unknown receptor status, NG, HG, or Ki-67 index. The medical records of the included patients were obtained from our prospectively generated database to extract the patient age at initial diagnosis, sex, clinical and pathological tumor size, clinical and pathological nodal status, HG, NG, histological type, menopausal status, estrogen receptor (ER) status, progesterone receptor (PR) status, HER2 status, Ki-67 index, presence or absence of lymphovascular invasion, type of initial surgery, chemotherapy, postoperative radiotherapy, and endocrine therapy (ET). For the evaluation of HG and NG, patients who underwent neoadjuvant chemotherapy (NACT) were evaluated using needle biopsy specimens obtained before NACT, and those who did not undergo NACT were evaluated using surgical specimens. First, we defined three risk cohorts based on the risk classification of the monarchE study [3]. They were cohort 1: patients with ≥ 4 positive ALNs, or 1–3 positive ALNs and either grade 3 or tumor size ≥ 5 cm; cohort 2: patients with 1–3 positive ALNs, grade < 3, tumor size < 5 cm, and high Ki-67 index (≥ 20%); and cohort 3: patients with 1–3 positive ALNs, grade < 3, tumor size < 5 cm, and low Ki-67 index (< 20%).
Following this, all eligible patients were divided into four groups according to the cohort conversion pattern between the two grading systems: group 1, patients in cohort 1 by HG to cohort 1 by NG (i.e., no cohort conversion); group 2, patients in cohort 2 by HG to cohort 2 by NG (i.e., no cohort conversion); group 3, patients in cohort 3 by HG to cohort 3 by NG (i.e., no cohort conversion); and group 4, patients in cohort 2 or 3 by HG to cohort 1 by NG (i.e., cohort conversion). HR positivity was defined as either ER- or PR-positive. ER and PR were considered positive if the immunohistochemistry (IHC) staining was positive in > 1% of tumor cells [15]. A HER2-negative result corresponded to a score of 0 or 1+ on IHC or 2+ on IHC without amplification on fluorescence in situ hybridization [16,17,18]. The tumor, node, and metastasis staging of breast cancer was based on the 8th edition of the American Joint Committee on Cancer staging manual [19]. The Ki-67 index was evaluated using the same method as in the monarchE study. Specifically, the Ki-67 index was measured in all untreated breast primary tumor samples using the Ki-67 IHC assay developed by Agilent Technologies (formerly Dako; Santa Clara, CA, USA) [3].
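The cohort and group definitions above can be expressed as a short Python sketch. It is illustrative only and is not the authors' code; the names `risk_cohort` and `conversion_group` are hypothetical, and cohort 1 is read as "≥ 4 positive ALNs, or 1–3 positive ALNs with grade 3 or tumor size ≥ 5 cm". The same function is applied once with the HG and once with the NG as the grade.

```python
def risk_cohort(positive_alns: int, grade: int, tumor_cm: float, ki67_pct: float) -> int:
    """Return risk cohort 1, 2, or 3 for a node-positive patient.

    `grade` is either the HG or the NG (1-3), depending on which
    grading system is used for the classification.
    """
    if positive_alns < 1:
        raise ValueError("only node-positive patients are classified")
    if positive_alns >= 4 or grade == 3 or tumor_cm >= 5:
        return 1
    # Remaining patients: 1-3 positive ALNs, grade < 3, tumor size < 5 cm.
    return 2 if ki67_pct >= 20 else 3


def conversion_group(cohort_by_hg: int, cohort_by_ng: int) -> int:
    """Return group 1-4 from the cohorts assigned by HG and by NG."""
    if cohort_by_hg == cohort_by_ng:
        return cohort_by_hg  # groups 1-3: no cohort conversion
    if cohort_by_hg in (2, 3) and cohort_by_ng == 1:
        return 4  # cohort conversion to high risk under NG
    raise ValueError("conversion pattern not defined in the study")


# Hypothetical patient: HG2/NG3, two positive ALNs, 2 cm tumor, Ki-67 25%.
by_hg = risk_cohort(positive_alns=2, grade=2, tumor_cm=2.0, ki67_pct=25.0)
by_ng = risk_cohort(positive_alns=2, grade=3, tumor_cm=2.0, ki67_pct=25.0)
print(by_hg, by_ng, conversion_group(by_hg, by_ng))  # cohort 2 by HG, cohort 1 by NG, group 4
```

The Ki-67 value in the example is an assumed figure chosen for illustration. Because the only input that differs between the two classifications is the grade, a change of cohort can only move a patient into cohort 1.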

Statistical analyses

Invasive disease-free survival (IDFS) and distant relapse-free survival (DRFS) among the four groups were estimated using the Kaplan–Meier method, and survival estimates were compared using the log-rank test. We also evaluated IDFS and DRFS separately for patients who received NACT and those who did not, because HG and NG were assessed in different specimens in these two patient groups. IDFS was defined as the time from the initial surgery date to the date of the first occurrence of ipsilateral invasive breast tumor recurrence, local/regional invasive breast cancer recurrence, distant recurrence, all-cause mortality, contralateral invasive breast cancer, or second primary non-breast neoplasm. DRFS was defined as the time from the initial surgery date to the date of distant recurrence or all-cause mortality, whichever occurred first. A Cox proportional-hazards model with hazard ratios (HRs) and 95% confidence intervals (CIs) was used to evaluate the independent prognostic effects of each variable on IDFS and DRFS. Variables with P < 0.05 in the univariate analysis were included in the multivariate analysis. Baseline characteristics were compared using the Mann–Whitney U test or chi-square test, as appropriate. All statistical analyses were conducted using the statistical software STATA SE version 16 (StataCorp LP, College Station, TX, USA), and statistical significance was set at P < 0.05.
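As an illustration of the Kaplan–Meier (product-limit) method named above, the following minimal Python sketch computes a survival curve from follow-up times and event indicators. It is not the analysis code used in this study, which was run in STATA; the function name `kaplan_meier` and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient (e.g., months from surgery)
    events : 1 if the event (e.g., an IDFS event) occurred, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set.
        at_risk -= n_leaving
    return curve


# Toy data (hypothetical): five patients, two of them censored.
for t, s in kaplan_meier([6, 12, 12, 20, 30], [1, 0, 1, 1, 0]):
    print(t, round(s, 3))
```

Patients censored at the same time as an event are counted as at risk at that time, which is the usual convention. Comparing curves between groups would additionally require the log-rank test, which is not shown here.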

Ethical approval

The National Cancer Center Hospital Review Board and Ethical Committee approved this study (approval no. 2017-278), and the requirement for informed consent was waived because of the retrospective nature of the study.

Results

Patient demographics and tumor characteristics

According to the risk cohort classification by HG, 351 (54.3%), 107 (16.5%), and 189 (29.2%) patients were classified as cohorts 1, 2, and 3, respectively, whereas according to the risk cohort classification by NG, 371 (57.3%), 93 (14.4%), and 183 (28.3%) patients were classified as cohorts 1, 2, and 3, respectively. Among the 647 patients, 351 (54.3%), 93 (14.4%), 183 (28.3%), and 20 (3.1%) patients were classified as groups 1, 2, 3, and 4, respectively. The relationship between the risk cohorts based on each grading system and the defined groups is shown in Fig. 1. The overall concordance rate between HG and NG was 70.3% (455/647). In particular, all 193 patients with HG3 were classified as NG3, and 103 (92.0%) of the 112 patients with HG1 were classified as NG1. In contrast, 31 (9.1%) of the 342 patients with HG2 were classified as NG3 (Table 1). Of these 31 patients, 20 moved to cohort 1 under NG and were therefore classified as group 4 (Fig. 1). Table 2 presents the demographics and tumor characteristics of all patients and each of the four groups. No differences were found in age, sex, ER status, HER2 status, or ET rates among the four groups. The proportion of PR-negative patients was significantly higher in group 1 than in group 3, and the total mastectomy rates were significantly higher in group 1 than in groups 2 and 3. Additionally, the axillary lymph node dissection (ALND) rate was significantly higher in group 1 than in groups 3 and 4. Moreover, chemotherapy and irradiation were significantly more common in group 1 than in groups 2, 3, and 4.
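The proportions reported above follow directly from the stated counts. The short Python sketch below recomputes them as a consistency check; it uses only numbers given in the text, and the variable names and the helper `pct` are hypothetical.

```python
TOTAL = 647

cohorts_by_hg = {1: 351, 2: 107, 3: 189}
cohorts_by_ng = {1: 371, 2: 93, 3: 183}
groups = {1: 351, 2: 93, 3: 183, 4: 20}
hg_counts = {1: 112, 2: 342, 3: 193}  # patients per histological grade


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Every partition of the study population must add up to 647.
for partition in (cohorts_by_hg, cohorts_by_ng, groups, hg_counts):
    assert sum(partition.values()) == TOTAL

# The 20 group 4 patients account for the growth of cohort 1 under NG.
assert cohorts_by_ng[1] - cohorts_by_hg[1] == groups[4]

print(pct(455, TOTAL))          # overall HG/NG concordance rate
print(pct(103, hg_counts[1]))   # HG1 patients classified as NG1
print(pct(31, hg_counts[2]))    # HG2 patients classified as NG3
```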

Fig. 1

The relationship between the risk cohorts based on each grading system and the defined groups. Group 1: Patients in cohort 1 by HG to cohort 1 by NG (i.e., no cohort conversion). Group 2: Patients in cohort 2 by HG to cohort 2 by NG (i.e., no cohort conversion). Group 3: Patients in cohort 3 by HG to cohort 3 by NG (i.e., no cohort conversion). Group 4: Patients in cohort 2 or 3 by HG to cohort 1 by NG (i.e., cohort conversion). Cohort 1: Patients with ≥ 4 positive ALNs or 1–3 positive ALNs and grade 3 or tumors ≥ 5 cm. Cohort 2: Patients with 1–3 positive ALNs, grade < 3, tumor size < 5 cm, and high Ki-67 index (≥ 20%). Cohort 3: Patients with 1–3 positive ALNs, grade < 3, tumor size < 5 cm, and low Ki-67 index (< 20%). HG histological grade, NG nuclear grade, ALNs axillary lymph nodes

Table 1 Histological grades and nuclear grades of all patients
Table 2 Patients’ characteristics

Recurrence events

During a median follow-up of 71.4 months (interquartile range 51.4–98.5 months), 111 IDFS and 79 DRFS events occurred in all patients (Table 3). Among the 351 patients in group 1, most IDFS events were distant recurrences, and common sites of distant recurrence were the bone, liver, lung, and distant lymph nodes. Among the patients in group 2, most IDFS events were locoregional recurrence, distant recurrence, and second primary neoplasm, whereas among those in group 3, most IDFS events were second primary neoplasms. Among the 20 patients in group 4, most IDFS events were distant recurrences. Only a small number of DRFS events occurred in groups 2, 3, and 4.

Table 3 Recurrence events

IDFS and DRFS based on the four groups

Regarding IDFS, patients in group 1 had significantly worse 5-year IDFS than those in groups 2 and 3 (80.8% vs. 89.5%, P = 0.0319; 80.8% vs. 95.5%, P = 0.0002, respectively). Patients in group 4 also had significantly worse 5-year IDFS than those in groups 2 and 3 (78.0% vs. 89.5%, P = 0.0224; 78.0% vs. 95.5%, P = 0.0051, respectively) (Fig. 2a). Univariate analysis revealed that the significant risk factors associated with IDFS events were group 1 (HR 2.87; 95% CI 1.63–5.08; P < 0.001), group 4 (HR 3.25; 95% CI 1.05–10.0; P = 0.040), PR-negative status (HR 1.94; 95% CI 1.08–3.47; P = 0.027), and no ET (HR 2.73; 95% CI 1.42–5.24; P = 0.003). In the multivariate analysis, the significant risk factors for IDFS events were group 1 (HR 2.84; 95% CI 1.64–4.92; P < 0.001) and no ET (HR 2.71; 95% CI 1.29–5.70; P = 0.009). Group 4 was a borderline significant risk factor for IDFS (HR 3.08; 95% CI 0.97–9.81; P = 0.057) (Table 4a).

Regarding DRFS, patients in group 1 had significantly worse 5-year DRFS than those in groups 2 and 3 (85.2% vs. 95.3%, P = 0.0025; 85.2% vs. 98.4%, P < 0.0001, respectively). Patients in group 4 also had significantly worse 5-year DRFS than those in groups 2 and 3 (83.6% vs. 95.3%, P = 0.0060; 83.6% vs. 98.4%, P = 0.0006, respectively) (Fig. 2b). Univariate analysis revealed that the significant factors associated with DRFS events were group 1 (HR 10.7; 95% CI 3.37–34.1; P < 0.001), group 4 (HR 11.4; 95% CI 2.28–56.7; P = 0.003), PR-negative status (HR 2.31; 95% CI 1.20–4.44; P = 0.012), no radiotherapy (HR 0.46; 95% CI 0.26–0.81; P = 0.008), and no ET (HR 2.60; 95% CI 1.13–5.92; P = 0.024). In the multivariate analysis, the significant risk factors for DRFS events were group 1 (HR 9.70; 95% CI 3.23–29.1; P < 0.001), group 4 (HR 11.0; 95% CI 2.15–56.4; P = 0.004), and no ET (HR 2.73; 95% CI 1.08–6.93; P = 0.034) (Table 4b).

Fig. 2

a Invasive disease-free survival (IDFS) and b distant relapse-free survival (DRFS) according to the risk group. Group 1: Patients in cohort 1 by HG to cohort 1 by NG (i.e., no cohort conversion). Group 2: Patients in cohort 2 by HG to cohort 2 by NG (i.e., no cohort conversion). Group 3: Patients in cohort 3 by HG to cohort 3 by NG (i.e., no cohort conversion). Group 4: Patients in cohort 2 or 3 by HG to cohort 1 by NG (i.e., cohort conversion). Cohort 1: Patients with ≥ 4 positive ALNs or 1–3 positive ALNs and grade 3 or tumors ≥ 5 cm. Cohort 2: Patients with 1–3 positive ALNs, grade < 3, tumor size < 5 cm, and high Ki-67 index (≥ 20%). Cohort 3: Patients with 1–3 positive ALNs, grade < 3, tumor size < 5 cm, and low Ki-67 index (< 20%). IDFS invasive disease-free survival, DRFS distant relapse-free survival, GS grading system, HG histological grade, NG nuclear grade, ALNs axillary lymph nodes, CI confidence interval

Table 4 Univariate and multivariate analysis results of factors associated with invasive disease-free survival (IDFS) (a) and distant relapse-free survival (DRFS) (b)

This study included 84 patients who underwent NACT and 563 who did not. The number of patients who received NACT was 72, 8, 2, and 2 for groups 1, 2, 3, and 4, respectively; IDFS events occurred in 24, 0, 0, and 1 patients, respectively, while DRFS events occurred in 20, 0, 0, and 1 patients, respectively (Online Resource Table 1). The number of events in groups other than group 1 was too small for an adequate evaluation. In contrast, the numbers of patients who did not receive NACT were 279, 85, 181, and 18 for groups 1, 2, 3, and 4, respectively; IDFS events occurred in 54, 12, 16, and 4 patients, respectively, and DRFS events occurred in 43, 5, 7, and 3 patients, respectively (Online Resource Table 2). Among these patients, the 5-year IDFS rates for groups 1 and 4 were significantly lower than that of group 3 (84.2% vs. 95.4%, P = 0.0067; 81.6% vs. 95.4%, P = 0.0282, respectively). The 5-year IDFS rates for groups 1 and 4 were numerically lower than that of group 2, but the differences were not significant (84.2% vs. 88.2%, P = 0.2728; 81.6% vs. 88.2%, P = 0.0953, respectively) (Online Resource Fig. 1a). Regarding DRFS, patients in group 1 had significantly worse 5-year DRFS than those in groups 2 and 3 (88.1% vs. 94.8%, P = 0.0269; 88.1% vs. 98.3%, P = 0.0004, respectively). Patients in group 4 also had significantly worse 5-year DRFS than those in groups 2 and 3 (87.8% vs. 94.8%, P = 0.0424; 87.8% vs. 98.3%, P = 0.0079, respectively) (Online Resource Fig. 1b).

Discussion

In this study, we investigated whether NG, instead of HG, could appropriately stratify the prognosis of patients with HR-positive, HER2-negative, node-positive high-risk EBC, as defined in the monarchE trial. The results showed that the risk cohort classification by NG was highly consistent with that by HG, indicating that the risk classification by NG could appropriately stratify patients at a high risk of recurrence. Furthermore, patients classified as low-risk according to the HG classification but high-risk according to the NG classification (i.e., group 4) had a poor prognosis similar to that of patients classified as high-risk according to both the HG and NG classifications (i.e., group 1). We could not fully evaluate the utility of NG in patients who received NACT because of the small number of patients in this population (only 84 patients). However, among patients who did not undergo NACT, patients in group 4 also had a poor prognosis similar to that of patients in group 1 with respect to IDFS and DRFS. These results suggest that, at least for patients not undergoing NACT, NG can appropriately stratify prognosis even if it replaces HG and that it can also identify high-risk cases that are not detected by HG. To the best of our knowledge, this is the first report of such results.

The background characteristics of the patients in this study differed in several respects from those of the ET-alone group in the monarchE study [3]. Patients in the ET-alone group of the monarchE study correspond to those in groups 1 and 2 in our study. Compared with patients in the monarchE study, our patients included a higher proportion of those with 1–3 positive lymph nodes (59.8% vs. 40.4%), aged ≥ 65 years (22.7% vs. 14.6%), without chemotherapy (15.1% vs. 4.7%), and with Ki-67 values of ≥ 20% (76.9% vs. 43.6%). In contrast, clinicopathological factors such as menopausal status, tumor diameter, HG, and ER/PR status were similar between the monarchE study and this study. The median follow-up of this study was > 70 months, which was sufficiently long to detect early recurrence events. Group 1 accounted for > 50% of the total patient population, and we believe it was a suitable target population in which to examine whether the stratification of recurrence risk could be replicated by replacing HG with NG. The overall concordance rate of HG and NG in this study was as high as 70.3% (455/647), particularly because all patients with HG3 were classified as NG3. This result is consistent with that of a previous report [11]. Therefore, even if NG were used as a selection criterion for patients at a high risk of recurrence, the risk of underestimating the number of high-risk patients defined by the monarchE study criteria would be low. In contrast, 31 patients with HG2 were classified as NG3 in this study, and the risk cohort changed for 20 of these patients (group 4). These patients in group 4 had a poor prognosis similar to that of patients in group 1. Multivariate analyses showed that group 4 was a significant poor prognostic factor for DRFS and a marginal poor prognostic factor for IDFS.

A comparison of the clinicopathological characteristics of the 31 patients with HG2/NG3 and the 311 patients with HG2/NG1 or NG2 in this study showed that patients with HG2/NG3 had significantly lower tubule formation scores than those with HG2/NG1 or NG2, whereas their nuclear atypia score, mitotic count score, and Ki-67 value were significantly higher (Online Resource Table 3). Mitotic counts and Ki-67 values are commonly used to evaluate the proliferative activity of breast cancer, and tumor cell proliferative activity is an important independent prognostic factor in patients with breast cancer [20, 21]. Therefore, patients with HG2/NG3 have less favorable prognostic factors than those with HG2/NG1 or NG2. This indicates that a certain number of patients with NG3 may be at a high risk of recurrence even though they are not classified as high-risk because of HG2. Furthermore, the poor prognosis of these patients suggests that the NG classification can identify a population that the conventional HG classification cannot adequately stratify. Although the assessment on a three-grade scale may vary among pathologists, several studies have reported moderate inter- and intra-observer reproducibility of HG [22,23,24]. Regarding NG, various efforts have been made to standardize the assessment criteria among pathologists, and the level of interobserver agreement was also satisfactory [25,26,27].

This study had some limitations. First, this was a retrospective study performed at a single institution. Second, although we collected data from consecutive patients with HR-positive, HER2-negative, node-positive EBC, we did not adjust for selection bias. Third, patients with unknown Ki-67 values were excluded from this study. Fourth, it is unclear whether CDK 4/6 inhibitors benefit patients at a high risk of recurrence according to the NG classification. Finally, because of the relatively small sample size of this study, the reproducibility of the results needs to be validated with a larger sample size in a multicenter setting.

In conclusion, we showed that NG could be used to stratify the risk of recurrence of HR-positive, HER2-negative, node-positive EBC. Additionally, we demonstrated that NG could be used to select a group of patients who would not be considered high-risk if HG were used. Therefore, these results may contribute to adequate decision-making regarding adjuvant therapy according to the risk of recurrence.