1 Introduction

The current treatment approach for hormone receptor-positive and human epidermal growth factor receptor 2 (HER2)-negative breast cancer patients is focused on selecting patients who can be spared chemotherapy [1,2,3,4]. Previously there were no tools for classifying patients who needed chemotherapy and those who did not. However, there have been several attempts to assess breast cancer patients’ risk with genomic analysis; finally, several guidelines advise using genomic assay results to identify people needing chemotherapy [5,6,7,8]. The National Comprehensive Cancer Network (NCCN) guidelines for breast cancer recommend using MammaPrint(MMP) for decision-making for chemotherapy in patients with N0-1 early-stage breast cancer [9]. While Oncotype DX is widely used in patients who have no metastasis to lymph nodes, MMP test has higher level of evidence for node-positive luminal A breast cancer patients [9].

Use of genomic analysis for risk assessment is key in treatment decision-making for breast cancer; however, there are some disadvantages to using this new tool. In the medical setting in Korea, national health insurance does not cover the cost of genomic analysis, thus imposing burdens on patients who already pay for the breast surgery. Furthermore, the time taken to obtain results of genomic tests is long, and patients are unable to receive therapy until the test results are obtained. For these reasons, prediction of MMP recurrence score in advance, using clinicopathological data, may be useful. In our previous study, we created a nomogram that can easily predict the MMP risk score as high or low using 4 clinicopathologic features: patient age, progesterone receptor status (PR), nuclear grade, and Ki-67 [10]. Data on these four factors are easily obtained in clinical settings. Although this nomogram is useful in predicting patient prognosis, it is still hard to correlate the result from the nomogram with survival benefit. Additionally, the size of the validation group in the previous study was insufficient considering that treatment choice is critical for patient outcome.

As the nomogram needs further validation, we have used recent patient data from Asan Medical Center, Seoul, to do so. We also used HR+/HER2− breast cancer patient data from 2010 to 2013, which has patient recurrence and survival data with clinicopathologic data, to investigate whether the nomogram can identify people with survival benefit.

2 Methods

2.1 Patients

Validation Set 1 enrolled breast cancer patients who were T1-3N0-1M0 hormone receptor-positive and HER2-negative who underwent MMP who had breast cancer surgery between 2020 and 2021 at Asan Medical Center, Seoul, Korea. The dataset of 172 cases who were eligible was used for validation of the nomogram created in our previous study that predicts MMP results using clinical data including age, nuclear grade, PR status, and Ki-67 results [10]. Clinical data obtained included patients’ age at surgery, sex, surgery type, TNM stage, adjuvant TNM stage, cancer size, Lymph node (LN) status, histologic and nuclear grade, Lymphovascular invasion (LVI), Hormone receptor status, HER2 status, Ki-67, and p53. Nuclear staining for ER and PR was evaluated using the Allred scoring method (0–8). Membrane staining for HER2 was evaluated using the HercepTest (BenchMark XT autostainer using OptiView DAB Detection Kit, Ventana Medical Systems, Tucson, AZ, USA) protocol. Immunohistochemistry for Ki-67 (1:250, MIB-1, Dako, Glostrup, Denmark) was performed using a BenchMark XT autostainer (Ventana Medical Systems) with an i-View detection kit (Ventana Medical Systems). Pathologic staging was determined based on the American Joint Committee on Cancer Staging Manual 7th edition.

Further, we used Validation Set 2, 1835 T1-3N0-1M0 HR+/HER2− patients from Asan Medical Center from 2010 to 2013, which included survival data and recurrence data, to analyze whether the nomogram can be associated with survival benefit identification as well. We compared the survival outcomes of two groups divided by nomogram score. The outcomes were disease-free survival (DFS), overall survival (OS), and breast cancer-specific survival (BCSS). DFS was defined as the time from the date surgery to the first date of disease recurrence; OS, the time from the date of surgery to the date of a patient’s death from any cause; BCSS, the time from the date of surgery to the date of a patient’s death from breast cancer. This study was reviewed and approved by the Institutional Review Board of Asan Medical Center (2017-1341). Informed consent was waived because the study was based on retrospective clinical data.

2.2 Statistical analysis

For validation, the MMP results of 172 cases were used. The four factors that were significant according to the nomogram, age at diagnosis (20–100), nuclear grade (range, 1–3), Allred scores of PR status (range, 0–8) and Ki-67 labeling index (0-100), were used to validate nomogram predictability. The Chi-square and Fisher’s exact tests were used for between-group comparisons of clinicopathological characteristics, based on MMP results. We conducted a robustness analysis to validate our model and employed receiver operating characteristic (ROC) analysis and calculated the area under the curve (AUC). Using the Kaplan-Meier method to further validate our nomogram system, we generated survival curves for breast cancer patients from 2010 to 2013. The significance of differences in survival was tested using the log-rank test. All data analyses were performed using R statistical package ver 3.2.0 (http://r-project.org). Significance level was set at 0.05, and all p values were two sided.

3 Results

3.1 Patient characteristics

Detailed information on patient characteristics of the Validation Set 1 cohort (n = 172) classified into MMP score high-risk group versus MMP score low-risk group is found in Table 1. The number of patients with initial T stage 1, 2, and 3 were 79 (45.9%), 89 (51.7%), and 4 (2.4%), respectively. Approximately 97.7% patients had lymph node involvement (n = 168). The mean age at initial operation for the entire cohort was 52.0 ± 10.3 years. There were no patients with ER Allred negative or weak positive scores; 4 (2.3%) had intermediate and 168 (97.7%) had strong positive scores. As regards PR Allred score, there were 16 (9.3%) patients with negative; 17 (9.9%), weak; 37 (21.5%), intermediate; and 102 (59.3%), strong positive scores. Approximately 43.0% of patients had positive LVI, and 47.1% showed high Ki-67. Additional file 1: Table S1 shows the dataset of the 407 patients used for developing the nomogram in the previous study.

Table 1 Comparison between the characteristics of MMP low-risk and MMP high-risk patients with those of the total validation set 1 patient cohort

3.2 Model predicting MMP results and validation of nomogram with validation set 1

Of the 172 patients, 93 (54.1%) were MMP low and 79 (45.9%) were MMP high. The mean age at diagnosis among those with MMP low was 52.1 \(\pm\) 9.9 years; for MMP high, it was 51.8 \(\pm\) 10.8 years (= 0.829). The MMP high group had higher histologic and nuclear grades, all with < 0.001, compared to the MMP low-risk group. There were 64.6% patients with high Ki-67 level in the MMP high group, compared to 32.3% in the MMP low group.

As regards estrogen receptor status, most patients showed strong Allred scores (7–8). Only 1 (1.1%) patient in the MMP low-risk group and 3 (3.8%) in the MMP high-risk group had intermediate Allred score (5–6). No patients showed negative or weak positive ER Allred score. PR status showed distinct characteristics compared to that of ER status; however, these were not significant. No difference was found in the presence of LVI, p53 status, and surgical methods. Pathologic stages of MMP low and high, tumor size, and number of positive nodes and largest positive node size showed no statistical relevance.

The nomogram (Fig. 1a) was initially made by training set and later validated by validation set, and the AUC was 0.82 (95% CI, 0.77–0.87) (Fig. 1b) and 0.77 (95% CI, 0.68–0.86) (Fig. 1c), respectively [10]. To validate the nomogram, we used the patient cohort of 172 breast cancer patients who underwent MMP testing. The AUC of the Validation Set 1 was 0.73 (95% CI, 0.66–0.81) (Fig. 2). The calibration plot (Fig. 2) shows good calibration.

Fig. 1
figure 1

a Nomogram to predict low-risk recurrence score of MammaPrint result and receiver operating characteristic curve of nomogram. b Training group of 306 patients. c Validation group of 103 patients

Fig. 2
figure 2

a Receiver operating characteristic curve of nomogram. Validation cohort of 172 patients. b Calibration plot of nomogram

3.3 Additional validation of the nomogram with an independent cohort

We performed an additional validation study with the data of 1,835 T1-3N0-1M0 HR+/HER2- patients, Validation Set 2. The patient cohort was classified into two groups, low-risk and high-risk group, based on a nomogram value of 183. A cutoff of 183 was selected based on Youden index, which consider both sensitivity and specificity to find optimal cutoff value. Detailed characteristics of the patient cohort used for additional validation (n = 1,835) can be seen in Table 2. The patient’s cohort data is divided into two groups by Total point (TP) cutoff of 183. The mean age at diagnosis was 44.7\(\pm\)9.0 years vs. 50.6 \(\pm\) 9.2 years in TP<183 and TP \(\ge\) 183 groups, respectively (p < 0.001). The TP < 183 group had a higher histologic grade, a higher nuclear grade, and a higher Ki-67 level and p53 than the low-risk group (p < 0.001). ER (p = 0.029) and PR (p < 0.001) status were different between the two groups. The TP<183 group had 89 (13.2%) and 572 (85.2%) with intermediate and strong ER Allred scores, respectively. On the other hand, the TP \(\ge\) 183 group had 115 (9.9%) and 1039 (89.3%) with intermediate and strong ER Allred scores, respectively, which indicated higher percentage of strong ER Allred score. That of PR was also similar, where 194 (28.9%) vs. 210 (18.1%) patients had intermediate PR Allred score, and 412 (61.3%) vs. 913 (78.5%) had strong PR Allred score. Pathologically confirmed results showed higher stages in T and N stages in the TP<183 group. In the TP<183 group, only 401 (59.7%) had T1 disease and 459(68.3%) had N0; on the contrary, there were 857 (73.7%) patients with T1 stage disease and 848 (72.9%) with N1 stage in the TP \(\ge\) 183 group.

Table 2 Characteristics of the validation set 2 patient cohort

To assess the efficacy of the nomogram in defining the prognosis of the patients, Kaplan-Meier analysis was used. We identified that the high-risk group according to the nomogram had significantly lower DFS (p = 0.008) and 5-year BCSS rates (98.6% vs. 99.4%, p = 0.002) compared to the low-risk group (Fig. 3), In the case of 5-year OS (98.1% vs. 97.9%, p = 0.056), the high-risk group showed inferior survival trend compared to the low-risk group. Thus, the nomogram can indeed distinguish the better-prognosis group from the worse-prognosis group.

Fig. 3
figure 3

Kaplan-Meier analysis of validation set 2 according to the cutoff of 183. a Overall survival. b Disease-free survival, and c Breast cancer-specific survival

4 Discussion

For patients with early-stage breast cancer, predicting the response to chemotherapy and risk of recurrence is crucial. Genomic testing for early breast cancer is a reliable tool for decision making for treatment. For adjuvant systemic therapy in patients with non-metastatic, ER/PR positive, HER2 negative, and N0-1 breast cancers, the 70-gene MMP test is recommended as evidence of category 1 in the 2022 NCCN guidelines [9]. Moreover, the St. Gallen Consensus Conference, American Society of Clinical Oncology (ASCO), and European Commission Initiative on Breast Cancer recommended use of the 70-gene signature in women with high clinical risk factors [7, 11, 12]. Therefore, the 70-gene signature test is increasingly known for its usefulness in determining candidates for adjuvant chemotherapy. Although there is evidence for the usefulness of MMP in certain subsets of patients, clinicians hesitate to recommend this test because of the cost and long duration to obtain test results.

The nomogram in our previous study helped clinicians promptly predict a low MMP risk, using only four simple clinicopathologic factors (age, nuclear grade, progesterone receptor status, and Ki-67) [10]. According to multivariate analysis, age and progesterone receptor status showed positive relationship with MMP low risk and their regression coefficient were 0.65 (95% confidence interval [CI], 0.32–0.98) and 0.18 (95% confidence interval [CI], 0.05–0.31), respectively. Nuclear grade and Ki-67 had negative association with MMP low risk and their regression coefficient were − 2.22 (95% confidence interval [CI], − 3.75 to − 0.69) and − 0.07 (95% confidence interval [CI], − 1.69 to 0.04). Initially the nomogram was made by cohort of 409 T1-3N0-1M0 hormone receptor-positive and HER2-negative breast cancer patients [10]. The total 409 cohort was randomly classified into two separate set, 306 training set and 103 validation set (Training set was used to build nomogram and validation set was used to validate the nomogram). Other studies have attempted to predict the MMP score by using the MR imaging radiomics signature or other clinicopathologic data [13, 14]. However, we decided that the easiest way to make a quick estimation is by nomogram which is ran by simple four simple factors [10]. On top of it we added user-friendly interfaced calculator that can calculate the MMP score in a second.

In this retrospective cohort study, which aimed to validate a nomogram that can predict the probability of a low risk of MammaPrint results in women with clinically high-risk breast cancer (according to definition in Adjuvant! Online), we found out the AUC of Validation Set 1 was 0.73 (95% CI, 0.66–0.81). In order to further validate the nomogram by seeking correlation of low risk of MammaPrint and long-term survival of patients, additional study was done with Validation Set 2. Kaplan–Meier analysis of the two groups divided by a nomogram score of 183 showed a relevant difference in OS (p = 0.056), DFS (p = 0.008) and BCSS (p = 0.002). Though we have chosen 183 as an ideal cutoff for discriminating low-genomic risk and the basis for the choice of treatment option(Sensitivity and specificity is 84% and 65%, respectively), however, clinicians can choose another cutoff, if their primary preference is another combination of sensitivity and specificity. In MINDACT trial, the primary outcome was to assess the presence of non-inferiority in 5-year survival without distant metastasis, among patients who have high-risk clinical features and did not receive chemotherapy, but have low genomic risk score [15]. Similarly, our study investigated whether nomogram score, which was originally designed to find out low risk of MMP, can be related to survival advantage in high clinical risk patients. As low risk in MMP is conventionally thought of as sign of good prognosis, we have tried to verify the association of nomogram score with survival prediction ability.

On closely inspecting the cohort profiles of 1,835 patients at AMC from 2010 to 2013, we found that no patient in the group with TP \(\ge\) 183 had histologic grade III, while the group with TP<183 had 257(38.3%) patients with grade III. In addition, the groups with TP \(\ge\) 183 and TP < 183, had 103 (8.9%) and 3 (0.4%) breast cancer patients with grade I, respectively. Since it was first demonstrated in 1991 [16], the histologic grade is a well-known prognostic factor in breast cancer. Consensus and recommendations for good practice in determining the tumor grade set by pathology in breast cancer [17] render the tumor grade a simple and accurate determinant in decision making regarding adjuvant therapy. Despite developments in new molecular technology, tumor grade determination remains a validated method for assessing patient prognosis when alternative molecular testing is not available [18].

Age at diagnosis plays a key role in our model; younger patients (< 50 years) are unlikely to have 95% or higher low genomic risk probability according to the nomogram calculator [10]. In our study, there was a considerable difference in the mean age at diagnosis in the Validation Set 2 (44.7 ± 9.0 years in the high-risk group and 50.6 ± 9.2 years in the low-risk group). Most recently updated results of the MINDACT trial, with a median follow-up of 8.7 years, reveal that in clinically high/genomic low risk patients, women aged 50 years or younger have a relevant difference in 8-year distant metastasis-free survival, based on the type of adjuvant therapy administered [19]. The absolute difference in the 8-year distant metastasis-free survival in the chemotherapy and no-chemotherapy groups were 5.0 and 0.2% point respectively, and this result might be the consequence of ovarian function suppression effect of chemotherapy which only affect premenopausal women. Our study result corresponds with up-to-date interest in age at diagnosis, which is considered crucial prognostic factor in the selection of adjuvant treatment options.

In terms of disease-free survival, the TP \(\ge\) 183 group showed significantly superior outcome over the TP<183 group (p = 0.008). HR+ and HER2− breast cancers are reported to have 20% likelihood of tumor recurrence in the first 10 years after surgery, although many patients do not experience recurrence [20]. During the first few years of adjuvant endocrine therapy [21], patients with a high clinical risk and/or pathologic features have a higher risk of disease recurrence. According to the monarchE phase III trial, distant metastasis accounts for more than 75% of early recurrences during endocrine therapy [22]. Therefore, optimizing adjuvant therapy according to patient specific risk is important for improving invasive disease-free survival. After reviewing the limitations of our previous study, the nomogram was constructed based on the MMP result, but we did not consider the treatment outcome of a patient who received adjuvant therapy. This study aimed to provide evidence of the possibility that the nomogram could help distinguish the groups with favorable and worse outcomes. Moreover, by validating the nomogram using a patient cohort from another time period, 2010 to 2013, this study strengthens the usability of the nomogram.

There are some limitations that could be found into this study. First, this study is based on a retrospective analysis of patient datasets and, therefore, there might have been selection bias in selecting the patient cohort. Validation Set 1 and 2 show different patients characteristics, because set 1 only enrolled patients selected for costly MMP test and, therefore, Validation Set 1 had few ER-PR, negative, or weak positive patients, leading to inaccurate predictions in the ER-PR negative or weak positive groups. However, this aspect can also be a strong point of this study, because in real-world clinical settings, most ER-PR negative or weak positive patients are omitted from MMP tests. In addition, as Ki-67 levels may vary between institutions, there can be issues with reproducibility in other institutions. Therefore, further validation studies using cohorts from other institution may help confirm the usability of this nomogram. Mistranslation of the nomogram results into overall prognosis of individual patients by other clinicians is also a concerning issue. Due to its retrospective nature, this study is incapable of showing the predictive value of this test. Moreover, the validation of nomogram was conducted by the cohort of one institute, therefore, the usability of nomogram could be later validated by multi-institutional analysis.

In conclusion, for decision making regarding treatment in HR+/HER2− and node positive, clinically high-risk patients, our nomogram is useful for quick prediction of low genomic risk patients who can be spared MMP testing. In addition, the two groups distinguished by the nomogram score differed in the 5-year DFS and BCSS, and OS. Further studies are needed to validate this model in international cohorts and large prospective cohorts from other institutions.