Introduction

Breast cancer (BC) is the most common malignancy worldwide and the leading cause of cancer death in Chinese women younger than 45 years [1,2,3]. In 2020, 2,261,419 women were newly diagnosed with breast cancer and 684,996 died from the disease, with a cumulative lifetime (age range: 0–74 years) risk of 5.20% [1, 4]. Hormone receptor-positive, human epidermal growth factor receptor 2-negative (HR + /Her2−) disease comprises approximately 60% of all BC [5], and multiple gene expression profiles are frequently used to evaluate the recurrence risk and the benefit from adjuvant chemotherapy in patients with HR + /Her2− early breast cancer (EBC) [6,7,8]. Currently, the 70-gene signature (70-GS, MammaPrint) test and the 21-gene recurrence score (RS) have been validated in large prospective cohorts and are recommended by the National Comprehensive Cancer Network (NCCN) guidelines as the mainstream gene expression assays for HR + /Her2− EBC [9].

Van’t Veer et al. analyzed treatment-naive EBC samples with DNA microarrays and identified the 70-GS, which covers 7 pathways related to tumor proliferation, angiogenesis, invasion and migration and could predict the risk of distant metastases [10]. In the MINDACT trial, 46% of patients at clinical high risk (C-high) were assessed as genomic low risk (G-low) by the MammaPrint (MP) 70-GS, and this group of patients could safely be spared chemotherapy. A prospective multicenter study including 660 HR + /Her2− EBC patients showed that the MP 70-GS changed half of the physician-intended recommendations for adjuvant chemotherapy [11]. Another study also demonstrated that use of the 70-GS changed patients’ inclination to receive adjuvant chemotherapy and facilitated decision-making [12]. Patients with 70-GS ultralow risk had a good prognosis that was distinct even from that of 70-GS low-risk patients, with an 8-year breast cancer-specific survival (BCSS) rate of 99.6% and a distant metastasis-free interval (DMFI) rate of 97% [13], and might potentially be candidates for further de-escalation of treatment, including the duration of endocrine therapy [13, 14]. In the neoadjuvant setting, the adaptive randomized I-SPY2 trial and the observational prospective NBRST trial showed that MP 70-GS high and ultrahigh risk could be associated with the pathological complete response (pCR) rate and could indicate chemo-sensitivity and long-term outcomes as a predictive and prognostic biomarker [15, 16].

Previously, we established the immunohistochemistry 3 (IHC3) model based on the 21-gene RS and survival data to evaluate the personalized prognosis of HR + /Her2− EBC patients and guide treatment choice [17]. We also combined the Clinical Treatment Score post-5 years (CTS5) model and the 21-gene RS to develop a novel nomogram for prognosis prediction [18]. A study of BC patients among African-American females (AAF), who have less favorable outcomes than Caucasians, showed that the 21-gene RS and the 70-GS offered different prognostic information [19]. Another study revealed that the 70-GS could provide useful information in addition to the 21-gene assay, resulting in changes of treatment decision in 33.6% of HR + /Her2− BC patients [20]. Given its cost, the 70-GS assay is not always available or affordable worldwide, particularly in developing countries. There is little information about the distribution of 70-GS risk among Chinese women or about prediction models for 70-GS risk.

In this study, we aimed to establish nomogram models based on individualized medical history, imaging features and clinicopathological characteristics to predict the binary (high/low) and quartile categorized (ultrahigh, high, low, ultralow) risk of the 70-GS test in Chinese patients with HR + /Her2− EBC from a consecutive clinical cohort.

Patients and methods

Ethics statement

This retrospective study was approved by the Ethics Committee of the Peking Union Medical College Hospital (PUMCH), Chinese Academy of Medical Sciences.

Patient population

A total of 150 consecutive female patients who were diagnosed with HR + /Her2− breast cancer and treated in the Department of Breast Surgery, PUMCH, from November 2019 to March 2022 were included. The 70-GS (MammaPrint) test was performed by ZhenHe Genecast Biotechnology Ltd, the sole and exclusive partner appointed by Agendia for the 70-GS assay in China. Patients’ medical history, ultrasound (US) and mammogram (MG) reports, and clinicopathological characteristics were reviewed and collected (Fig. 1).

Fig. 1

Flowchart of the study design with the number of cases in each risk group of patients with an eligible 70-gene signature test. Annotations and the numbers of the tables and figures corresponding to each comparison and analysis are shown in gray italics

Comparison of medical history risk factors, imaging features and clinicopathological characteristics between high vs low risk of 70-GS test

A total of 40 parameters, including the patients’ medical history risk factors, imaging features and clinicopathological characteristics, were compared between patients with high risk (N = 62) and low risk (N = 88) on the 70-GS test. Imaging features were extracted from the imaging reports and coded for comparison (Fig. 1). MG features included density, micro-calcification cluster, nodule/mass and Breast Imaging Reporting and Data System (BI-RADS) category; US features included aspect ratio, boundary, morphology, hyperechogenicity, multicentricity/multifocality, blood flow, lymph node condition and BI-RADS category.

Comparisons of risk calculations from established models among patients with different categories of 70-GS risk

Risk calculations from established models, including Adjuvant! Online (AOL) version 8.0 (Additional file 1: Fig. S1) [21], CTS5 [18, 22], IHC3 [17] and the Nottingham prognostic index (NPI) [23], were compared between patients with high risk (N = 62) and low risk (N = 88) according to the 70-GS binary risk classification, as well as among patients with ultra-high (N = 12, defined as 70-GS score < −0.569) [24], high (N = 50, 70-GS score −0.569 to 0), low (N = 65, 70-GS score 0 to 0.355) and ultra-low risk (N = 23, 70-GS score > 0.355) according to the 70-GS quartile categorized risk classification [13].
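The cut-offs above can be written as a small classification rule. The following Python sketch is for illustration only and is not the code used in this study. The function name `classify_70gs` is hypothetical, and the handling of scores that fall exactly on a cut-off (0 and 0.355) is an assumption, because the text specifies only the ranges.

```python
def classify_70gs(score):
    """Map a 70-GS index score to (binary risk, four-category risk).

    Cut-offs (-0.569, 0, 0.355) are those quoted in the text.
    Assumption: a score exactly equal to 0 counts as high risk and a
    score exactly equal to 0.355 counts as low (not ultra-low) risk.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("70-GS index is expected to lie between -1 and 1")

    binary = "high" if score <= 0 else "low"

    if score < -0.569:
        category = "ultra-high"
    elif score <= 0:
        category = "high"
    elif score <= 0.355:
        category = "low"
    else:
        category = "ultra-low"
    return binary, category


# Example scores are invented for illustration.
for s in (-0.70, -0.20, 0.10, 0.60):
    print(s, classify_70gs(s))
```

Under this rule every ultra-high patient is also binary high risk and every ultra-low patient is also binary low risk, which matches the group sizes reported above (12 + 50 = 62 and 65 + 23 = 88).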

Establishment and validation of nomogram models to predict binary and quartile categorized risk of 70-GS

The data of the 150 patients were randomly split at a 4:1 ratio into a training set (120 patients) and a testing set (30 patients). Univariate analyses and multivariate logistic regression were performed based on both the binary 70-GS risk classification (high vs low risk) and the quartile categorized risk classification (ultra-high, high, low and ultra-low risk). Two nomograms were established to predict the binary and quartile risk categories of 70-GS (Fig. 1).
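A 4:1 random split of this kind can be sketched as follows. This is a minimal Python illustration, not the procedure actually run in this study: the function name, the seed value and the use of simple (unstratified) shuffling are all assumptions.

```python
import random


def split_train_test(patient_ids, test_fraction=0.2, seed=2022):
    """Randomly split patient IDs into training and testing sets.

    The seed is a hypothetical value chosen only to make the sketch
    reproducible; the split is not stratified by risk group.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train_ids, test_ids = split_train_test(range(1, 151))
print(len(train_ids), len(test_ids))  # 120 30
```

With only 30 patients in the testing set, an unstratified split can leave very few ultra-high or ultra-low cases for validation, so a stratified split may be worth considering in a re-analysis.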

Statistical analysis

Quantitative variables were compared with the t-test and categorical variables with the chi-square test. Univariate analysis was performed to identify variables associated with 70-GS risk. Multivariate logistic regression was used to develop the nomogram models. Risk predictors were selected using stepwise regression based on the Akaike information criterion (AIC), together with clinical importance and the findings of our previous study. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve and the C-index with 95% confidence interval (CI) were calculated to evaluate the accuracy and discrimination of the nomogram models. Calibration curves were used for visual inspection of calibration. Decision curve analysis (DCA) was used to assess potential clinical utility and was applied only to the nomogram model for binary risk prediction. Statistical analyses were performed using R software (version 4.0.3). All statistical tests were two-sided, and statistical significance was defined as a p value < 0.05.
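As an illustration of the discrimination measure described above, the AUC for a binary outcome can be computed as the proportion of (high-risk, low-risk) patient pairs in which the high-risk patient receives the higher predicted probability, with ties counted as one half. The Python sketch below is not the R code used for the analyses, and the toy labels and probabilities are invented for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of concordant (positive, negative) pairs.

    labels: 1 = 70-GS high risk, 0 = 70-GS low risk.
    scores: predicted probability of high risk from a model.
    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")

    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Invented toy data: two high-risk and two low-risk patients.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

The pairwise form is quadratic in the number of patients, which is acceptable for a cohort of 150 and keeps the definition explicit.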

Results

Comparison of medical history risk factors, imaging features and clinicopathological characteristics between 70-GS high vs low risk patients

Compared with 70-GS low-risk patients, 70-GS high-risk patients had less cardiovascular co-morbidity (12.9% vs 27.3%, p = 0.034), more grade 3 BC (19.4% vs 4.5%, p = 0.006), a lower progesterone receptor (PR) positive percentage (53.92 ± 36.49% vs 68.83 ± 30.43%, p = 0.007) and more Ki67 high BC (≥ 20%, 87.1% vs 45.5%, p < 0.001) (Table 1). There were no significant differences in age, body mass index (BMI), childbirth, menarche, menopause, screen-detected BC, bilateral BC, any of the included imaging parameters, TNM stage, multi-focality, lymphovascular invasion (LVI), estrogen receptor (ER) positivity or Her2 expression (Table 1).

Table 1 Comparison of medical history risk factors, imaging features and clinicopathological characteristics between Chinese patients with high versus low risk of 70-gene signature test

Comparisons of risk calculations from established models among patients with different categories of 70-GS risk

There was no significant difference in CTS5 score or in the percentage of high-risk patients evaluated by the AOL, CTS5 and NPI models between the 70-GS binary risk groups (Table 2) or among the quartile categorized risk groups (Table 3). There were more high-risk patients evaluated by the IHC3 model in the 70-GS high-risk group (43.5% vs 11.4%, p < 0.001) (Table 2), and the percentage of IHC3 high-risk patients decreased progressively across the 70-GS ultra-high, high, low and ultra-low risk subgroups (83.3%, 34.0%, 15.4% and 0.0%, p < 0.001) (Table 3). The NPI score also decreased in line with the 70-GS risk (Table 3).

Table 2 Comparison of risk calculated from established models between Chinese patients with high versus low risk based on binary risk classification of 70-gene signature test
Table 3 Comparison of risk predicted from established models among Chinese patients with ultra-high, high, low and ultra-low risk based on quartile risk classification of 70-gene signature test

Establishment and validation of nomogram models to predict binary and quartile categorized risk of 70-GS

The risk factors identified by the univariate analyses and multivariate logistic regression were cardiovascular co-morbidity, histological grade, PR positive percentage and Ki67 index for both nomograms (Figs. 2, 4). The points for each factor are marked on the scale, and the total points for each individual indicate the probability of high risk (Fig. 2) based on the binary risk classification (high vs low), as well as the probability of ultra-high, high or low risk (Fig. 4) based on the quartile categorized risk classification (ultra-high, high, low and ultra-low risk) of the 70-GS test.

Fig. 2

Forest plots of univariate (A) and multivariate (B) logistic regression analyses showing the risk factors included, and the corresponding nomogram model (C), based on binary risk classification (high vs low risk) of 70-gene signature test

The calibration plots indicated that the 70-GS risk predicted by the nomograms was in good agreement with the 70-GS risk actually tested (Figs. 3, 5). The DCA indicated that when the threshold for the predicted probability of high risk (binary risk classification) was within the range of 0.2–0.8, the nomogram model would add more net benefit than the “all or none” strategy. The AUC of the ROC curve of the nomogram for binary high-risk prediction was 0.826 (C-index 0.903, 95%CI 0.799–1.000) for the training dataset and 0.737 (C-index 0.785, 95%CI 0.700–0.870) for the validation dataset (Table 4, Figs. 2, 3). The AUC of the ROC curve of the nomogram for quartile risk prediction was 0.870 (C-index 0.854, 95%CI 0.746–0.962) for the training set and 0.592 (C-index 0.769, 95%CI 0.703–0.835) for the testing set (Table 4, Figs. 4, 5). The prediction accuracy of the nomogram for quartile categorized risk groups was 55.0% (likelihood ratio test, p < 0.001) for training and 53.3% (p = 0.04) for validation, both more than double the baseline probability of 25%. The AIC was 128.54 (training) and 46.16 (testing) for the binary risk nomogram and 247.07 (training) and 73.85 (testing) for the quartile risk nomogram.

Fig. 3

Receiver operating characteristic (ROC) curve (A), calibration curve (C) and decision curve analysis (DCA) (E) of the training set (N = 120) as well as the ROC curve (B), calibration curve (D) and DCA (F) of the testing set (N = 30) from the established nomogram model (Fig. 2C) based on binary risk classification (high vs low risk) of 70-gene signature test

Fig. 4

Forest plots of univariate (A) and multivariate (B) logistic regression analyses showing the risk factors included, and the corresponding nomogram model (C), based on quartile categorized risk classification (ultra-high, high, low and ultra-low risk) of 70-gene signature test

Fig. 5

Receiver operating characteristic (ROC) curve (A) and calibration curve (C) of the training set (N = 120) and counterparts (B, D) of the testing set (N = 30) from the established nomogram model (Fig. 4C) based on quartile categorized risk classification (ultra-high, high, low and ultra-low risk) of 70-gene signature test

Table 4 Parameters, including area under the curve (AUC) and C-index, used to evaluate the accuracy and discrimination of the binary and quartile categorized nomogram models

Discussion

Multi-gene assays are used worldwide to provide prognostic and predictive information beyond histological parameters for escalation and de-escalation of individualized treatment of HR + /Her2− BC patients. The 21-gene recurrence score (RS) has been validated for quantifying the likelihood of distant recurrence in tamoxifen-treated patients with HR + /Her2− BC [25], as well as the potential benefit from adjuvant chemotherapy according to the TAILORx and RxPONDER trials [7, 26]. One study showed that the prognostic and predictive value of the 21-gene RS was consistent across countries [27]; however, the performance of the 21-gene assay might differ across racial groups, with better performance among white individuals than among African American and Hispanic individuals [28].

The 21-gene RS can be predicted by nomograms and machine-learning models based on clinicopathological parameters [29,30,31,32]. The 70-GS further stratifies HR + /Her2− BC patients who are at clinical high risk according to the AOL model into binary (high vs low) or quartile categorized (ultra-high, high, low and ultra-low) risk classifications [6, 33]. One study revealed that the 70-GS could provide additional information for patients classified as intermediate risk by the 21-gene assay, resulting in changes of treatment decision in 33.6% of HR + /Her2− BC patients [20]. Chemotherapy was added or withheld by the treating physician with more confidence based on the results of the 70-GS test [20]. Given the considerable cost of the 70-GS assay (approximately RMB 19,800, or USD 2,828), there have also been efforts to establish a deep learning model to predict the binary 70-GS risk from microscopic pathological whole slide images (WSIs) [34]. However, easy-to-use prediction models for 70-GS risk are still lacking.

To our knowledge, our study was the first to establish user-friendly nomograms based on imaging and clinicopathological parameters to predict the 70-GS risk in both binary (high vs low) and quartile categorized (ultra-high, high, low and ultra-low) risk groups in a Chinese population. These nomograms might reduce the cost of tumor genetic testing and help optimize the management of early breast cancer patients. Moreover, their format can be explained to patients, which could simplify decision-making on their part. Risk factors indicating hormonal conditions, such as childbirth, menarche and menopause, were not associated with 70-GS risk, possibly because all the patients’ BCs were already hormone receptor positive. However, cardiovascular co-morbidity provided additional information on the patient’s general condition and was associated with a lower 70-GS risk (Table 1, Figs. 2, 4). Interestingly, none of the included ultrasound and mammogram imaging features differed significantly between 70-GS high-risk and low-risk patients (Table 1). Not surprisingly, the most ‘classic’ clinicopathological parameters associated with 70-GS risk were still histological grade, PR and Ki-67 index (Table 1, Figs. 2, 4), which is in accordance with studies building nomograms to predict the 21-gene RS [17, 29, 30].

The majority (71.3%) of patients were at clinical high risk according to the AOL model. However, a previous study showed that the AOL model over-estimated the survival of Asian BC patients [35], which might explain why 43 (28.7%) patients with AOL low risk were still judged by their physicians as ‘clinical’ high risk and received the 70-GS test (Table 2). The CTS5 risk model was developed based on data from the ATAC (Arimidex, Tamoxifen, Alone or in Combination) trial and the BIG (Breast International Group) 1–98 trial to estimate the risk of late distant recurrence, with parameters including age, tumor size, grade and lymph node status [22]. Our previous study showed that a nomogram combining CTS5 and the 21-gene RS could improve the evaluation of HR + /Her2− BC patients [18]. The NPI, which combines nodal status, tumor size and histological grade, is used to determine prognosis for invasive BC of all subtypes and stratifies patients into good, moderate and poor prognostic groups, with validation in large cohorts [23, 36, 37]. The IHC3 model was developed in our previous study with parameters including Ki-67 index, PR positive percentage, tumor size and grade, and it improved the evaluation of prognosis of HR + /Her2− BC patients compared with the 21-gene RS [17]. Notably, the IHC3 risk and the calculated NPI score, rather than the NPI risk group, correlated significantly with both the binary (high vs low) and quartile categorized (ultra-high, high, low and ultra-low) risk classifications (both p < 0.001, Tables 2, 3). Thus, although the IHC3 model was developed to evaluate lymph node-negative BC patients, it might also be used, in combination with the NPI score, to evaluate patients with 1–3 positive nodes. For example, if a HR + /Her2− BC patient was evaluated as AOL clinical high-risk and IHC3 low-risk with an NPI score < 4.0, she might potentially be spared chemotherapy or receive de-escalated chemotherapy (Table 3). Furthermore, if a HR + /Her2− BC patient was judged as low-risk by both of our 70-GS nomograms, she might also potentially be spared chemotherapy or receive de-escalated chemotherapy (Figs. 2, 4).
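For readers unfamiliar with the NPI, the score referred to above can be sketched in Python as follows. The sketch assumes the commonly published form of the index (0.2 × tumor size in cm + nodal stage + histological grade) and the conventional cut-offs of 3.4 and 5.4 for the good, moderate and poor prognostic groups. The function names are hypothetical, and the exact implementation used in this study is not described in the text.

```python
def nottingham_prognostic_index(size_cm, positive_nodes, grade):
    """NPI = 0.2 x tumor size (cm) + nodal stage + histological grade.

    Nodal stage (assumed, conventional coding):
    1 = no positive nodes, 2 = 1-3 positive nodes, 3 = 4 or more.
    """
    if grade not in (1, 2, 3):
        raise ValueError("histological grade must be 1, 2 or 3")
    if size_cm <= 0 or positive_nodes < 0:
        raise ValueError("size must be positive and node count non-negative")

    if positive_nodes == 0:
        node_stage = 1
    elif positive_nodes <= 3:
        node_stage = 2
    else:
        node_stage = 3
    return 0.2 * size_cm + node_stage + grade


def npi_group(npi):
    """Conventional three-group stratification (assumed cut-offs)."""
    if npi <= 3.4:
        return "good"
    if npi <= 5.4:
        return "moderate"
    return "poor"


# Invented example: 1.5 cm tumor, 2 positive nodes, grade 2.
score = nottingham_prognostic_index(1.5, 2, 2)
print(round(score, 1), npi_group(score))  # 4.3 moderate
```

In this invented example the patient falls in the moderate NPI group but above the NPI score of 4.0 used in the illustration in the preceding paragraph.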

The main reason for constructing a quartile risk model after establishing and evaluating the binary risk model was to improve the identification of patients at ultra-low risk (70-GS score 0.355 to 1) or ultra-high risk (70-GS score −1 to −0.569), groups that have shown clinical prognostic significance and for whom personalized treatment is justified [13,14,15,16]. Although the ROC curve is usually used to evaluate the accuracy of models predicting binary/dichotomous results [38], we managed to calculate ROC curves for both the training and testing sets for the nomogram model predicting quartile risk (Fig. 5, Table 4). However, DCA evaluates alternative diagnostic and prognostic strategies only for models predicting binary/dichotomous results [39]. The threshold probability in DCA is used both to determine whether a patient is defined as test-positive or test-negative and to model the clinical consequences of true and false positives using a clinical net benefit function [40]. Therefore, we did not perform DCA for the model predicting multiple risk categories.
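The net benefit function mentioned above is usually written as NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability, TP and FP are the numbers of true and false positives at that threshold, and n is the number of patients. The Python sketch below illustrates this standard formula and the “treat all” comparator. It is not the code used for Fig. 3, and the toy data are invented.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    labels: 1 = 70-GS high risk, 0 = 70-GS low risk.
    probs: predicted probability of high risk.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    weight = threshold / (1 - threshold)  # harm of one false positive
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(labels, threshold):
    """Comparator strategy: classify every patient as high risk."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Invented toy data; the "treat none" strategy always has net benefit 0.
toy_labels = [1, 1, 0, 0]
toy_probs = [0.9, 0.6, 0.7, 0.2]
print(net_benefit(toy_labels, toy_probs, 0.5))    # 0.25
print(net_benefit_treat_all(toy_labels, 0.5))     # 0.0
```

Because the formula depends on a single positive/negative cut-off, it has no direct counterpart for a four-category outcome, which is consistent with restricting DCA to the binary model.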

Our study has several limitations. First, it was a retrospective study with a limited sample size, and the testing set was small, with only 30 patients, which limited interpretation of the results and also made it difficult to establish a prediction model with artificial intelligence. Second, survival data were not yet available in the current study, so the prediction models could only be validated against 70-GS risk categories rather than actual follow-up outcomes. Therefore, our next step will be to refine the nomogram models with an increased sample size and survival outcomes. Third, the imaging features were extracted from the text of the ultrasound and mammogram reports, and no actual images were analyzed by deep learning and integrated into the prediction models. Fourth, treatment information was not included in the current study, yet different treatments may affect prognosis.

Conclusion

We established easy-to-use nomogram models to predict the individualized binary (high vs low) and quartile categorized (ultra-high, high, low and ultra-low) risk classifications of the 70-GS test with acceptable performance, which could help guide treatment decision-making for patients who have no access to 70-GS testing.