Introduction

Breast cancer is a highly heterogeneous disease and is currently classified into four general molecular subtypes according to the status of hormone receptors, including estrogen receptor (ER) and progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) [1]. Each subtype has distinct molecular characteristics, and although individual patient prognosis varies, patients with the hormone receptor-positive, HER2-negative (HR+/HER2−) subtype generally have a more favorable prognosis, whereas those with hormone receptor-negative (HR−) breast cancer have a poor prognosis [2,3,4]. Because treatment strategies for breast cancer depend on molecular subtype and patient prognosis, it is important to identify prognostic biomarkers specific to each molecular subtype to determine appropriate treatments.

Gene expression-based approaches provide significant prognostic or predictive information, and commercial assays such as Oncotype DX [5, 6], MammaPrint [7, 8], Prosigna [9, 10], and EndoPredict [11], based on multigene expression profiling of frozen or formalin-fixed, paraffin-embedded (FFPE) samples, have been developed for ER-positive (ER+) breast cancer. These assays predict the risk of distant recurrence after hormone therapy and help identify patients who will benefit from adjuvant chemotherapy by discriminating between high- and low-risk patients with early ER+ breast cancer. However, the currently available assays, which use multigene expression signatures based on proliferation-related genes (p-genes), have certain limitations, including a diminished ability to predict late distant recurrence (beyond 5 years from diagnosis or primary treatment). Furthermore, commercial kits based on multigene predictors of clinical outcome are prognostic only for HR+ subtypes, and no commercial assay is available for HR− breast cancer. A meta-analysis of publicly available microarray data from over 2100 patients showed that the key biological processes associated with the clinical outcome of patients with breast cancer differ according to molecular subtype [12]. That study selected seven prototype genes (AURKA, PLAU, STAT1, VEGF, CASP3, ESR1, and ERBB2) representing distinct biological processes (proliferation, tumor invasion/metastasis, immune response, angiogenesis, and apoptosis) and ER and HER2 signaling, respectively, and assessed the association of the expression of these seven gene modules with clinical variables and relapse-free survival in each breast cancer subtype.
The results showed that the prognostic performance of the proliferation module was limited to the ER+/HER2− subgroup, whereas genes associated with tumor invasion and immune response had prognostic value in the ER−/HER2− or ER−/HER2+ subtypes. Recent studies reporting prognostic genes or gene signatures that predict recurrence or distant metastasis in HR− breast cancer [13,14,15,16] further confirm that expression of immune response-related genes (i-genes) is primarily associated with good clinical outcome in patients with HR− breast cancer, in contrast to the strong prognostic significance of p-genes for recurrence in HR+ breast cancer. However, these results are based mainly on gene expression microarray data, and most of the identified prognostic genes or signatures have not been validated.

We previously identified 384 candidate prognostic genes associated with distant metastasis in patients with lymph node-negative (LN−) early breast cancer using public microarray gene expression data [17]. This study aimed to identify, from the candidate list established in our previous study, novel prognostic genes associated with the risk of distant metastasis in the various subtypes of breast cancer. We validated the expression of 16 candidate prognostic genes by quantitative real-time reverse transcription-PCR (qRT-PCR) in a large number of FFPE tissue samples and then assessed the association between their expression and the risk of distant metastasis in 819 patients with breast cancer. Based on the resulting set of significant prognostic genes, we developed a prognostic model to predict the risk of distant metastasis in HR−/HER2+ breast cancer.

Materials and methods

Ethical statement

This study was approved by the Institutional Review Board (IRB) of the Samsung Medical Center (SMC) (Seoul, Korea) and performed in accordance with the Declaration of Helsinki. Because the study was retrospective, informed consent from the patients was not required, as per the guidelines of the IRB. Patient information was anonymized and de-identified prior to analysis.

Study population

Our study adhered to the Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK) criteria in the design, analysis, and interpretation of the results [18]. A total of 997 FFPE tissue specimens were obtained from patients with breast cancer who underwent curative resection of the primary tumor with LN dissection at the SMC between 1994 and 2002. We also obtained 50 frozen tissue samples paired with FFPE samples from the same patients. Detailed inclusion/exclusion criteria for tissue samples are described in the Supplementary materials and methods. Molecular subtypes of breast cancer were categorized as HR+/HER2− (ER+ or PR+/HER2−), HR+/HER2+ (ER+ or PR+/HER2+), HR−/HER2+ (ER−/PR−/HER2+), or triple-negative breast cancer (TNBC, ER−/PR−/HER2−) according to the expression status of ER, PR, and HER2, as classified in our previous study [19].

Selection of candidate prognostic genes

From 384 candidate genes identified in our previous study using public gene expression microarray data [17], a total of 30 candidate genes were selected based on the following criteria: (1) high correlation with either proliferation or immune response, (2) high variability between samples (large interquartile range), and (3) high mean expression value. Based on the results of qRT-PCR, 16 genes with high correlation of expression between FFPE and frozen tissues were further selected. For details, see Supplementary materials and methods.

qRT-PCR and normalization of qRT-PCR data

RNA extraction and qRT-PCR were performed as described in the Supplementary materials and methods. The relative expression value of each gene was calculated as the difference between the average Cq value of the three reference genes (CTBP1, CUL1, and UBQLN1) and the Cq value of the target gene in each sample, plus a constant of 30:

$$ \Delta C_{\text{q},\text{target}} = \frac{C_{\text{q},\mathit{CTBP1}} + C_{\text{q},\mathit{CUL1}} + C_{\text{q},\mathit{UBQLN1}}}{3} - C_{\text{q},\text{target}} + 30 \tag{1} $$
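Eq. (1) can be written as a small function. This is a minimal sketch, assuming the Cq values of one sample are held in a dictionary keyed by gene symbol; the dictionary layout and the function name `delta_cq` are illustrative assumptions, and the imputation of missing Cq values used in the study [27] is not included. Because the target Cq is subtracted, a target that amplifies earlier (lower Cq) receives a higher relative expression value.

```python
from statistics import mean

# Reference genes named in the text for normalization.
REFERENCE_GENES = ("CTBP1", "CUL1", "UBQLN1")


def delta_cq(cq: dict, target: str, offset: float = 30.0) -> float:
    """Relative expression of `target` following Eq. (1).

    cq     : mapping gene symbol -> Cq value for a single sample (hypothetical layout)
    offset : the constant 30 that appears in Eq. (1)
    """
    reference_mean = mean(cq[gene] for gene in REFERENCE_GENES)
    return reference_mean - cq[target] + offset


# Example with made-up Cq values: (25 + 26 + 27) / 3 - 24 + 30
sample = {"CTBP1": 25.0, "CUL1": 26.0, "UBQLN1": 27.0, "MMP11": 24.0}
print(delta_cq(sample, "MMP11"))
```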

Development of the prognostic model for HR−/HER2+ breast cancer and cross validation

Based on the results of stepwise multivariate analyses, a prognostic model to predict the risk of distant metastasis in HR−/HER2+ breast cancer was developed. Relative expression values of the two prognostic genes, normalized to the average expression level of the three reference genes, were used to calculate the risk score, a molecular predictor of distant metastasis. The risk score was defined as follows:

$$ \text{Risk score} = 0.45 \times \Delta C_{\text{q},\mathit{MMP11}} - 0.48 \times \Delta C_{\text{q},\mathit{CD2}} \tag{2} $$

Higher values indicate a higher risk of distant metastasis. For development and performance evaluation of the prognostic model, a 10-fold cross validation procedure was used [20].
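The risk score in Eq. (2) is a linear combination of the two normalized expression values, with a positive weight for MMP11 and a negative weight for CD2. The sketch below only evaluates Eq. (2) and applies a threshold; the function names are illustrative, and the `cutoff` used to separate high- and low-risk groups is a hypothetical parameter, because the threshold used in the study is not specified in this section. The cross-validated model fitting itself is not reproduced.

```python
# Coefficients taken from Eq. (2).
MMP11_COEF = 0.45
CD2_COEF = 0.48


def risk_score(dcq_mmp11: float, dcq_cd2: float) -> float:
    """Eq. (2): higher values indicate a higher predicted risk of distant metastasis."""
    return MMP11_COEF * dcq_mmp11 - CD2_COEF * dcq_cd2


def risk_group(score: float, cutoff: float) -> str:
    """Dichotomize a score; `cutoff` is a hypothetical threshold, not the study's value."""
    return "high" if score > cutoff else "low"


# Higher MMP11 expression raises the score; higher CD2 expression lowers it.
baseline = risk_score(30.0, 30.0)
print(baseline, risk_group(baseline, cutoff=0.0))
```

Applied to the ΔCq values from Eq. (1), this reproduces the direction of effect reported below: MMP11 expression is associated with higher risk and CD2 expression with lower risk.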

We compared the prognostic performance of our model with that of other prognostic models based on clinical variables, including the Nottingham Prognostic Index (NPI) [21] and two web-based prediction tools, SNAP (www.CancerMath.net) [22] and PREDICT (www.predict.nhs.uk) [23, 24]. Harrell's concordance index (C-index) [25] was calculated to estimate the discriminative ability of each prognostic model and to compare their prognostic performance. Detailed information on the development of the prognostic model is provided in the Supplementary materials and methods.
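For reference, the NPI [21] in its commonly used form combines tumor size, histologic grade, and nodal stage as NPI = 0.2 × size (cm) + grade (1–3) + nodal stage, where nodal stage is 1 for no positive nodes, 2 for 1–3 positive nodes, and 3 for 4 or more. The sketch below assumes that standard formulation; the exact variant and input coding used in this study are described in the Supplementary materials and methods, not here, and the function names are illustrative.

```python
def nodal_stage(positive_nodes: int) -> int:
    """NPI nodal score: 1 (no positive nodes), 2 (1-3 nodes), 3 (4 or more)."""
    if positive_nodes < 0:
        raise ValueError("node count cannot be negative")
    if positive_nodes == 0:
        return 1
    return 2 if positive_nodes <= 3 else 3


def npi(size_cm: float, grade: int, positive_nodes: int) -> float:
    """Nottingham Prognostic Index in its commonly used form (assumed here)."""
    if grade not in (1, 2, 3):
        raise ValueError("histologic grade must be 1, 2, or 3")
    return 0.2 * size_cm + grade + nodal_stage(positive_nodes)


# Example: 2.5 cm, grade 2, node-negative -> 0.2 * 2.5 + 2 + 1
print(npi(2.5, 2, 0))
```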

Statistical analyses

Distant metastasis-free survival (DMFS) was defined as the time from the date of surgery for the primary tumor to the date of distant metastasis. Overall survival (OS) and disease-free survival (DFS) were defined as described in our previous study [26]. Univariate and multivariate analyses were performed using the Cox proportional hazards model. For these analyses, missing Cq values were imputed using the algorithm developed by McCall et al. [27]. Variables that were significant in univariate analysis were entered into a stepwise multivariate Cox proportional hazards model to determine the independent contribution of each predictor to the primary endpoint. The probability of distant metastasis was estimated by the Kaplan–Meier method, and the log-rank test was used to test differences in survival between groups. Differences were considered statistically significant at P < 0.05. All statistical analyses were performed using R 3.2.0 (http://r-project.org).
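The Kaplan–Meier estimator multiplies, at each distinct event time, the fraction of at-risk patients who remain event-free; censored patients stay in the risk set up to their censoring time. The study used R, so the Python sketch below is only an illustration of the estimator, not the authors' code; the input format (parallel lists of follow-up times and 0/1 event indicators) and the function names are assumptions. The distant metastasis rate at a time point is 1 minus the DMFS probability at that point.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each distinct event time.

    times  : follow-up time per patient (e.g., years to metastasis or censoring)
    events : 1 if distant metastasis was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) pairs.
    """
    n_events = Counter(t for t, e in zip(times, events) if e)
    n_leaving = Counter(times)  # events plus censorings at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(n_leaving):
        d = n_events.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving[t]  # everyone with time t leaves the risk set afterwards
    return curve


def survival_at(curve, t):
    """Step-function lookup: the last estimate at or before time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Toy data: events at years 1, 3, 5; censoring at years 2 and 4.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1])
metastasis_rate_at_4 = 1.0 - survival_at(curve, 4)
```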

Results

Patient characteristics

Of the 997 FFPE tissue samples, histologically ineligible samples, samples with an insufficient amount of tissue, and samples that yielded an insufficient amount of RNA were excluded. Gene expression was measured in a total of 926 FFPE samples by qRT-PCR. Cases with missing Cq values for reference genes or with missing clinical information were further excluded, leaving 819 breast cancer patients with informative clinical data for the final analysis. The median patient age was 47.3 years (range 23.8–81.2), and the mean tumor size was 2.8 ± 1.6 cm (mean ± SD). Of the 819 patients, 51.6% (423/819) were LN− and 48.4% (396/819) were LN+. A total of 86.3% (707/819) of the patients received adjuvant chemotherapy. The clinicopathological characteristics of the patients grouped by molecular subtype are summarized in Table 1. The majority of the 819 cases were HR+ tumors, including the HR+/HER2− (50.1%, 410/819) and HR+/HER2+ (13.7%, 112/819) subtypes. The HR+/HER2− subtype had the highest percentage of histologic grade 1 and 2 tumors, whereas the HR−/HER2+ and TNBC subtypes had a higher proportion of grade 3 tumors.

Table 1 Clinical characteristics of the breast cancer patients in this study

Kaplan–Meier curves for DMFS, DFS, and OS were generated according to molecular subtype. The median follow-up durations for DMFS, DFS, and OS were 9.68 (range 0.04–19.46), 9.45 (range 0.04–19.46), and 10.33 years (range 0.05–19.46), respectively. During this entire follow-up period, there were no significant differences in patient survival between molecular subtypes (Supplementary Fig. S1A). However, in terms of 5-year OS and DFS, significant (P < 0.001 for OS) or marginally significant (P = 0.069 for DFS) differences in patient survival between molecular subtypes were observed (Supplementary Fig. S1B). Patients with HR−/HER2+ subtype showed poorer 5-year survival than those with the HR+/HER2− subtype (Supplementary Fig. S1B).

Univariate analysis of clinical variables according to molecular subtype

First, we analyzed the association of traditional clinicopathological factors with clinical outcome according to molecular subtype. In univariate analysis for DMFS, larger tumor size, LN involvement (LN+), and higher histologic grade were significantly associated with an increased risk of distant metastasis in the HR+/HER2− subtype (Table 2). In contrast, only LN status was significantly correlated with DMFS in HR+/HER2+ cancers, whereas both tumor size and LN status were significant for DMFS in TNBC. Thus, the significance of tumor size was limited to the HER2− subtypes (HR+/HER2− and TNBC). Of note, none of the clinical variables was significant in HR−/HER2+ breast cancer.

Table 2 Univariate analysis of clinical variables for DMFS according to molecular subtype

Univariate analysis results for DFS and OS were similar to those for DMFS (Supplementary Table S1). None of the clinical variables was significantly associated with DFS or OS in HR−/HER2+ tumors. In HR+/HER2+ tumors, younger age (< 50 years) and LN+ status were significantly associated with an increased risk of recurrence, whereas none of the clinical variables was significant for OS.

Univariate analysis of gene variables according to molecular subtype

Univariate analysis of gene variables showed that the association between the expression of each of the 16 candidate prognostic genes and distant metastasis differed according to molecular subtype. Most p-genes were significantly associated with DMFS in the HR+/HER2− subtype, in which high expression of nine p-genes correlated significantly with a greater risk of distant metastasis (Table 3).

Table 3 Univariate analysis of gene variables for DMFS according to molecular subtype

Subgroup analysis by LN status within each molecular subtype showed slight differences in the significant genes between LN+ and LN− cancers. In particular, in HR+/HER2− LN− tumors, five p-genes (FOXM1, MKI67, RRM2, TOP2A, and UBE2C) were significantly associated with DMFS, and expression of the immune response-related gene BTN3A2 showed marginal significance (Table 3). However, BTN3A2 was not significant in LN+ breast cancer. Three p-genes (MMP11, RRM2, and UBE2C) were significantly associated with DMFS in the HR+/HER2+ subtype. In HR−/HER2+ breast cancer, MMP11 and three i-genes (BTN3A2, CD2, and TRBC1) were significantly associated with clinical outcome, whereas no clinical variable was significant. Higher MMP11 expression was significantly associated with a higher risk of distant metastasis, whereas higher expression of BTN3A2, CD2, and TRBC1 was associated with favorable outcome (Table 3). The significant association of these i-genes with favorable outcome was observed only in LN− breast cancer and not in LN+ tumors.

Consistent with the findings for DMFS, the genes significantly associated with DFS or OS depended on molecular subtype, and the gene lists were similar to those for DMFS (Supplementary Tables S2, S3). A significant relationship between high i-gene expression and favorable outcome was again observed only in HR−/HER2+ breast cancer. Subgroup analysis by LN status within each molecular subtype showed only slight differences between LN− and LN+ breast cancer in the genes significantly associated with DFS or OS.

Multivariate analysis according to molecular subtype

Using the clinical and gene variables that were significant in the univariate analysis, stepwise multivariate analysis was performed to identify independent predictors of DMFS for each molecular subtype. Hazard ratios and 95% confidence intervals (CIs) for DMFS are shown in Table 4. In HR−/HER2+ breast cancer, MMP11 (hazard ratio 1.49; 95% CI 1.08–2.04; P = 0.014) and CD2 (hazard ratio 0.66; 95% CI 0.47–0.94; P = 0.022) retained statistical significance for DMFS in multivariate analysis, indicating that MMP11 and CD2 expression levels are independent prognostic factors in HR−/HER2+ breast cancer. In the other subtypes, positive LN status was an independent negative prognostic factor. In addition, TOP2A was independently associated with DMFS in the HR+/HER2− subtype.
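In a Cox model, a coefficient β corresponds to a hazard ratio of exp(β), its Wald 95% CI is exp(β ± 1.96 × SE), and its Wald P value comes from z = β / SE. The sketch below illustrates these relations; the function names are illustrative, and `se_from_ci` is an approximation that recovers the standard error from a reported CI, because standard errors are not given in the text. Because the reported values are rounded, such back-calculations are only approximate; for MMP11 (hazard ratio 1.49; 95% CI 1.08–2.04) it gives P ≈ 0.014.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def hr_with_ci(beta: float, se: float):
    """Hazard ratio, Wald 95% CI bounds, and Wald P value from a Cox coefficient."""
    hr = math.exp(beta)
    lower = math.exp(beta - Z_95 * se)
    upper = math.exp(beta + Z_95 * se)
    return hr, lower, upper, two_sided_p(beta / se)


def se_from_ci(lower: float, upper: float) -> float:
    """Approximate SE of log(HR) recovered from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_95)


# Back-calculation from the rounded values reported for MMP11.
mmp11_se = se_from_ci(1.08, 2.04)
mmp11_p = hr_with_ci(math.log(1.49), mmp11_se)[3]
```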

Table 4 Multivariate analysis of DMFS according to molecular subtype

With regard to DFS, MMP11 and UBE2C expression levels were independent prognostic factors in HR−/HER2+ breast cancer (Supplementary Table S4). Unexpectedly, increased UBE2C expression was significantly associated with a decreased risk of recurrence in this subtype. In HR+/HER2− cancers, LN+ status and MMP11 and TOP2A expression were independently associated with an increased risk of recurrence. Age, LN status, and UBE2C expression were independent prognostic factors in HR+/HER2+ cancer. In TNBC, LN status was the only independent negative prognostic factor.

Independent prognostic factors for OS in HR−/HER2+ breast cancer included MMP11 and BTN3A2 expression (Supplementary Table S4). In the HR+/HER2+ subtype, RRM2 expression retained its significance. By contrast, in HER2− breast cancer, LN status was the only independent prognostic factor for OS, and no gene variable was significant.

Prognostic performance of the risk model for distant metastasis in HR−/HER2+ breast cancer

To assess the prognostic significance of our model, patients with HR−/HER2+ breast cancer were classified into high-risk and low-risk groups according to their risk scores. Kaplan–Meier curves showed that DMFS was significantly lower in the high-risk group than in the low-risk group (log-rank test, P < 0.001; Fig. 1). The 10-year DMFS probabilities were 56.1% in the high-risk group and 87.7% in the low-risk group; that is, the 10-year distant metastasis rate was significantly higher in the high-risk group (43.9%) than in the low-risk group (12.3%). There were no significant differences in clinical characteristics between the two risk groups (Supplementary Table S5), and the risk score was not associated with clinical variables in HR−/HER2+ breast cancer (Supplementary Fig. S2). These results indicate that our prognostic model can distinguish HR−/HER2+ breast cancer patients at high and low risk of distant metastasis, whereas clinical variables alone are not sufficient to identify these patients.
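The log-rank test compares the observed number of events in one group with the number expected if both groups shared the same hazard, summed over all event times. The Python sketch below implements the standard two-group statistic for illustration only (the study used R); the input format and function name are assumptions. The P value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value with 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of d_a at this event time
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; the test is undefined")
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Toy example: group A has events early, group B late.
chi2, p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [10, 11, 12, 13], [1, 1, 1, 1])
```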

Fig. 1

Kaplan–Meier plot of distant metastasis-free survival (DMFS) in low-risk and high-risk groups defined by our prognostic model in patients with HR−/HER2+ breast cancer. Survival between the two risk groups was compared using the log-rank test, and the hazard ratio was derived using the Cox proportional hazards model.

In multivariate analysis adjusted for traditional clinicopathological parameters, our risk score retained statistical significance (hazard ratio 2.49; 95% CI 1.46–4.24; P = 0.001; Table 5) and was more significant than other prognostic models based on clinical variables (Table 6). These results indicate that our model is an independent prognostic indicator of the risk of distant metastasis in HR−/HER2+ breast cancer.

Table 5 Multivariate analysis of our prognostic model and traditional clinicopathological parameters for DMFS in HR−/HER2+ breast cancer
Table 6 Multivariate analysis of our prognostic model and other prognostic models based on traditional clinicopathological parameters for DMFS in HR−/HER2+ breast cancer

Our model showed the best performance in predicting the risk of distant metastasis, with the highest C-index (0.694) compared with traditional prognostic factors (Fig. 2) and prognostic models based on clinicopathological factors alone (Supplementary Fig. S3). These results further indicate that our prognostic model outperforms conventional models based on clinical variables alone in predicting the risk of distant metastasis in HR−/HER2+ breast cancer and provides more accurate prognostic information than traditional clinicopathological factors in this subtype.
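Harrell's C-index is the proportion of usable patient pairs in which the patient with the higher risk score experiences the event earlier. A pair is usable when the patient with the shorter follow-up actually had an event; tied risk scores count as one half. The sketch below computes this apparent C-index for illustration; it is not the bias-corrected (unbiased) estimate plotted in Fig. 2, it ignores pairs with tied follow-up times for simplicity, and the function name is illustrative.

```python
def harrell_c_index(times, events, risk_scores):
    """Apparent Harrell's C-index for right-censored data (tied times are skipped)."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# A score that ranks earlier events as higher risk gives perfect concordance.
c = harrell_c_index([1, 2, 3, 4], [1, 1, 1, 0], [4, 3, 2, 1])
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported value of 0.694 in context.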

Fig. 2

Prognostic performance of our risk score in predicting distant metastasis in HR−/HER2+ breast cancer compared with that of traditional clinicopathological parameters, based on the C-index. Values on the x-axis are unbiased estimates of the C-index for the linear combination of one or more variables by Cox regression.

Prognostic significance of MMP11 and CD2 expression in HR−/HER2+ breast cancer in public dataset

We also examined the relationship between MMP11 and CD2 gene expression and the prognosis of patients with HR−/HER2+ breast cancer in a public dataset to confirm their clinical significance in an independent cohort. Gene expression and clinical data from the METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) cohort [28] were obtained from cBioPortal (http://www.cbioportal.org/) [29]. Consistent with the results in our cohort, patients with high MMP11 expression had significantly shorter OS than those with low MMP11 expression (P = 0.030), whereas patients with high CD2 expression had significantly longer OS than those with low CD2 expression (P = 0.027) (Fig. 3).

Fig. 3

Prognostic significance of MMP11 and CD2 gene expression in a public dataset. Kaplan–Meier plots of overall survival in two subgroups classified by the expression of (a) MMP11 or (b) CD2 in patients with HR−/HER2+ breast cancer from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) cohort.

Discussion

Based on the 384 genes identified in our previous study, we selected 16 candidate prognostic genes and assessed the association between their expression and patient outcome in different molecular subtypes of breast cancer.

Univariate analysis identified significant factors correlated with distant metastasis in different molecular subtypes of breast cancer. Among the traditional clinicopathological factors, LN status showed a significant relationship with DMFS and DFS in all molecular subtypes except HR−/HER2+ breast cancer. Of note, larger tumor size was significantly associated with a higher risk of distant metastasis in HER2− but not HER2+ breast cancer. Moreover, we identified subtype-specific prognostic genes whose expression was significantly associated with the risk of distant metastasis. Higher expression of most p-genes correlated significantly with a higher risk of distant metastasis in HR+/HER2− cancer, whereas no such association was observed in TNBC. These results are consistent with the previous finding that proliferation is the most important component of prognostic signatures in ER+ breast cancer [12]. Our study provides novel proliferation-related prognostic gene sets for HR+ breast cancer that may be used to develop a multigene assay for predicting the risk of distant recurrence and thereby identifying patients who will benefit from specific treatments in this subtype.

In addition, elevated expression of i-genes (BTN3A2, CD2, and TRBC1) was significantly correlated with favorable clinical outcome in HR−/HER2+ breast cancer but not in other subtypes. The prognostic value of immune gene signatures as predictors of distant metastasis in HR− breast cancer has been reported [13, 14, 16]. In particular, both BTN3A2 and CD2 are involved in the T-cell immune response. However, we cannot exclude the possibility that the increased expression of these genes reflects infiltrating immune cells. Infiltrating immune cells are recognized as a prognostically significant component of the tumor microenvironment in breast cancer, and a positive correlation between lymphocytic infiltration or lymphocyte-associated gene expression and HER2 amplification/overexpression has been reported [30, 31]. Moreover, higher expression of lymphocyte-associated genes is associated with a favorable prognosis in HER2+ breast cancer [30,31,32]. Our findings extend the prognostic significance of i-genes to HR−/HER2+ breast cancer. Notably, among the i-genes, T-cell-related genes were associated with a favorable prognosis of HR−/HER2+ breast cancer in our study, which is supported by a recent study reporting that T cells, but not B cells, have significant prognostic value in HER2+ breast cancer [32].

Interestingly, a recent study revealed that CD2 is critical for antibody-dependent responses by adaptive natural killer (NK) cells, suggesting an important role for CD2 in stimulating the NK cell response to therapeutic antibodies [33]. This finding raises the possibility that the correlation between high CD2 expression and favorable prognosis in patients with HR−/HER2+ breast cancer in our study is related to CD2-mediated augmentation of NK cell cytotoxicity against cancer cells. However, this relationship was not assessed in this study, and further studies will be required to clarify the association between CD2 expression and the response to anti-HER2 antibodies, or the value of CD2 in predicting that response, in HER2+ breast cancer.

Importantly, we found that MMP11 and CD2 gene expression levels are independent prognostic factors for DMFS in HR−/HER2+ breast cancer, whereas clinical variables were not significant prognostic indicators. Several attempts have been made to identify prognostic multigene signatures for HR−/HER2+ breast cancer using gene expression microarray data, but few validated prognostic genes have been established. In this context, it is important that MMP11 and CD2 expression were validated as independent prognostic factors; this is in line with previous studies showing that the main gene signatures associated with prognosis in HER2+ breast cancer include genes related to tumor invasion and immune response [12, 34]. MMP11 has been reported to contribute to tumor progression in breast cancer: its overexpression promotes anoikis resistance [35] and enhances tumorigenesis in HER2− breast cancer cell lines via IGF-1 signaling [36]. Recent studies have also shown that MMP11 is a downstream target of oncogenes or tumor-suppressor microRNAs and thereby contributes to tumor cell migration, invasion, or angiogenesis in breast cancer cells. The oncogenic transcription factor Gli1 promotes migration and invasion of ER− breast cancer cells through the up-regulation of MMP11 [37], and reduced MMP11 expression mediates the anti-angiogenic and anti-invasive effects of the microRNA miR-98 in ER− breast cancer cells [38]. However, the clinical and functional significance of MMP11 in HR−/HER2+ breast cancer remains unclear. Here, our findings demonstrate for the first time the prognostic significance of MMP11 and CD2 expression in HR−/HER2+ breast cancer and suggest that they are promising biomarkers or drug targets for this subtype of breast cancer. Further validation studies will be required.

Generally, patients with ER− breast cancer have a worse prognosis than those with ER+ breast cancer [39,40,41]. In contrast, there were no statistically significant differences in patient survival between molecular subtypes over the entire follow-up period in our study. This discrepancy may be partly due to the effects of chemotherapy on the subtypes. Most patients in our study (86.3%), including 92.7% of TNBC patients, received adjuvant chemotherapy, and our previous study [19] demonstrated that TNBC patients who received chemotherapy had significantly longer DFS and OS than those who did not, whereas TNBC patients without chemotherapy had a relatively poor prognosis. However, a substantial proportion of patients with ER− breast cancer who do not receive adjuvant chemotherapy have a good prognosis [34]. Identifying these patients more accurately is important because they may benefit from less aggressive therapy. Our data revealed a significant difference in DMFS between the high-risk and low-risk groups defined by our prognostic model, illustrating that the model can discriminate between patients at low and high risk of distant metastasis in HR−/HER2+ breast cancer. Therefore, our prognostic model may help guide treatment for patients with HR−/HER2+ breast cancer by identifying those with a good prognosis within this subtype.

Conclusions

In summary, we identified novel molecular subtype-specific prognostic genes in breast cancer and developed a prognostic model, based on MMP11 and CD2 gene expression, to predict the risk of distant metastasis in HR−/HER2+ breast cancer. Our prognostic model outperformed traditional clinicopathological factors in prognostic performance and may be used to identify patients with a good prognosis within this aggressive subtype of breast cancer. The novel prognostic genes validated in this study may therefore be used to develop assays that accurately predict the prognosis of these patients and thereby provide useful information for determining treatment options in patients with HR−/HER2+ breast cancer.