Background

Globally, breast cancer (BC) is the most frequently common carcinoma in women [1, 2]. In light of the statistics of the American Cancer Society, approximately 279,100 new BC cases and 42,690 cancer deaths are estimated to occur in the United States in 2020 [3]. Although various therapeutic strategies are currently being applied to improve BC prognosis, including surgery, chemotherapy, radiotherapy, endocrine therapy and targeted therapy, many BC patients still have poor survival outcomes because of recurrence [4]. Survival prognosis prediction is mainly based on the American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) staging system. However, it should be noted that BC patients at the same TNM stage may have distinct survival outcomes [5]. This highlights that the TNM staging system is inadequate to achieve accurate prognosis evaluation and that it is unable to reflect the biological heterogeneity of BC. For example, the system is mainly based on clinicopathological parameters rather than on molecular signatures [6, 7]. Increasing evidence has revealed that molecular biomarkers have the potential to improve prognostic assessment and identify high-risk cancer patients [8,9,10,11,12,13,14]. Accordingly, there is an urgent need to screen effective molecular biomarkers for improving survival prognosis prediction and identify high-risk BC patients with tumor recurrence.

With advancements in transcriptomics, long noncoding RNAs (lncRNAs) have captured considerable attention in human cancers in the past decade [15,16,17,18,19]. LncRNAs are a class of mRNA-like transcripts with a length of > 200 nucleotides that lack protein-coding ability. Accumulating evidence indicates that lncRNAs play important roles in various biological processes, including transcriptional modifications, cell proliferation, differentiation, epigenetic modulation, and immune system-modulated pathways [20,21,22,23,24,25]. For example, lncRNA SNHG1 enhances the differentiation of Treg cells to provoke immune escape in BC [26]. LncRNA AC025580.2 has a positive influence on immune escape in pancreatic cancer [27]. In addition, a large number of lncRNAs have been confirmed to act as diagnostic and prognostic biomarkers or potential therapeutic targets in multiple tumors [11, 28,29,30,31,32]. However, the role of immune-related lncRNAs (IRLs) in the prognostic evaluation of BC remains unclear.

Hence, the objective of this study was to establish and validate a novel IRL model to predict the 5-year RFS of BC patients. This tool is able to improve the predictive accuracy of the TNM staging system and identify high-risk BC patients with tumor recurrence to facilitate optimal therapeutic schemes.

Methods

Patients and study design

Gene and lncRNA expression profiles as well as corresponding BC cases information were acquired from The Cancer Genome Atlas (TCGA) database. The inclusion criteria were as follows: (1) pathologically confirmed invasive BC, (2) complete follow-up data, including RFS status and survival time, and (3) survival time > 1 month. The following clinical parameters for each patient were collected: age, estrogen receptor (ER) status, progesterone receptor (PR) status, T stage, N stage, TNM stage, human epithelial growth factor receptor 2 (HER2) status, tumor subtype, RFS, and survival time. The Research Ethics Committee of Guangdong Provincial People’s Hospital (GDPH) approved our study. As the study used data from the database TCGA, written informed consent was waived.

Establishment of IRL risk score and the IRL-based model

First, immune-related genes (IRGs) were retrieved from the Molecular Signatures Database (MSDB) v4.0 (immune response M19817, and immune system process M13664) [33, 34]. According to the GENCODE project, lncRNA expression data were obtained from the gene expression profile [35]. Next, IRLs were defined based on association analysis between the mRNA expression level and lncRNA expression data (|R|> 0.7, P < 0.05). Subsequently, univariate Cox proportional hazards regression analysis (CPHRA) was conducted to explore the association between the IRL expression level and patient RFS. Multivariate CPHRA was carried out to investigate independent prognostic IRLs. Regarding multivariate analysis, the collinearity diagnosis was confirmed via the variance inflation factor (VIF), and the final independent prognostic IRLs were selected to develop the IRL-based risk score, as follows: IRL risk score = sum of coefficients × expression level of IRLs. Finally, a novel IRL-based model incorporating the IRL signature and clinical variables was constructed to improve the prediction power of TNM staging in BC patients.

Assessment of the IRL-based nomogram

The area under the curve (AUC) of the time-dependent receiver operating characteristic (ROC) curve for 5-year RFS was calculated to evaluate the performance of the IRL-based model. The nomogram score for each patient was allocated on the basis of the IRL-based nomogram. Patients were divided into low- and high-risk nomogram score groups using the median score as the cutoff point. Subgroup analysis was applied to assess the risk stratification ability of the IRL-based model. A calibration curve was drawn to estimate the calibration ability of the IRL-based nomogram. Decision curve analysis (DCA) was used to weigh the clinical practicability of the IRL-based nomogram [36,37,38,39,40,41].

Gene enrichment analysis of the IRL signature

Gene enrichment analysis of the four IRLs was applied using Metascape, a free online method for gene annotation [42]. Functional pathway enrichment was assessed on the basis of co-expressing genes of the four IRLs in the same module.

Statistical analysis

The χ2 test, Fisher's exact test or Mann–Whitney U test was employed to explore differences in indicators. The optimal cut-off values of IRL-based nomogram scores were calculated via X-tile software, version 3.6.1 (Yale University, New Haven, CT, USA). Kaplan–Meier survival analysis and the log-rank test were used to examine the 5-year RFS difference between low- and high-risk groups. All statistical analyses were utilized via R software (www.r-project.org, version 3.6.1) and Stata/MP, version 14.0 (StataCorp LP, College Station, TX). A P value < 0.05 was defined as significant.

Results

Baseline characteristics

In total, 911 BC samples were collected as the primary cohort in our study. As in our previous studies, each patient can be assigned a computer-generated allocation number (0 or 1) based on Stata software, and two cohorts were differentiated from the primary cohort according to the allocation number. Of the total, 456 patients (allocation number = 0) were considered as the validation cohort [4, 5]. The baseline characteristics of all patients in the two datasets are shown in Table 1. There were no significant differences between the variables examined in the two independent cohorts (all P > 0.05). The median age of the included patients was 58 years (IQR: 48–66) and 58 years (IQR: 48–65) in the two datasets.

Table 1 Baseline characteristics of the included patients

Identification of the IRL signature

First, 332 IRGs were collected from Molecular Signatures Database (MSDB), and the expression profiles of the 332 IRGs were downloaded from TCGA. We defined a lncRNA with (|R|> 0.7 and P < 0.05) expression levels between the lncRNAs and genes as IRLs. Thus, on the basis of Pearson correlation analysis, 52 lncRNAs were confirmed as IRLs (|R|> 0.7 and P < 0.05). After univariate CPHRA, 4 IRLs were identified in the primary cohort (P < 0.05). Subsequent multivariate CPHRA further screened four IRLs (A1BG-AS1, AC004477.3, AC004585.1 and AC004854.2) as independent prognostic factors in the primary cohort. The VIFs ranged from 1.01 to 1.03, indicating no collinearity among the predictors. The detailed characteristics of the four IRLs in the training dataset are listed in Table 2.

Table 2 The four-IRLs significantly associated with 5-year RFS in the primary cohort

Development of the IRL-based risk score and prognostic nomogram

An IRL-based risk score formula was built on the basis of IRL expression levels and their estimated regression coefficients in the multivariate CPHRA, where IRL-based risk score = −0.451 × expressionA1BG-AS1 + 0.294 × expressionAC004477.3–0.450 × expressionAC004585.1 + 0.627 × expressionAC004854.2. Based on univariate and multivariate CPHRA (Table 3), three prognostic variables (IRL-based risk score, tumor subtype, and TNM stage) were selected as independent indicators to build an IRL-based model. The novel IRL-based nomogram is illustrated in Fig. 1.

Table 3 Univariate and multivariate analyses in the primary cohort
Fig. 1
figure 1

The immune-related lncRNAs model to predict 5-year RFS in breast cancer patients. 1 HR+/HER2-, 2 HR+/HER2+, 3 HR-/HER2+, 4 TNBC, and 5 unknown

Evaluation of the IRL-based prognostic nomogram

The AUCs of the IRL-based nomogram were 0.728 (95% CI: 0.658–0.797) and 0.751 (95% CI: 0.656–0.846) in the primary and validation datasets, respectively, showing that this model had good predictive performance (Fig. 2a). Calibration plots revealed ideal agreement between the IRL-based nomogram-predicted likelihoods and the actual observations of 5-year RFS, indicating that this tool had high calibration ability (Fig. 2b). In addition, compared with the predictive accuracy between the IRL-based model and TMN stage, the time-dependent ROC curve suggested that the IRL-based nomogram outperformed the TNM stage for both the primary and validation cohorts (P < 0.05, Fig. 3a, b). Moreover, DCA indicated that the IRL-based tool added more net benefit than did the TNM stage for the primary and validation datasets; therefore, this nomogram showed superior clinical usefulness (Fig. 3c, d). On the basis of the IRL-based model, patients in the primary and validation cohorts were stratified into low- or high-risk groups using the median nomogram score as the cutoff point. The distribution characteristics of the IRL-based model score, RFS, and RFS status are shown in Fig. 4, indicating that patients with lower nomogram scores had better 5-year RFS than those with higher nomogram scores (P < 0.001). Subgroup analyses demonstrated that the IRL-based model had good risk stratification ability for T1 (P < 0.0001), T2 (P < 0.0001), T3 (P = 0.0067), N0 (P = 0.029), N1 (P < 0.0001), N2 (P = 0.013), HR + /HER2- (P = 0.00033), HR-/HER2 + (P = 0.038), and TNBC (P < 0.0001) (Fig. 5).

Fig. 2
figure 2

a Time-dependent receiver operating characteristic (ROC) curves at 5-years based on the immune-related lncRNAs model in the primary cohort and validation cohort. b Calibration curves of the immune-related lncRNAs model in the primary cohort and validation cohort

Fig. 3
figure 3

Comparisons of the predictive accuracy between the immune-related lncRNAs model and TNM stage using time-dependent ROC curves in the primary cohort (a) and validation cohort (b). Comparisons of the clinical utility between the immune-related lncRNAs model and TNM stage using decision curve in the primary cohort (c) and validation cohort (d)

Fig. 4
figure 4

The distribution of immune-related lncRNAs model score, RFS, and RFS status in the primary cohort (a) and validation cohort (b). The dotted line indicates the optimal cutoff value of the model score to divide patients into the low- and high-risk set. Kaplan–Meier curves of the low- and high-risk patients based on the immune-related lncRNAs model in the primary cohort (c) and validation cohort (d)

Fig. 5
figure 5

Subgroup analysis of the immune-related lncRNAs model for breast cancer patients in T stage, N stage, and tumor subtype

Functional enrichment analysis of the IRL signature

To further explore the potential functional roles of the four IRLs, significantly associated IRGs (Pearson coefficient > 0.7 and P < 0.05) were included in pathway enrichment via Metascape. The top 20 highly enriched pathways are displayed in Fig. 6. The IRGs of the four IRLs clustered most significantly in lymphocyte activation, cytokine-mediated signaling pathway, regulation of cytokine production, negative regulation of immune system process, regulation of immune effector process, alpha–beta T cell activation, leukocyte migration, antigen receptor-mediated signaling pathway, B cell activation, TCR pathway, response to bacterium, T cell costimulation, regulation of peptidyl-tyrosine phosphorylation, calcium-mediated signaling, myeloid leukocyte activation, positive regulation of defense response, lymphocyte migration, interleukin-10 signaling, positive regulation of cytosolic calcium ion concentration, and regulation of interferon-gamma production categories. Moreover, according to LncRRIsearch, a web tool used for comprehensive prediction of lncRNA-mRNA interactions [43], the predicted targets of A1BG-AS1 are FKBP10, NFYC, UBR4, CEACAM19, MT-ND4, PKD1P6, and RNA28S5. The predicted target of AC004477.3 is SEH1L.

Fig. 6
figure 6

Gene enrichment analysis of the immune-related lncRNAs signature in the Metascape database. a The 20 enrichment terms. b The network of the enrichment terms

Discussion

With the increasing development of clinical management and comprehensive treatment of BC, prognosis of the survival of these patients has greatly improved. Nevertheless, many BC patients experience tumor recurrence [44]. High-throughput biological technologies have provided a platform for exploring the molecular characteristics of different tumors [45]. Many lncRNAs have been used to predict BC prognosis [44, 46, 47], though few IRL signatures have been identified as improving BC survival prediction. Consequently, it is necessary to single out prognostic IRLs integrating TNM stage to predict 5-year RFS of BC patients. In this study, with the combination of four IRLs, TNM stage and tumor subtype, we efficiently constructed and validated a novel IRL-based nomogram to accurately predict 5-year RFS in BC patients. In addition, time-dependent ROC analysis revealed that the IRL-based nomogram had more favorable prognostic accuracy than did TNM stage. Moreover, the DCA indicated that the IRL-based model had better clinical application than TNM stage; our IRL-based tool also exhibited good risk stratification ability to significantly categorize BC patients into high- and low-risk groups.

To date, several prognostic tools have been established to predict survival in BC patients [44, 46,47,48,49,50,51,52,53,54,55]. Yao et al. identified five lncRNAs as vital prognostic factors to evaluate BC prognosis [46]. However, these models ignored the prognostic value of IRLs for BC patients. Recent studies have suggested that the immune response plays a critical role in cancer progression and recurrence, and accumulating evidence has revealed that IRLs harbor vital predictive value for survival prognosis. For example, Zhang et al. found that si-lncMALAT1 both suppressed osteosarcoma progression and resulted in self-destruction of cells deep within the tumor. This would enable clearance of osteosarcoma cells from the body by the immune system. Therefore, lncMALAT1 was individually correlated with immune system activity and overall survival [56]. LncRNAs cannot encode proteins but do regulate gene expression at different levels, including through epigenetic, transcriptional or posttranscriptional regulation [57]. In the past decade, this new type of gene regulator, namely, lncRNA, has been associated with tumor development, progression, and prognosis, and many lncRNAs have been found to act as prognostic indicators in various tumors. Moreover, lncRNAs can regulate the immune microenvironment of BC [58]. IRLs can be applied to characterize the infiltration of immune cells in tumors. In our study, the IRL signature developed had a better prognostic value than the IRG signature. Thus, IRLs should be integrated into nomograms to improve prognosis prediction in BC. The survival prediction based on the TNM staging system is not yet satisfactory in clinical practice [59], though it is universally acknowledged that the TNM staging system is beneficial to achieve optimal treatment choice for BC patients. The TNM staging system is based on tumor size, lymph node status and metastasis [60]. Nonetheless, malignant behavior in BC is mainly determined by the molecular characteristics. Thus, incorporating the TNM staging system and molecular biomarkers is indispensable for developing an accurate tool for prognostic assessment in BC patients [61].

Some limitations should be noted in the present study. First, TCGA does not include postoperative treatment information (chemotherapy and hormone therapy). Therefore, we are unable to screen low-risk patients to tailor adjuvant treatment. Second, the molecular mechanism of the IRLs should be further explored by additional experiments. Third, our established model should be verified by prospective, large-scale multicenter datasets.

Conclusions

In summary, we identified IRLs signatures significantly associated with the 5-year RFS of BC and efficiently constructed a novel prognostic model for 5-year RFS prediction by incorporating four IRLs, TNM stage and tumor subtype. Our model can improve the predictive power of the AJCC TNM staging system and confirm high-risk patients with tumor recurrence to receive appropriate treatment.