Background

There were an estimated 80,470 new cases and 17,670 deaths as a result of bladder cancer (BC) in 2019, and BC is the ninth most common malignant tumor [1]. Nonmuscle-invasive bladder cancer (NMIBC) accounts for 75% of BCs, and 50% of NMIBC cases progress to muscle-invasive bladder cancer (MIBC) [2]. The main treatment for NMIBC is transurethral resection of bladder tumor (TURBT) followed by bladder irrigation, and the treatment strategy for MIBC is usually radical cystectomy combined with cisplatin chemotherapy [3]. The prognosis of patients with BC confined to the mucosa or submucosa is relatively good, and the 5-year survival rate is approximately 80%; however, the 5-year survival rate of BC patients with advanced metastasis is only 15%, and routine treatment has unsatisfactory effects [4, 5]. Therefore, it is essential to identify biomarkers that can reliably predict the prognosis of BC patients and to develop more effective targeted drugs to guide the treatment of BC.

An increasing number of studies have indicated that immune system disorders are closely related to tumorigenesis and development [6,7,8]. Therefore, immunotherapy has become a promising antitumor strategy in which the body’s own immune response is induced to recognize tumors as foreign antigens and inhibit the proliferation and metastasis of tumor cells by inducing active or passive immune effects [9, 10]. In the past few years, immunotherapy has changed the treatment of solid tumors, and numerous cancer patients have experienced durable responses and long-term survival benefits [11]. To date, Bacillus Calmette-Guerin (BCG) immunotherapy, the gold standard for high-risk NMIBC, has been the most successful; it induces inflammatory cell infiltration and cytokine production in the bladder mucosa, resulting in an immune response against tumor cells [12, 13]. For NMIBC patients with BCG failure, quadruple immunotherapy with BCG, interferon, interleukin-2 (IL-2) and granulocyte-macrophage colony-stimulating factor (GMCSF) has also demonstrated success [14]. However, side effects are very common with BCG, and more than 90% of patients have symptoms of cystitis [15, 16]. In addition, the in-depth study of immune checkpoint inhibitors (ICIs), such as inhibitors of programmed death-1 (PD-1), programmed death ligand-1 (PD-L1) and cytotoxic T lymphocyte antigen-4 (CTLA-4), has led to breakthroughs in immunotherapy [17]. In phase II clinical trials, neoadjuvant use of ICIs in patients with MIBC has shown pathological complete responses [18]. In summary, immunotherapy still has considerable potential in BC. In addition, tumor mutation burden (TMB), also defined as the total number of somatic coding errors, has been considered closely related to tumors [19]. Recent studies also confirmed that TMB was an essential biomarker to predict the effect of ICIs and immunotherapy in tumors, and the TMB level was significantly increased in responders [20,21,22].

In this study, we identified immune-related gene pairs (IRGPs) based on immune gene data downloaded from The Cancer Genome Atlas (TCGA) database. The IRGPs related to prognosis were selected to build a risk signature via least absolute shrinkage and selection operator (LASSO) Cox regression. A microarray dataset (GSE31684) obtained from the Gene Expression Omnibus (GEO) database was used to validate the accuracy of the risk signature (Fig. 1). The relationship between the immunological parameters (immune cell infiltration, immune checkpoints, tumor microenvironment (TME) and TMB) and the risk score was investigated. Finally, gene set enrichment analysis (GSEA) was used to explore the molecular mechanisms of the risk score.

Fig. 1
figure 1

Flow diagram of the current study. BC, bladder cancer; TCGA, The Cancer Genome Atlas; GEO, Gene Expression Omnibus; IRGs, immune-related genes; IRGPs, immune-related gene pairs; TME, tumor microenvironment; TMB, tumor mutation burden; GSEA, gene set enrichment analysis

Methods

Data acquisition

The mRNA transcriptome data, simple nucleotide variation and clinical information of patients with BC were downloaded from the TCGA database (TCGA-BLCA) (https://portal.gdc.cancer.gov/). The mRNA transcriptome data and clinical information were also obtained from GSE31684 (https://www.ncbi.nlm.nih.gov/geo/). After excluding normal samples, 411 patient samples in the TCGA database were analyzed to build a risk signature for evaluating prognosis, and 93 patient samples in GSE31684 were used to validate the signature. A list of immune-related genes (IRGs) was obtained from the Immunology Database and Analysis Portal (ImmPort) [23]. The clinical information of BC patients in the TCGA and GSE31684 databases is shown in Table 1. Then, the TMB of each sample could be calculated as the number of somatic mutations counted in the total length of exons [24]. Moreover, an independent cohort of patients with metastatic urothelial cancer (mUC) receiving PD-1 blockade therapy, as described in the IMvigor210 (mUC) trial, was also included to validate our signature (the mRNA and clinical data were obtained from the package “IMvigor” of R software). The clinical information of BC patients in the TCGA and GSE31684 databases is shown in Table 2. The need for ethical approval was waived because the data we used were obtained from public databases.

Table 1 Characteristics of the patients obtained from the TCGA database and GSE31684
Table 2 Characteristics of the patients obtained from the IMvigor210

IRGPs

We paired the IRGs in each sample and compared the expression between the two. If the expression of the first IRG was higher than the expression of the second IRG, the value of the IRGP was 1; otherwise, the value was 0 [25]. Then, the gene pairs whose ratio was 0 to 1 in less than 20% of samples were deleted to retain gene pairs that might be related to survival [26].

The risk signature

These IRGPs and the survival time were considered for further analysis. IRGPs significantly related to prognosis were identified via univariate Cox regression (P < 0.05). The risk signature was constructed via LASSO regression, and the number of variables included was reduced and overfitting was effectively avoided by constructing a penalty function. The penalty parameter (λ), a hyperparameter for the risk signature, was determined by tenfold cross-validation following the lowest partial likelihood deviance. The IRGPs included in the risk signature and the corresponding coefficients were obtained through the determined λ value. The risk score was calculated based on the coefficients. The appropriate cutoff value for dividing BC patients into a high-risk group and a low-risk group was determined via the TCGA database by receiver operating characteristic (ROC) curve analysis. The accuracy of the risk signature was also estimated via ROC curve analysis. Kaplan-Meier survival curves and the log-rank test were used to compare the overall survival (OS) between the high-risk group and the low-risk group. Then, univariate and multivariate Cox regression were performed to evaluate whether the risk score was an independent predictor of poor OS in BC patients. Subgroup analyses were performed to prove the robustness of the signature. The GSE31684 and IMvigor210 cohorts were used to validate the signature. In addition, the immune-related signatures from six other articles were compared with the current IRGP signature [27,28,29,30,31,32].

Immune parameters

To assess immune infiltration in different risk groups, cell type identification by estimating relative subsets of RNA transcripts (CIBERSORT) was used. CIBERSORT is a deconvolution algorithm that can predict the abundance of 22 immune cells, including naïve B cells, memory B cells, plasma cells, CD8 T cells, naïve CD4 T cells, and resting CD4 memory T cells, based on gene expression profiles (GEPs) [33,34,35]. The GEPs from the TCGA database were uploaded and used to analyze immune cell infiltration.

The relationship between the risk score and the expression of common immune checkpoints in BC was estimated, including PD-1, PD-L1, CTLA4, lymphocyte activating 3 (LAG3), B and T lymphocyte associated (BTLA) and hepatitis A virus cellular receptor 2 (HAVCR2).

The immune score (the infiltration level of immune cells), the stromal score (the infiltration of stromal cells) and tumor purity were calculated using the GEPs of TCGA database via Estimation of Stromal and Immune cells in Malignant Tumor tissues using Expression data (ESTIMATE) to explore the TME further [36, 37]. Then, we investigated the relationship between the risk score and the results of ESTIMATE. The impacts of the immune score and stromal score on the survival of all BC patients were also evaluated via the Kaplan–Meier method. Finally, we assessed whether there was a correlation between the risk score and the TMB.

The GSE31684 and publicly available “IMvigor210” datasets were used to perform the same analysis to determine the changes in immune parameters and risk score.

Gene set enrichment analysis (GSEA)

In the current study, we used GSEA to explore the molecular mechanisms underlying the risk score. The gene sets of “c2.cp.kegg.v7.1.symbols.gmt” and “c5.go.v7.2.symbols.gmt” from the Molecular Signatures Database (MSigDB) were downloaded for further analysis. The phenotype labels were high-risk group and low-risk group. Normalized enrichment scores (NESs), the nominal P value (NOM P value) and the false discovery rate Q value (FDR Q value) were acquired. NOM P value < 0.05 and FDR Q value < 0.25 were considered to indicate significant enrichment.

Statistical analysis

All statistical analyses were performed using R 4.03 software. The log-rank test and univariate and multivariate Cox regression were completed via the survival package of R. LASSO Cox regression was performed via the glmnet package of R. All ROC curves were generated via the survival ROC package of R. The nomogram and the calibration were illustrated via the rms package of R. ESTIMATE analysis was achieved via the estimate package of R. Fisher’s exact tests were used to estimate differences in clinical variables between the groups in the TCGA database. Pearson test was used for all correlation analysis.

Results

Construction of the IRGP signature

After filtering out unqualified IRGPs, 652 IRGPs remained. Then, 104 IRGPs significantly related to prognosis were identified via univariate Cox regression (P < 0.05) and used as candidate pairs for signature building. The risk score was calculated based on the coefficients of the selected IRGPs obtained from LASSO Cox regression (the optimal λ was 0.03253915) (Fig. 2A-B). Ultimately, 30 IRGPs were included in the risk signature (Table 3, Fig. 3A-C). Through ROC curve analysis, the cutoff value that divided BC patients into the high-risk group and low-risk group was determined to be 0.538 (Fig. 3D). The ROC curve results showed a moderate prognostic power of the risk score [area under the curve (AUC) at 1 year = 0.758, AUC at 3 years = 0.800, AUC at 5 years = 0.801] (Fig. 3E). The results of Kaplan-Meier survival analyses and Fisher’s exact tests revealed that a high risk score was significantly correlated with advanced age (P = 0.021), advanced clinical stage (P < 0.001), high T classification (P = 0.007), high N classification (P = 0.002), high M classification (P < 0.001) and poor OS (P < 0.001) (Table 4, Fig. 3F). Univariate Cox regression showed that T stage [hazard ratio (HR) = 2.408, 95% confidence interval (CI) = 1.215–4.771, P = 0.012], N stage (HR = 2.185, 95% CI = 1.303–3.662, P = 0.003), clinical stage (HR = 2.501, 95% CI = 1.184–5.284, P = 0.016) and risk score (HR = 6.221, 95% CI = 3.690–10.487, P < 0.001) were closely related to poor prognosis in BC (Fig. 3G). We then performed multivariate Cox regression, which identified only the risk score as associated with prognosis (HR = 6.953, 95% CI = 3.964–12.198, P < 0.001) (Fig. 3H). The nomogram could predict the survival probability of 1-year, 3-year and 5-year OS (Fig. 4A). The calibration curve revealed the accuracy of the prediction using the nomogram (Fig. 4B-D). To validate that the risk signature performed similarly for other datasets, the independent cohort from the GSE31684 dataset was employed for external validation. We divided BC patients from GSE31684 into a high-risk group and a low-risk group (Fig. 5A-C). The ROC curve results showed a moderate prognostic power for BC patients of the risk score (AUC at 1 year = 0.753, AUC at 3 years = 0.675, AUC at 5 years = 0.608) (Fig. 5D). Kaplan-Meier curves revealed that the prognosis of the high-risk group was worse than that of the low-risk group (P < 0.001) (Fig. 5E). The Cox regression results indicated that the risk score was an independent predictor of poor OS in BC (Fig. 5F-G). The nomogram could predict the survival probability of 1-year, 3-year and 5-year OS (Fig. 6A). The calibration curve revealed the accuracy of the prediction using the nomogram (Fig. 6B-D). Another cohort from the IMvigor210 dataset was also subjected to the same analysis to validate the signature (Fig. 7A-G). A series of subgroup analyses performed on data from the TCGA (Fig. 8A-F), GSE31684 (Supplementary Fig. S1A-F) and IMvigor210 (Supplementary Fig. S2A-D) indicated that the risk signature was robust.

Fig. 2
figure 2

Construction of the IRGP signature by LASSO regression analysis. LASSO coefficient profiles of the included genes in TCGA-BLCA (A). Selection of the optimal parameter (λ) in the LASSO model for TCGA-BLCA (B). LASSO, least absolute shrinkage and selection operator; BLCA, bladder cancer; IRGP, immune-related gene pair

Table 3 Information on the 30 selected IRGPs
Fig. 3
figure 3

Characteristics of the IRGP signature using the TCGA. The score of included IRGPs in different groups (A). The categorization of BC patients into different groups (B). The survival status of patients in the high-risk group and low-risk group (C). Through ROC curve analysis, the cutoff value for dividing BC patients into the high-risk group and the low-risk group was determined to be 0.538 (D). The AUC of the ROC curve results showed a moderate prognostic power of the risk score (E). The results of Kaplan-Meier survival analysis revealed that a high-risk score was significantly related to poor OS (F). Univariate and multivariate Cox regression analyses demonstrated that the risk score was an independent prognostic factor (G-H). TCGA, The Cancer Genome Atlas; LASSO, least absolute shrinkage and selection operator; BLCA, bladder cancer; BC, bladder cancer; IRGPs, immune-related gene pairs; AUC, area under curve; ROC, receiver operating characteristic; OS, overall survival

Table 4 Differences in the characteristics of BC patients between the high risk and low risk in TCGA
Fig. 4
figure 4

Nomogram (TCGA). The nomogram could predict the probability of 1-year, 3-year and 5-year OS (A). The calibration curve revealed the accuracy of the nomogram for predicting 1-year (B), 3-year (C) and 5-year (D) OS. TCGA, The Cancer Genome Atlas; OS, overall survival

Fig. 5
figure 5

Validation of the IRGP signature using GSE31684. The score of included IRGPs in different groups (A). The categorization of BC patients into different groups (B). The survival status of patients in the high-risk group and low-risk group (C). The AUC of the ROC curve results showed a moderate prognostic power of the risk score (D). The results of Kaplan-Meier survival analysis revealed that a high-risk score was significantly related to poor OS (E). Univariate and multivariate Cox regression analyses demonstrated that the risk score was an independent prognostic factor (F-G). BC, bladder cancer; IRGPs, immune-related gene pairs; AUC, area under the curve; ROC, receiver operating characteristic; OS, overall survival

Fig. 6
figure 6

Nomogram (GEO). The nomogram could predict the survival probability of 1-year, 3-year and 5-year OS (A). The calibration curve revealed the accuracy of the nomogram for predicting 1-year (B), 3-year (C) and 5-year (D) OS. GEO, Gene Expression Omnibus; OS, overall survival

Fig. 7
figure 7

Validation of the IRGP signature using IMvigor210. The scores of the included IRGPs in different groups (A). The categorization of BC patients into different groups (B). The survival status of patients in the high-risk group and low-risk group (C). The AUC of the ROC curve results showed that the risk score had moderate prognostic power (D). Kaplan-Meier survival analysis revealed that a high risk score was significantly related to poor OS (E). Univariate and multivariate Cox regression analyses demonstrated that the risk score was an independent prognostic factor (F-G). BC, bladder cancer; IRGPs, immune-related gene pairs; AUC, area under the curve; ROC, receiver operating characteristic; OS, overall survival

Fig. 8
figure 8

Subgroup analyses (TCGA). Subgroup analyses were performed based on age (A-B), clinical stage (C-D) and T stage (E-F) to confirm the robustness of the risk signature. TCGA, The Cancer Genome Atlas

A comparison of the current model with the previous model by using data from the TCGA (Fig. 9A-C), GEO (Fig. 9D-F) and IMvigor210 (Fig. 9G), revealed that the model had acceptable accuracy, with the largest AUC value obtained with data from the TCGA.

Fig. 9
figure 9

Comparison of signatures. AUC for multiple signatures at 1 year (A), 3 years (B), and 5 years (C) using TCGA datasets via ROC curves. AUC for multiple signatures at 1 year (D), 3 years (E), and 5 years (F) using GEO datasets via ROC curves. AUC for multiple signatures at 1 year (G) using IMvigor210 datasets via ROC curves. AUC, area under the curve; ROC, receiver operating characteristic; TCGA, The Cancer Genome Atlas; GEO, Gene Expression Omnibus; IRGPs, immune-related gene pairs; IRGs, immune-related gene; IRLncRNA, immune-related long non-coding RNA

Evaluation of immune parameters

CIBERSORT was used to evaluate immune cell infiltration in different risk groups. The results were visualized by radar plot. Memory B cell memory, resting memory CD4 T cells, eosinophils, plasma cells, CD8 T cells, activated memory CD4 T cells, follicular helper T cells, M0 macrophages, M1 macrophages and M2 macrophages were differentially enriched in the different risk groups. The levels of memory B cells (P = 0.040), plasma cells (P = 0.006), M1 macrophages (P < 0.001), CD8 T cells (P < 0.001), activated memory CD4 T cells (P < 0.001) and follicular helper T cells (P < 0.001) were higher in the low-risk group than in the high-risk group. The levels of M0 macrophages (P < 0.001), M2 macrophages (P < 0.001), eosinophils (P = 0.031) and resting memory CD4 T cells (P = 0.022) were higher in the high-risk group than in the low-risk group (Fig. 10A). The other two datasets, GSE31684 (Fig. 10B) and IMvigor210 (Fig. 10C), were used to verify the related changes in immune cells in the TCGA database. The results obtained from the three datasets were mostly consistent.

Fig. 10
figure 10

Immune cells and risk scores. In the TCGA, the levels of memory B cells, plasma cells, M1 macrophages, CD8 T cells, activated CD4 memory T cells and follicular helper T cells were higher in the low-risk group than in the high-risk group, and the levels of M0 macrophages, M2 macrophages, eosinophils and resting CD4 memory T cells were higher in the high-risk group than in the low-risk group (A). The other two databases, GSE31684 (B) and IMvigor210 (C), were used to verify the related changes in immune cells in the TCGA database. The results obtained from the three datasets were mostly consistent. TCGA, The Cancer Genome Atlas; *, P < 0.05; **, P < 0.01; **, P < 0.001

Then, the relationship between the risk score and the expression of common immune checkpoints in BC was explored. We found that the expression levels of PD-1 (correlation coefficient = − 0.19, P < 0.001) (Fig. 11A), CTLA4 (correlation coefficient = − 0.20, P < 0.001) (Fig. 11B), and LAG3 (correlation coefficient = − 0.18, P < 0.001) (Fig. 11C) were significantly negatively correlated with the risk score. However, there was no significant difference between the risk score and other immune checkpoints, including PD-L1 (correlation coefficient = − 0.098, P = 0.051) (Fig. 11D), BTLA (correlation coefficient = − 0.74, P = 0.140) (Fig. 11E), and HAVCR2 (correlation coefficient = − 0.043, P = 0.390) (Fig. 11F). We also confirmed that the stromal score was significantly positively correlated with the risk score (correlation coefficient = 0.31, P < 0.001) (Fig. 11G). However, there was no significant correlation between the immune score and tumor purity (correlation coefficient = − 0.046, P = 0.370) (Fig. 11H) or risk score (correlation coefficient = − 0.094, P = 0.061) (Fig. 11I). The related immune changes were also observed using GSE31684 (Supplementary Fig. S3A-I) and IMvigor210 (Supplementary Fig. S4A-I). The results obtained from the three datasets were mostly consistent.

Fig. 11
figure 11

Immunological parameters and risk scores (TCGA). The expression levels of PD-1 (A), CTLA4 (B), and LAG3 (C) were significantly negatively correlated with the risk score. However, there was no significant difference between the risk score and other immune checkpoints, including PD-L1 (D), BTLA (E), and HAVCR2 (F). The stromal score (G) was significantly positively correlated with the risk score. There was no significant correlation between the immune score (H), tumor purity (I) and risk score. TCGA, The Cancer Genome Atlas; PD-1, programmed death-1; CTLA-4, cytotoxic T lymphocyte antigen-4; LAG3, lymphocyte activating 3; PD-L1, programmed death ligand-1; BTLA, B and T lymphocyte associated; HAVCR2, hepatitis A virus cellular receptor 2; BC, bladder cancer

A high stromal score (P = 0.032) (Fig. 12A) and a low immune score indicated a poor prognosis in BC (P = 0.022) (Fig. 12B). We performed the same survival analysis using GSE31684 (Fig. 12C-D) and IMvigor210 (Fig. 12E-F). The results obtained from the three datasets revealed that a high immune score was closely correlated with a good prognosis.

Fig. 12
figure 12

Immune microenvironment and prognosis. A high stromal score (A) and a low immune score (B) indicated a poor prognosis in the TCGA. We performed the same survival analysis using GSE31684 (C-D) and IMvigor210 (E-F). The results obtained from the three datasets revealed that a high immune score was closely related to a good prognosis. TCGA, The Cancer Genome Atlas

In the TCGA, the level of TMB was significantly negatively correlated with the risk score (correlation coefficient = − 0.11, P = 0.026) (Fig. 13A), and an increased level of TMB correlated with improved OS (P < 0.001) (Fig. 13B). These results were consistent with the results obtained from IMvigor210 (Fig. 13C-D). Moreover, in the IMvigor210 cohort, low-risk patients had more significant immunotherapy effects (PD-1 blockade therapy) (Fig. 13E).

Fig. 13
figure 13

TMB. In the TCGA, the level of TMB was significantly negatively correlated with the risk score (A), and an increased level of TMB correlated with improved OS (P < 0.001) (B). These results were consistent with the results obtained from IMvigor210 (C-D). Moreover, in the IMvigor210 cohort, low-risk patients had more significant immunotherapy effects (PD-1 blockade therapy) (E). TMB, tumor mutation burden; TCGA, The Cancer Genome Atlas; OS, overall survival; CR, complete response, PR, partial response, SD, stable disease; PD, progressive disease

GSEA

We used GSEA to explore the molecular mechanisms underlying the risk score. The Gene Ontology (GO) results, as shown in Table 5 and Fig. 14A, revealed the most significant signaling pathways enriched in the high-risk phenotype. The GO results, as shown in Table 6 and Fig. 14B, revealed the most significant signaling pathways enriched in the low-risk phenotype. The Kyoto Encyclopedia of Genes and Genomes (KEGG) results, as shown in Table 7 and Fig. 14C, revealed the most significant signaling pathways enriched in the high-risk phenotype. The KEGG results, as shown in Table 8 and Fig. 14D, revealed the most significant signaling pathways enriched in the low-risk phenotype [38].

Table 5 Gene sets enriched in the high risk phenotype via GO
Fig. 14
figure 14

GSEA. Gene sets enriched in the high-risk phenotype via GO (A). Gene sets enriched in the low-risk phenotype via GO (B). Gene sets enriched in the high-risk phenotype via KEGG (C). Gene sets enriched in the low-risk phenotype via KEGG (D). GSEA, gene set enrichment analysis; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; NES, normalized enrichment scores; NOM P value, nominal P value; FDR Q value, false discovery rate Q value

Table 6 Gene sets enriched in the low risk phenotype via GO
Table 7 Gene sets enriched in the high risk phenotype via KEGG
Table 8 Gene sets enriched in low risk phenotype via KEGG

Discussion

BC is the most common malignant tumor of the urinary system and has complex biological behavior, a high recurrence rate and a high metastasis rate [39]. Among the BC treatments being studied, immunotherapy seems to be the most promising [40]. In 1990, BCG was approved for immunotherapy for BC and achieved great success, but it should be recognized that approximately 40% of BC patients have no response to BCG, and even 15% of BCs progress to MIBC after treatment [41, 42]. In recent years, new findings have suggested that tumor cells can escape the immune response by affecting immune checkpoints [43, 44]. Therefore, research on ICIs to prevent immune escape is receiving much attention at present [45,46,47]. Five ICIs, pembrolizumab, nivolumab, atezolizumab, durvalumab and avelumab, have been approved by the Food and Drug Administration (FDA) for the treatment of advanced and metastatic BC [48]. One study indicated that in PD-L1-positive BC patients, durvalumab showed controlled safety and meaningful clinical activity [49]. In summary, the immunology of BC is worthy of further exploration. Importantly, in this study, we constructed a prognostic risk signature by using IRGPs, which is significant for furthering the understanding of the immune response in BC.

In general, GEPs identified from large public databases can be used to build risk signatures. However, there are many deficiencies in traditional construction schemes. The overfitting of a small sample training set and lack of sufficient verification can reduce the accuracy of statistics [50]. In some public databases, such as TCGA, the number of tumor samples is far greater than that of normal samples, and paired data are scarce [51]. If a model is constructed by screening differentially expressed genes (tumor vs normal), its robustness is doubtful. This problem can be solved by jointly considering GEPs from multiple databases. Unfortunately, the data from multiple platforms are difficult to standardize because of biological heterogeneity and technical biases [52]. Hence, we built our risk signature using IRGPs, which were identified based on the relative ranking and pairwise comparison of gene expression within the same patient, thus overcoming the batch effects encountered when data from different platforms are analyzed [25, 53]. Additionally, this new method also avoids issues related to an imbalance between the numbers of tumor samples and normal samples. Some tumor studies have shown convincing results using this method [54, 55].

Our risk signature was constructed with 30 IRGPs consisting of 28 IRGs. A high risk score independently predicted poor prognosis in BC patients. CD14 was among the 28 IRGs, and BC cells with high CD14 expression have been shown to produce tumor-promoting inflammation and promote tumor cell proliferation [56]. Joint blockade of complement C5a receptor 1 (C5AR1) and PD-1 prevented lung cancer metastasis and improved the prognosis of patients [57]. Overexpression of Dickkopf WNT signaling pathway inhibitor 1 (DKK1) has been shown to be related to poor OS in patients with BC [58]. An increased level of serum interleukin 18 (IL18) was found in patients with BC, which might be the result of the patients’ immune systems fighting to inhibit the growth of tumor cells [59]. Upregulated matrix metallopeptidase 9 (MMP9) is closely related to the metastasis of BC [60, 61]. In another study, the knockdown of pentraxin 3 (PTX3) activated the proliferation of BC cells and enhanced the metabolism of tumor cells [62]. In this study, CIBERSORT was used to evaluate immune cell infiltration in the different risk groups. The levels of memory B cells, plasma cells, M1 macrophages, CD8 T cells, activated memory CD4 T cells and follicular helper T cells were higher in the low-risk group than in the high-risk group. The levels of M0 macrophages, M2 macrophages, eosinophils and resting memory CD4 T cells were higher in the high-risk group than in the low-risk group. Recent studies have indicated that high infiltration levels of CD8 T cells and CD4 T cells can exert anti-BC effects [63, 64]. High levels of M2 macrophages were significantly associated with poor prognosis in patients with BC, and metastasis of BC cells was inhibited by inducing M1 macrophage polarization [65]. In the low-risk group, the main effector immune cell infiltration level was increased, implying a stronger immune response, which may be the reason for the better prognosis of the low-risk group.

The level of TMB and the expression of immune checkpoints (PD-1, CTLA4 and LAG3) were both significantly negatively correlated with the risk score, suggesting that in the low-risk group, with the immune response enhanced, the expression of immune checkpoints was also increased, but fortunately, the immune response was activated more than suppressed, and the response to immunotherapy or ICIs could be more effective.

ESTIMATE was performed to analyze the association between the TME and the risk score. The TME, the cellular environment of tumor cells, is mainly composed of immune cells, stromal cells, extracellular matrix (ECM), small organelles and secreted proteins [66]. The results of our research demonstrated that the infiltration level of stromal cells was upregulated in the high-risk group, but there was no correlation between the immune score, tumor purity and risk score. Due to many different subtypes of immune cells, although some infiltrating immune cells in the high-risk group could not produce immune effects, they were clustered in both the high-risk group and the low-risk group via CIBERSORT analysis, and it was possible that there was no difference in the total immune score between the two groups. Additionally, low immune scores and the level of TMB were associated with poor OS in patients with BC. Therefore, we believe that immunotherapy is effective for patients with BC.

We did the same analysis with the GSE31684 dataset. Unfortunately, the results from the TCGA data were only partially observed in the results of the analysis of the GSE31684 dataset. There might be many reasons for this discrepancy. First, the changes in the TME and immune checkpoints were quite complicated. Second, the sample size of each BC data cohort in the GEO database was smaller than that in the cohorts in the TCGA database, and no mutation data were included. Third, the sequencing methods and data normalization used for each data cohort in the GEO database were not as advanced and rigorous as those used for the TCGA data cohorts. Given the deficiencies of the GEO data, the IMvigor210 dataset was also used for analysis, and we observed that more immune-related changes were found in the TCGA data than in the IMvigor210 data, but some results were still inconsistent. This discrepancy may be because the included samples were all advanced metastatic BC. Despite the limitations of the validated cohorts, we still found that many immune-related changes were consistent across the three cohorts (TCGA, GEO and IMvigor210), some immune cells (including M0 macrophages, M1 macrophages, M2 macrophages, activated memory CD4 T cells and resting memory CD4 T cells) showed the same changes. Moreover, TMB and some immune checkpoints also showed the same changes in the IMvigor210 data (there are no mutation data in GSE31684), suggesting that immunotherapy can achieve significant benefits in low-risk BC patients.

Finally, the molecular mechanisms underlying the risk score were explored via GSEA. The GO results showed that gene sets related to the ECM, stromal cells (chondrocytes, endothelial cells and fibroblasts) and epithelial-mesenchymal transition (EMT) were enriched in the high-risk group. Focal adhesion and ECM-receptor interaction, all connected with ECM, were the top two significant enrichment pathways in the high-risk group via KEGG. The enrichment of stromal cells in the high-risk group was consistent with the ESTIMATE results. However, we think it was valuable to discover that ECM, which constitutes scaffolds of tissues and organs, was enriched in the high-risk group [67]. The ECM is one of the most abundant components in the TME, and as the key to maintaining tissue homeostasis, the ECM is a dynamic environment, and ECM disorder can promote tumor occurrence, progression, and metastasis by inducing EMT [68,69,70,71,72]. The literature has shown that focal adhesion-related molecules, such as focal adhesion kinases (FAKs), play a vital role in EMT and upregulate the metastatic capacity of tumor cells in BC [73,74,75]. We also found that a high risk score was related to advanced M stage (metastasis). In the low-risk group, we found that there were a number of immune-related pathways in the enriched pathways and biofunctions via GO and KEGG, such as T cell receptor complex, immunoglobulin production, CD4-positive or CD8-positive alpha-beta T cell lineage commitment, primary immunodeficiency, intestinal immune network for IgA production and RIG I-like receptor signaling pathway, which might imply the activation of immune responses in the low-risk group.

However, the limitations of our study should be acknowledged. First, our study was a retrospective analysis, and the results need to be verified by a prospective cohort study. Second, the specific mechanism of the immunological parameters changing with the risk score was not studied in depth. Third, our analysis of the biofunctions underlying the risk score was not verified by in vitro or in vivo experiments.

Although our research had some limitations, the IRGP risk signature that we constructed for BC could predict the prognosis of patients, and use of this signature will be helpful for individualized treatment decisions, clinical decision-making and evaluation of the benefits of immunotherapy. In addition, the relevant genes included in the risk signature could also be used for further research to identify new therapeutic targets for BC.

Conclusions

In this study, we used a new tool, IRGPs, to build a risk signature to predict the prognosis of BC. By evaluating immune parameters and molecular mechanisms, we gained further understanding of the mechanisms underlying the risk signature. The signature could also be used as a tool to predict the effect of immunotherapy in patients with BC.