Introduction

Breast cancer (BC) is one of the most common cancers around the world and was estimated the most frequent cancer among women (25% of all new cancers recorded)1. The heterogeneity of BC reduces the specificity of biological features (e.g., histological grade and hormone receptor status) which are usually utilized for the diagnosis and prognosis of BC and to address a therapy2,3. The classification of biological BC subtypes is based on the use of techniques such as immunohistochemistry and gene expression profiling4.

In 2011 The St. Gallen International Breast Cancer Conference reported a molecular subtype approach to guide the therapy of BC based on immunohistochemical markers: estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2)4. In addition to the detection of these standard biomarkers, St. Gallen in 2013 included the evaluation of a marker of cell proliferation: Ki-675. Luminal A is defined by ER positive and/or PR positive and Ki-67 < 14%, and luminal B by ER positive and/or PR positive and Ki-67 ≥ 14%. ER negative, PR negative and Her2 positive tumors are classified as HER2 + 6. Triple negative BC (TNBC) are characterized by ER negative and PR negative and Her2 negative6.

The development of gene expression profiling with microarray demonstrated that the classification based on gene expression profiling reflects the differences of BC subtypes at the molecular level3. The pioneer study of Perou et al. in 2000 reported that BC could be classified into four intrinsic molecular subtypes by gene expression profiling: luminal A, luminal B, HER2-enriched (HER2), and basal7,8. Gene expression classifi-cation defines TNBC of immunohistochemistry with term basal BC. However, previous studies reported that there is a concordance of 80% between TNBC and basal BC9. Unlike the TNBC subtype, basal BC is characterized by the expression of other proteins, such as cytokeratins 5,6 and 1710.

BC molecular subtypes can be detected by different genetic tests with a different gene signature (e.g., PAM50, MammaPrint, and Oncotype DX). Several studies, applied to publicly available gene expression datasets, demonstrated a poor reproducibility among different genetic tests. This can be explained by the differences of gene signature in different genetic tests11,12. These observations forced the research towards the discovery of new biomarkers to be used for BC subtype characterization.

Luminal A is the most common BC subtype with a higher favorable prognosis and a slower evolution13. Luminal B subtype is characterized by an intermediate prognosis compared with luminal A and HER2 BC and an increased expression of genes associated with growth receptor signaling14. HER2 BC frequently tend to metastasize in the brain, liver and lung. In addition, the overexpression of HER2 is implicated in the cell proliferation, blocking apoptosis and cell spreading15. Basal BC subtype has a worse prognosis compared with other subtypes and high cell proliferation. Non-luminal tumors form metastases into distant organs more frequently than luminal tumors, but surprisingly luminal A and basal subtypes develop the regional lymph node metastases less often16,17. The luminal A is well differentiated compared to luminal B, HER2 and basal that are poorly differentiated17.

Previous studies reported that the evolution from normal breast cell types to BC subtypes derives from mutations or genetic rearrangements in stem cells and progenitor cells giving rise to a heterogeneous population of cells18.

New more accurate methods are needed to increase prognostic value and to personalize the most appropriate treatment for patients with BC and to investigate the molecular mechanisms responsible of BC subtypes differentiation. In the recent years Long Non-Coding RNAs (lncRNAs) and RNA binding-proteins (RBPs) emerged as key regulators of post-transcriptional events, and they are dysregulated in many human solid cancers, including BC19,20.

LncRNAs, longer than 200 nucleotides in length, belong to a large class of noncoding RNAs and are implicated in the regulation of gene expression by different mechanisms that are not yet fully characterized21,22. Previous studies observed their role in several physiological and pathological events23.

Because of the poor prognosis detected in BC patients and the lack of standard therapeutic treatments that avoid chemoresistance is needed the study of molecular profiling to better describe the BC subtypes with higher accuracy. This would allow the understanding of the altered molecular mechanisms in a specific subtype of BC.

Recently, several studies have observed a strong association of lncRNAs with BC development, progression, and metastasis. Basically, lncRNAs could act as promoters or inhibitors of BC cell invasion and metastasis. However, few studies reported the association between lncRNAs and molecular subtypes of BC24.

Notably, as lncRNAs could be found in human body fluids, the characterization of lncRNAs offer the opportunity to avoid the difficulties related with tissue biopsy of the currently genetic tests (e.g., OncotypeDX).

In addition, the importance of lncRNAs in the administration of anticancer treatment is encouraged by their involvement in drug resistance in cancer. For example, in prostate cancer, numerous lncRNAs are correlated with resistance to hormonal therapy, such as NEAT1 and PRNCR125,26.

RBPs are involved in a wide range of molecular processes including cell adhesion and response to stress. Many RBPs bind to sequence-specific motifs or RNA secondary structures, or a combination of both to regulate RNA metabolism and function. RBPs participate in the generation of ribonucleoprotein complexes that are principally implicated in gene expression processes such as splicing, mRNA synthesis and degradation27,28. In addition, RBPs are found differentially expressed in different cancers and are able to regulate the expression of oncogenes and tumor suppressor genes29. Therefore, the characterization of RPBs could reveal novel targets of cancer treatment by studying the mechanisms behind RBP expression and the association between RBPs and RNAs30. This notwithstanding, many lncRNAs and RBPs have not yet been studied in detail31.

The main goal of the present study was to assess the interactions between differentially expressed genes, lncRNAs and RBPs in different BC subtypes classified by PAM50 classifier. Firstly, we identified differentially expressed genes, lncRNAs and RBPs for each BC subtype using a published dataset. Then, we studied their interactions specific for each BC subtype. Finally, we validated the interactions with a machine learning approach on an independent dataset.

Methods

Data

The study is applied on a BC dataset originated from The Cancer Genome Atlas (TCGA): TCGA-BRCA. In particular, we used the expression levels of mRNA, lncRNA and RBP extracted from Illumina HiSeq RNASeqV2 platform derived by 233 BC luminal A samples, 103 BC luminal B samples, 74 BC basal samples, 43 BC HER2 samples and 113 normal samples (NS). Clinical data were downloaded from TCGA and BC subtypes were previously determined by the molecular classification of 50-genes (PAM 50 predictor)32. We used TCGABiolinks package 2.1833 to download RNAseq-data of BC subtypes and to estimate differentially expressed genes between BC subtypes and normal tissues.

Figure 1 shows the workflow of the computational approach.

Figure 1
figure 1

Workflow of the computational approach. The computational method was applied considering the comparison of each breast cancer (BC) subtype vs normal breast tissue. Differential expressional analysis with quantile-adjusted conditional maximum likelihood was performed on The Cancer Genome Atlas (TCGA) data between each breast cancer subtype and normal breast tissue. The analysis identified differentially expressed genes (DEGs), differentially long non-coding RNAs (DE lncRNAs) and differentially RNA-binding proteins (DE RBPs). lncRNAs and RBPs were defined by LNCipedia and by study of Hentze et al. Furthermore, we evaluated the interactions among DEGs, DE lncRNAs and DE RBPs using RNAct tool. The expression levels of interacting DEGs, DE lncRNAs and DE RBPs specific for each BC subtype were considered as biomarkers to classify BC subtypes with Support Vector Machine (SVM) classification, using an independent dataset of Gene Expression Omnibus (GEO), GSE58212.

Differential expression analysis

Quantile-adjusted conditional maximum likelihood was used as statistical test to detect differentially expressed RNAs between normal breast tissue and BC32. This analysis was performed considering each BC subtype at time compared to normal breast tissue. Those genes with a |log2(fold change)|≥ 1 and adjusted p-value < 0.05 were considered to have statistical significance. The p-values were adjusted using the Benjamini–Hochberg procedure for multiple testing correction34.

We further identified significantly dysregulated RBPs based on RBP catalog of Hentze et al.27. To identify lncRNAs present in the RNA-seq data we consulted Ensembl Biomart 10035 and LNCipedia database version 5.236.

Venn Diagram among RNAs in the four BC subtypes was represented using the R-package VennDiagram 1.6.2037.

The functional role of differentially expressed lncRNAs was investigated using Cancer lncRNA Census 2.038.

Protein-RNA interaction predictions

Protein-RNA networks were retrieved from the RNAct database39 containing predicted and experimentally validated40,41 protein-RNA interactions (which is based on UniProt release 2017_10 and GENCODE release 27). Within RNAct, the binding abilities were predicted using the catRAPID algorithm3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44. A z-score is calculated based on the values for experimentally determined protein-RNA interactions from eCLIP data from the ENCODE project and provides a score for the interaction of a protein-RNA pair of interest (for more details39).

Survival analysis

Survival analysis was performed in October 2021 from The Human Protein Atlas website45. Kaplan–Meier analysis investigated the prognosis of BC patients and the differences between the survival curves were explored with the log-rank tests46. We considered a gene/RBP to be prognostic if p-value < 0.05.

Machine learning approach

We identified subtype-specific networks and we validated with a machine learning approach their ability to classify the four BC molecular subtypes. The performances of RNA interactions for each BC subtype were evaluated with a linear support vector machine (SVM) and random forest classifiers, using the R-package caret 6.0.8647. We used default parameters for linear SVM and random forest classifiers.

RNA expression levels were normalized to make their scale comparable with the caret function “preprocess”. We used an independent GEO dataset (GSE58212) that includes: 121 luminal A, 69 luminal B, 32 HER2, and 36 basal samples.

Results

Differentially expressed long non-coding RNAs

By comparing the four breast cancer subtypes to their respective normal tissues, we found 3199 differentially expressed genes (DEGs) from the comparison “Luminal A vs. NS”, 4074 from “luminal B vs. NS”, 4134 from “HER2 vs. NS”, and 4181 from “basal vs. NS”. Supplementary file 1 shows the list of DEGs for each subtype. We used normal breast tissue as reference, as the aim of this study is the identification of biomarkers that could explain the molecular mechanisms that are implicated in the differentiation of BC subtypes from normal breast tissue.

In total, we found 19 unique lncRNAs. We obtained 7 lncRNAs (HOTAIR, HCG11, HPYR1, TSIX, EMX2OS, MYCNOS, and DLEU2) in luminal A (4 up-regulated lncRNAs and 3 down-regulated lncRNAs), 11 lncRNAs (SNHG5, PVT1, HCG11, HOTAIR, HPYR1, DLEU2, MYCNOS, LOH12CR2, EMX2OS, UCA1,and PART1) in luminal B (6 up-regulated lncRNAs and 5 down-regulated lncRNAs), 11 lncRNAs (SNHG8, XIST, SNHG5, GAS5, MYCNOS, HOTAIR, HCG11, UCA1, EMX2OS, LOH12CR2, and EGOT) in HER2 (3 up-regulated lncRNAs and 8 down-regulated lncRNAs), and 11 lncRNAs (XIST, UCA1, HCP5, MYCNOS, HOTAIR, SNHG3, MIR155HG, PART1, DLEU2, EMX2OS, and EGOT) in basal (8 up-regulated lncRNAs and 3 down-regulated lncRNAs). Table 1 shows differentially expressed long non-coding for each BC subtype.

Table 1 Differentially expressed long non-coding (LNC) RNAs in luminal A, luminal B, HER2 and basal breast cancer.

1 lncRNA was found only in luminal A (TSIX), 1 lncRNA was found only in luminal B (PVT1), 2 lncRNAs were found only in HER2 (SNHG8 and GAS5), and 3 lncRNAs were found only in basal (HCP5, SNHG3 and MIR155HG). 3 lncRNAs were found in common among 4 BC subtypes (HOTAIR, EMX2OS and MYCNOS). Figure 2A shows a Venn diagram representing common lncRNAs among BC subtypes and subtype-specific lncRNAs.

Figure 2
figure 2

Venn diagram. It identifies: (A) common differentially expressed long non-coding RNAs (lncRNAs) and (B) RNA-binding proteins (RBPs) among breast cancer molecular subtypes.

Table 2 shows the functional information of lncRNAs extracted by Cancer lncRNA Census 2.0. We found that 14 of 15 lncRNAs were previously associated with different cancer types. GAS5, PVT1, and XIST showed a dual role as tumor suppressors and oncogenes. DLEU2 and EGOT play a role in cancer as tumor suppressors. HCG11, HCP5, HOTAIR, MYCNOS-01, PART1, SNHG3, SNHG5, SNHG8, and UCA1 were oncogenes in different caner types. To date, MIR155HG is not associated with a clear function with cancer.

Table 2 Functional information of lncRNAs as reported the Cancer lncRNA Census 2.0.

Differentially expressed RNA binding-proteins

We considered 1393 RBPs curated by Hentze et al.27 and identified significantly dysregulated RBPs based on the differential expression analysis as described above.

Among the 3199 differentially expressed genes in luminal A, there were 99 out of 1393 RBPs, among the 4074 DEGs in luminal B there were 176 RBPs. 155 RBPs out of 4134 DEGs were found in HER2 and among the 4181 DEGs there were 179 RBPs. Table 3 summarizes the results of differential expression analyses considering long non-coding RNAs and RNA-binding proteins compared to normal tissues.

Table 3 The table shows differentially expressed genes (DEGs), differentially expressed long Non-Coding (DE-lncRNAs) RNAs and differentially RNA binding-proteins (DE-RBPs).

4 RBPs (NUDT16L1, RBM20, PCDH20, and PCBP3) were found only in luminal A, 29 RBPs were found only in luminal B, 23 RBPs were found only in HER2, and 62 RBPs were found only in basal (Fig. 2B). Supplementary file 2 shows the list of RBPs for each subtype.

Interactions of RNA-binding proteins

Overall, from the differential expression analysis we found 5980 unique genes for all subtypes, 19 unique lncRNAs, and 281 unique RBPs.

We obtained interaction predictions for the 5980 RNAs and 281 RBPs from the RNAct protein–RNA interaction database. Interaction predictions are prioritized by a normalized score (z-score). The distribution of z-scores in our data is shown in the Supplementary Fig. 3. We selected the protein-RNA interactions that obtained a z-score ≥ 2 since this should provide a good balance between sensitivity and specificity. We obtained a network of 2585 nodes (241 RBPs, 7 lncRNAs, and 2337 RNAs), altered in at least one BC subtype, with 45,727 interactions. Supplementary file 4 shows these interactions.

We selected direct interactions involving only differentially expressed RBPs and genes present in a single subtype, namely, we considered the subtype-specific interactions. For luminal A we found a network consisted of 2 RBPs (RBM20 and PCDH20) and 2 genes (HOXB3 and RASSF7). The specific network for luminal B includes 21 RBPs and 53 DEGs. The specific network for HER2 includes 14 RBPs and 30 DEGs. The specific network for basal includes 54 RBPs and 302 DEGs. However, we did not find lncRNAs specific for a BC subtype interacting with the RBPs and DEGs specific for the same BC subtype.

From the subtype-specific interactions we evaluated the number of RNA targets for each RBP and the number of RBP for each RNA (Fig. 3).

Figure 3
figure 3

The boxplots show the number of RNAs or RNA-binding proteins (RBPs) that there are per protein or RNA, considering direct interactions involving only differentially expressed RBPs and genes present in a single breast cancer subtype (t-test, * p-value < 0.05, ** p-value < 0.01, *** p-value < 0.001).

We found that dysregulated RNAs in luminal A have a lower number of differentially expressed RBPs than dysregulated RNAs in luminal B, HER2 and basal BC (p-value < 0.001). We can explain these results indicating that the abnormal expression of RBPs increase with the worsening of the prognosis. In addition, the number of altered RNAs regulated by each altered RBP is directly proportional to the aggressiveness of the BC subtype, suggesting RBP as potential biomarkers responsible of BC subtypes differentiation.

We analyzed the prognostic role of subtype-specific RBPs and genes (Table 4). We found that 43% of specific RBPs for luminal B, 57% of specific RBPs for HER2, and 46% of specific RBPs for basal can differentiate BC patients with good and poor prognosis. 50% of specific genes for luminal A, 51% of specific genes for luminal B, 40% of specific genes for HER2, and 40% of specific genes for basal may influence BC patient survival.

Table 4 List of potential prognostic subtype specific RBPs/genes as obtained from The Human Protein Atlas.

We found a lower p-value (p-value = 0.00001) for DCTPP1 and NFKBIE.

Validation of protein-RNA interactions for each BC subtype

We validated subtype-specific interactions on an independent dataset from the NCBI Gene Expression Omnibus (GEO), GSE58212. We used the expression levels of interacting biomarkers for the classification, namely the expression levels of 2 RBPs (RBM20 and PCDH20) and 2 genes (HOXB3 and RASSF7) that we obtained for luminal A, the 21 RBPs and 53 DEGs obtained for luminal B, the 14 RBPs and 30 DEGs for HER2, and 54 RBPs and 302 DEGs obtained for basal.

Using BC subtype-specific interactions we trained (75% of the original dataset) and tested (25% of the original dataset) a classifier with a linear support vector machine (SVM) and random forest models obtaining good performances (Table 5). We achieved the best accuracy from the comparison HER2 vs other subtypes (accuracy = 0.96) and basal vs other subtypes (accuracy = 0.96) using SVM classifier. Good performances were achieved with both classifiers for luminal A vs other subtypes and basal vs other subtypes. Low sensitivity was obtained in luminal B vs other subtypes and HER2 vs other subtypes with random forest classification.

Table 5 Performance of classification Support Vector Machine (SVM) and Random Forest (RF) using subtype-specific interactions (sensitivity, specificity and accuracy).

Discussion

In this study, we examined the DEGs, differentially expressed lncRNAs and differentially expressed RBPs and their interactions in different BC subtypes. Firstly, we analyzed differentially expressed genes from the comparison luminal A vs normal samples, luminal B vs normal samples, HER2 vs normal samples, and basal vs normal samples. We obtained 3199 DEGs in luminal A, 4074 DEGs in luminal B, 4134 DEGs in HER2, and 4181 DEGs in basal.

Then, we focused on subtype-specific lncRNAs and we found 1 luminal A-specific lncRNA (TSIX), 1 luminal B-specific lncRNA (PVT1), 2 HER2-specific lncRNAs (SNHG8 and GAS5) and 3 basal-specific lncRNAs (HCP5, SNHG3 and MIR155HG), as reported in Fig. 2A.

TSIX mediates the X chromosome inactivation acting as a XIST repressor. Indeed, a strong inverse correlation between XIST and TSIX was demonstrated: a down-regulation of TSIX leads to an up-regulation of XIST that causes the inactivation of the X chromosome48. A previous study demonstrated that TSIX together with other lncRNAs such as OIP5-AS1, TUG1, NEAT1, MALAT1, and XIST were able to synergistically regulate cancer genes and pathways across different cancer types49. In addition, TSIX was differentially expressed in lung cancer50.

A previous study reported that PVT1 was up-regulated in BC tissues when compared with the normal tissues and its silencing repressed tumor growth51. PVT1 is associated with other types of cancers such as lung and ovarian cancer and is correlated with the survival of patients52. Furthermore, PVT1 regulates several cancers processes and pathways such ad cell–cell adhesion and TGF‐β signaling pathway. A previous study reported that PVT1 was co-expressed with another gene in the TGF‐β signaling pathway. Indeed, PVT1 can regulate the protein stability of the MYC oncogenic protein53.

In our study, 2 HER2-specific lncRNAs (SNHG8 and GAS5) were found. SNHG8 was overexpressed in different types of cancer, suggesting its role in the progression of these tumors. The silencing of SNHG8 inhibits the proliferation and invasion of BC cells, MCF-7 and ZR-75-3054. A recent study showed a possible molecular mechanism of action of SNHG8: it is a sponge for miR-656-3p and modulates SATB1 expression. SATB1 is associated with cancer cell proliferation, migration, and invasion55.

GAS5 is a strong candidate as prognostic biomarker since it was identified significantly downregulated in BC and correlated with poor prognosis. In addition, GAS5 is also a potential drug target since it is implicated in resistance to multiple drugs in BC such as tamoxifen and lapatinib56.

3 basal-specific lncRNAs (HCP5, SNHG3 and MIR155HG) were found. HCP5 was positively associated with the expression of immune checkpoints since it is mainly found expressed in immune system cells. In addition, HCP5 promotes tumor growth in vivo and in vitro as well as apoptosis and proliferation57. A recent study suggested that HCP5 could be a promising drug target in triple negative BC58.

The oncogene SNHG3 was up-regulated in BC cells and is associated with the growth of cell proliferation regulating tRNA processing and signal transduction. A recent study suggested that the down-regulation of SNHG3 might act as a possible therapeutic strategy for BC. In addition, it was demonstrated an inverse correlation between SNGH3 and miR-154-5p: increase of SNHG3 inhibits miR-154-5p and upregulates BC cell proliferation59.

Few is known about the role of MIR155HG in BC. MIR155HG is a precursor of miR-155-5p and was identified as a direct target of FOX340. A recent study proposed the study of the expression MIR155HG together with FANCI and C-MYC as potential diagnostic test and drug targets in gynaecological malignancies60.

Then, in our study we focused on subtype-specific RBPs. 4 RBPs were found only in luminal A, 29 RBPs were found only in luminal B, 23 RBPs were found only in HER2, and 62 RBPs were found only in basal (Fig. 2B).

We found NUDT16L1, RBM20, PCDH20, and PCBP3 as luminal A-specific RBPs.

NUDT16L1, also called SDOS and TIRR, is a novel RBP that regulates several transcripts encoding for centrosomal proteins and has a key role controlling cilia formation. Cilia are organelles present on eukaryotic cells that plays a role in cell progression. However, for a long time NUDT16L1 has been little studied and novel uncharacterized associations with cancer must be studied61.

RBM20 plays a role in the familial cardiomyopathy acting on titin and tropomyosin, two proteins involved in the biomechanics of the striated muscle. It is also associated with fasting glucose regulating insulin damage in cardiac tissues. However, its role in cancer has not yet been demonstrated62.

PCDH20, member of subfamily of the cadherin family, is down-regulated in non-small cell lung cancer63, nasopharyngeal carcinoma64, and hepatocellular carcinoma65. The prognostic role of PCDH20 was reported in a recent study: patients with high PCDH20 expression showed a better overall survival than those with low PCDH20 expression in hepatocellular carcinoma66. The tumor-suppressor gene PCDH20 through the Wnt/β-catenin signaling pathway acts inhibiting cell proliferation and cell migration67.

PCBP3 was associated with favorable prognosis in pancreatic cancer. However, no previous study investigated the molecular mechanism of PCBP3 in carcinogenesis68.

Among 29 luminal B-specific RBPs we focused on GSTP1 and RRS1 because previous studies reported an interesting association with BC.

GSTP1 is involved in the drug resistance of tumor cells, including BC. A recent study showed that high expression of GSTP1 can activate the NF-κB signaling pathway in tumor associated macrophages (TAMs) and regulates the expression of IL-669.

RRS1 is a crucial nuclear protein implicated in ribosome biogenesis. It was overexpressed in several human cancers including BC. In addition, elevated RRS1 expression levels were correlated with lymph node metastasis and unfavorable clinical outcome. Some evidence also provided new molecular mechanisms of RRS1 in the proliferation of BC through RPL11/MDM2/p53 pathway70.

Among 23 HER2-specific RBPs we identified ALDH18A1 and LASP1 as potential BC biomarkers as they were associated with BC in previous studies. ALDH18A1 is an enzyme implicated in the conversion of glutamine to proline through glutamate. Its over-expression increases proline levels and decreases cell survival in BC as well as reduces reactive oxygen species. ALDH18A1 is also associated with an oncogene, MYC able to regulate cell metabolism and key genes implicated in cancer71,72.

LASP1 is a well-known protein that interacts with many proteins regulating tumor cell migration and invasion. A previous study showed that LASP1 binds to Ago2 that plays a key role in BC cell motility in response to CXCR4 activity73.

Among 62 basal-specific RBPs we found as interesting biomarkers: SERPINH1 and DKC1.

SERPINH1 plays a key role for the correct folding and secretion of different types of collagen and has previously been associated to cancer progression. Its overexpression is correlated with angiogenesis, migration, and invasion74. A previous study demonstrated that SERPINH1 is regulated by miR-148a-5p, a miRNA predictive of unfavorable prognosis75.

DKC1 is associated with a poor prognosis in BC. Indeed, patients with a higher expression of DKC1 metastasize more frequently to lymph node than patients with lower DKC1 expression levels. The role of DKC1 in cancer prognosis could be explained by the role of DKC1 in regulating mRNA translation76.

Furthermore, in the present study we selected direct interactions involving subtype-specific differentially expressed RBPs and DEGs. Although previous studies demonstrated that numerous lncRNAs are deregulated in different cancer types and RBPs could play a role to the deregulation of lncRNAs31,77. in our study we did not find interactions involving differentially expressed lncRNAs and RBPs specific for each subtype. Indeed, the final signature for each subtype is composed of only subtype-specific interacting genes and RBPs. This result can derive by some limitations of our study, such as: (i) lncRNA profiles based on TCGA data, which contain cancer cells and stromal cells could influence the results obtained by differential expression analysis, (ii) the low characterization of lncRNAs in the TCGA data. The characterization of lncRNAs could be more accurate with the new knowledge of data in the future.

However, in our study we found interesting networks consisted of subtype-specific interacting genes and RBPs: a network of only 2 RBPs (RBM20 and PCDH20) and 2 genes (HOXB3 and RASSF7) for luminal A, a network of 21 RBPs and 53 DEGs for luminal B, a HER2-specific network of 14 RBPs and 30 DEGs, and a network of 54 RBPs and 302 DEGs for basal BC. From these networks we investigated the number of RNA targets for each RBP and the number of RBP for each RNA. We found that the number of RBPs per RNA and the number of RNAs per RBP increases with the aggressiveness of the BC molecular subtype. This finding could indicate the key role of the interactions between differentially expressed RBPs and DEGs in the progression of BC. Indeed, luminal A, the less aggressive BC subtype showed a lower number of RNA targets for each RBP and of RBP targets for each RNA. To our knowledge this is the first study that obtained a similar association. Encouraged by the results obtained that demonstrated the specific RBP-RNA interactions for each subtype we validated subtype-specific networks using a machine learning approach on an independent BC dataset from GEO. Overall, we obtained good performances of classification with an accuracy > 0.80 (Table 5). We achieved the best performances from the classification HER2 vs other subtypes (accuracy = 0.96) and basal vs other subtypes (accuracy = 0.96). Overall, given the good results of the classifier we propose the study of these BC subtype-specific interacting biomarkers as potential candidates for differential diagnosis of BC.

Among biomarkers, we found novel RBPs and genes that the survival analysis showed to have a prognostic role.

The low expression of RASSF7, a specific gene of luminal A, plays a prognostic role in BC as it is associated with a poor prognosis. To date, there is not a clear association of RASSF7 with BC.

The survival analysis in this study found that high expression of a specific gene of luminal B, DCTPP1, in a group of 609 BC patients is associated with a poor prognosis respect to 466 BC patients with a low expression. Although previous studies showed its role in DNA damage and genetic instability further studies are needed to investigate its potential therapeutic in BC78.

We found that DHRS11, KLC3, NAGS, and TMEM98, specific genes for HER2, are associated with a poor prognosis in BC patients. DHRS11 is implicated in the pathway of cytochrome P450, KLC3 in RHO GTPases activate KTN1M, NAGS in the urea cycle, and TMEM98 in oligodendrocyte differentiation79. However, to date there is not a clear association of these genes with BC.

We obtained ABHD14A and ADSSL1 as potential candidate prognostic biomarkers for basal BC. ABHD14A is associated with metabolic disorders of biological oxidation enzymes, and ADSSL1 with Purine ribonucleoside monophosphate biosynthesis80.

Although we propose several potential prognostic biomarkers for BC subtypes our study presents some limits. We selected in silico the biomarkers and validated them with a machine learning approach using an independent GEO dataset, and survival analysis. Molecular validation of these biomarkers will be performed in the near future as further studies are needed for translating them to clinical practice.

Conclusions

In this study, we firstly examined BC subtype-specific DEGs, differentially expressed LNCs and RBPs using BC-TCGA dataset. Then, we investigated the regulatory interactions between RBPs and their target genes in BC subtypes. We found different networks specific for each BC subtype: a network of 2 RBPs (RBM20 and PCDH20) and 2 genes (HOXB3 and RASSF7) for luminal A, a network of 21 RBPs and 53 DEGs for luminal B, a HER2-specific network of 14 RBPs and 30 DEGs, and a network of 54 RBPs and 302 DEGs for basal BC. Overall, the analysis sheds light on the role of RBPs in regulating different BC subtypes and we provided a data exploration analysis to aid future experimental studies. In addition, the analyses in this study suggested some novel prognostic BC biomarkers: RASSF7 for luminal A, DCTPP1 for luminal B, DHRS11, KLC3, NAGS, and TMEM98 for HER2, and ABHD14A and ADSSL1 for basal.