Integrative analysis of LncRNA-mRNA signature reveals a functional LincRNA in triple-negative breast cancer

Triple-negative breast cancer (TNBC) is an aggressive subtype of breast cancer. It is still unclear that the mechanisms by which long non-coding RNA (lncRNA) regulates tumorigenesis of TNBC. In this study, we explored the function and regulation of lncRNA in TNBC. The differentially expressed and overlapped lncRNAs were obtained from two microarray datasets of breast cancer. The cancer genome atlas (TCGA) data was applied to validate the roles of top differentially expressed lncRNAs. The potential relationship among lncRNAs, miRNAs, and mRNAs and the effects of them on the TNBC tumorigenesis were further explored through multiple bioinformatic methods. Long intergenic non-protein coding RNA 1351 (LINC01351) was first discovered to play an oncogenic role in TNBC. The highly expressed LINC01351 was significantly related to aggressive subtypes, advanced stages and metastasis of breast cancer. The overexpressed LINC01351 was associated with adverse prognosis of patients with TNBC. LINC01351 was found to negatively regulate ELK4 which was involved in the transcriptional regulation in cancer. The high expression of ELK4 showed favorable prognosis of patients with basal-like 1 subtype and luminal androgen receptor subtype of TNBC. The dysregulation of LINC01351 played an important role in triple-negative breast cancer. LINC01351 could be a poor prognostic marker and a potential target for patients with TNBC.


Introduction
Breast cancer is the most common malignant tumor and the second cause of cancer-related mortality in women [1]. Since 2015, breast cancer has become the top cancer of new cancer cases and deaths in China [2]. Triple-negative breast cancer (TNBC), which accounts for 15%-20% of breast cancer, is an aggressive subtype characterized by negative expression of estrogen receptor (ER), progesterone receptor (PgR) and ErbB-2/human epidermal growth factor receptor 2 (Her2) [3]. It has been well known that TNBC is associated with earlier metastasis, faster tumor proliferation, more drug resistance and poorer prognosis [4]. Except for poly (ADP-ribose) polymerase (PARP) inhibitors that have been used to treat patients with genomic instability and/or BRCA mutations, the therapies of TNBC are restricted to conventionally available chemotherapies [5]. In recent years, immunotherapy agents showed impressive result of clinical trials in the treatment of metastatic TNBC with positive expression of programmed cell death ligand 1 (PD-L1), however, it still need more validation to stratify the patient with immune infiltration [6]. It is urgently needed to uncover the mechanism of TNBC and to identify the novel prognostic target and therapeutic strategy for patients with TNBC.
Long non-coding RNAs (lncRNA) are a class of RNA transcripts with a length of > 200 nucleotides that do not encode proteins [7]. The function of lncRNA is involving in the flux of genetic information and diverse biological processes, such as splicing, stability of messenger RNA, epigenetic modulation, transcriptional, and posttranslational modifications and cell proliferation and differentiation [7,8]. Moreover, lncRNA is also reported to regulate tumorigenesis of multiple human cancers including breast cancer [9,10]. However, the mechanism of lncRNA in TNBC has not been well investigated. In this study, we explored the function and regulation of lncRNA in TNBC to obtain the candidate prognostic marker and molecular target of therapy.

Microarray data
Two lncRNA microarray datasets (GSE60689, TNBC vs. normal breast and GSE112848, breast cancer vs. normal breast) were downloaded from the Gene Expression Omnibus (GEO) database (https:// www. ncbi. nlm. nih. gov/ geo/) [11,12]. From GSE60689 dataset, the expression of lncRNAs was profiled and generated from the comparison of two stage III TNBC tissues and their paired adjacent noncancerous tissues by LncRNA Array 3.0 (ArrayStar). The pathological type of TNBC tissues was infiltrating ductal carcinoma. The ages of two female patients are 67 and 73, separately. The lncRNAs expression of GSE112848 were profiled and generated from three pairs of primary breast cancer tissues and adjacent noncancerous tissues. The pathological type of cancer tissues was ductal carcinoma in situ. The ages of three female patients are 43, 67 and 73, separately. Raw data was processed by SangerBox package (http:// sange rbox. com/), GEO2R online software (https:// www. ncbi. nlm. nih. gov/ geo/ geo2r/) and MultiExperiment Viewer (http:// mev. tm4. org). The differentially expressed and overlapped lncRNAs with the criteria of |log fold change (FC)|> 3.0 and p value < 0.05 was screened out for further analysis.

Functional analysis of lncRNA
The correlation between the expression of lncRNA and the clinical characteristics including TNM stages, molecular subtypes, overall survival (OS) and relapse-free survival (RFS) were investigated with the data from The Cancer Genome Atlas (TCGA) Breast Cancer (BRCA) RNAseq, which was obtained from TANRIC database and UCSC Xena database (https:// xenab rowser. net/) online databases. The survival analysis was performed by Kaplan-Meier Plotter (http:// kmplot. com/ analy sis/) software. All the above databases were applied for in-depth analysis on the function and regulation of lncRNA.

Statistical analysis
The results were shown as mean ± standard deviation. The statistical significance of the microarray result was analyzed by fold change and the student's t-test. The threshold value we used to screen differentially expressed lncRNA is a fold change > 3.0 or < -3.0 (p < 0.05). The t-test was performed to analyze the difference between groups. Pearson correlation was applied to calculate different parameters in terms of correlation. The p value < 0.05 was considered as statistically significant. All analyses were conducted in SPSS Statistics 22.

Dysregulated lncRNAs were found in triple-negative breast cancer
Two microarray datasets (GSE60689 and GSE112848) were analyzed according to the flow chart. As shown in Fig. 1 Fig. 1, after the annotation and verification of candidate 21 lncRNAs, three upregulated lncRNAs (LINC02303, LINC01351 and LINC01511) and two downregulated lncRNAs (CHL1-AS2 and LINC01612) were obtained from the strict verification via NON-CODE and BLAST online software. Based on the literatures about the five dysregulated lncRNAs, the long intergenic non-protein coding RNA 1351 (LINC01351) was reported to be associated with the adverse prognosis of patient with colorectal cancer [13]. However, the effect of LINC01351 in breast cancer was not clear. Thus, LINC01351 was chosen for the following studies about the potential roles on breast cancer, especially TNBC.

LINC01351 was an oncogenic indicator in breast cancer
To date, a large number of lncRNAs have been discovered through functional genomic studies, however, the potential role of LINC01351 in breast cancer has not been well investigated yet. The function and regulation of LINC01351 were explored in the study.
The expression of LINC01351 was tested by using the data from GEO datasets (GSE60689 and GSE112848). All the data was processed to be two groups including BC groups (BC vs. normal breast) and TNBC groups (TNBC vs. normal breast). As shown in Fig. 3A and breast cancer tissues comparing to normal breast tissues ( Fig. 3A-c, all p < 0.001).
In breast cancer, hormone-receptor negative (ER and PR negative) and Her2 positive are considered to be poor prognostic markers. So, the correlation between the expression of LINC01351 and molecular pathological features of breast cancer were further detected via TCGA and TANRIC database. As shown in Fig. 3B, increased expression of LINC01351 was discovered in ER-negative, PR-negative and Her2-positive breast cancer tissues (ERnegative vs. ER-positive, p = 3.21e-4; PR-negative vs. PRpositive, p = 9.30e-8; Her2-positive vs. Her2-negative, p = 3.83e-8), which suggested that LINC01351 contributed to the malignant phenotypes of breast cancer. In the GDC TCGA BRCA data, 18 cases of luminal A, 24 cases of luminal B, 10 cases of Her2-enriched and 15 cases of TNBC samples were induced into the analysis. Similarly, the expression of LINC01351 was significantly higher in Her2-enriched or triple-negative breast cancer tissues than in luminal subtypes of breast cancer tissues (TNBC vs. luminal A, p = 0.012; TNBC vs. luminal B, p = 0.045; Fig. 3C). There was no significant difference in the expression of LINC01351 between Her2-enriched A High-expressed LINC01351 was found in BC tissues or TNBC tissues compared with normal breast tissues. a LINC01351 was higher expressed in BC tissues than normal breast tissues by using data from GSE112848 datasets (p = 0.007). b LINC01351 was higher expressed in TNBC tissues than normal breast tissues by using data from GSE60689 datasets (p = 0.034). c LINC01351 was found to be higher expressed in BC tissues or TNBC tissues than normal breast tissues by using the GDC TCGA BRCA data (p < 0.001). B Increased expression of LINC01351 was associated with aggressive subtypes of breast cancer. Higher expression of LINC01351 was discovered in ER-negative, PR-negative or Her2-positive breast cancer tissues than ER-positive, PR-positive or Her2-negative breast cancer tissues (p < 0.001). C The expression of LINC01351 was higher in triple-negative breast cancer tissues than in luminal-subtype breast cancer tissues (p < 0.05). D For the TNM stages of breast cancer, the expression of LINC01351 was progressively increasing from stage I/II to stage III/IV (p = 0.007). E Higher expression of LINC01351 was detected in metastatic breast cancer tissues than in primary breast cancer tissues by using the GDC TCGA BRCA data (p = 0.003) and triple-negative breast cancer (TNBC vs. Her2-positive, p = 0.073).
Moreover, the expression of LINC01351 was progressively increasing with the TNM stages of breast cancer (p = 0.007, Fig. 3D). Generally, TNBC is an aggressive subtype of breast cancer with early metastasis. Then the expression of LINC01351 was further detected in both metastatic tumor samples and primary tumor samples by exploiting the GDC TCGA BRCA data. The higher expressed LINC01351 was found in the metastatic breast cancer tissues than in the primary breast cancer tissues (Fig. 3E, p = 0.003).
All these results indicated that highly expressed LINC01351 contributed to the aggressive subtypes, advanced TNM stages, and metastasis of breast cancer. It was the first time to report that LINC01351 played as an oncogenic and poor molecular pathological indicator of breast cancer.

LINC01351 was a poor prognostic marker in triple-negative breast cancer
To understand the potential role of LINC01351 on the survival of patients with breast cancer, the survival data was further analyzed via the TCGA and TANRIC database. The primary breast cancer tissues were divided equally into two groups based on the expression of LINC01351 (low-expression and high-expression, n = 372, separately). In the study, all the duration of following-up was 10 years. As demonstrated in Fig. 4A, the result of survival analysis showed that LINC01351 played no effect on the overall survival (OS) of breast cancer (p = 0.3265). For the relapse-free survival (RFS), 588 cases of breast cancer with survival data were introduced into the analysis. Low-expressed group (n = 294) and high-expressed group (n = 294) were also divided by the expression of LINC01351. LINC01351 showed no effect on the RFS of breast cancer (p = 0.06, Fig. 4B).
Then for the analysis about TNBC, the higher expression of LINC01351 was found to be significantly related with poorer prognosis of patients. 83 cases of TNBC were divided into low-expressed group (n = 42) and highexpressed group (n = 41). Higher expressed LINC01351 was associated with poor OS of patients with TNBC (p = 0.047, Fig. 4C). Similarly, LINC01351 was significantly related to poor RFS of patients with TNBC from the analysis about 30 cases of low-expressed TNBC and 29 cases of high-expressed TNBC (p = 0.029, Fig. 4D).
All the above data indicated that highly expressed LINC01351 was associated with poor prognosis of patients with TNBC. It was suggested that the role of LINC01351 on survival of patients was specific for TNBC Fig. 4 LINC01351 was a poor prognostic indicator in triple-negative breast cancer. A LINC01351 played no effect on the overall survival of breast cancer after 10-year-following up (p = 0.3265). B LINC01351 showed no effect on the relapse-free survival of breast cancer after 10-year-following up (p = 0.06). C Higher expressed LINC01351 was associated with poor overall survival of patients with TNBC after 10-year-following up (p = 0.047). D LINC01351 was significantly related to unfavorable relapse-free survival of patients with TNBC after 10-year-following up (p = 0.029) but not for whole types of breast cancer. LINC01351 was an unfavorable prognostic marker of TNBC.

Integrative analysis of lncRNA, mRNA and microRNA signature
Long non-coding RNA presents interaction domains for DNA, mRNA, microRNA (miRNA, miR), and protein, depending on both sequence and secondary structure [8]. Emerging evidence indicates that lncRNA could be ceRNA for miRNAs in cancer [14]. Thus, we explored the potential regulation of LINC01351 to mRNAs and miRNAs. The network of regulation was inquired and further visualized by Cytoscape software. Among them, 6 miRNAs (miR-504, miR-548n, miR-567, miR-618, miR-718, and miR-1280) and 4 mRNAs (ELK4, ALPI, SCEMOL and RAD23B) were identified to be targets of LINC01351. Based on the analysis from TANRIC database, LINC01351 downregulated ELK4, SCEMOL and RAD23B and upregulated ALPI (all p < 0.001). For the miRNAs, miR-504 was downregulated by LINC01351, while miR-548n, miR-567, miR-618, miR-718, and miR-1280 were upregulated by LINC01351 (all p < 0.01). The positive and negative regulation between LINC01351 and target mRNAs or miRNAs were all described in Fig. 5. It is well known that miRNAs are small non-coding RNAs that usually inhibit gene expression through partially complementary elements in the 3′ UTR of their target mRNAs [15]. Furthermore, the downstream target genes of six miRNAs (miR-504, miR-548n, miR-567, miR-618, miR-718, and miR-1280) were obtained from Tar-getScan and miRTarBase database. All the target mRNAs were predicted to be downregulated by their miRNAs. Then the overlapped target mRNAs of both TargetScan and miRTarBase databases were also shown by Cytoscape software in Fig. 5. The above result demonstrated that LINC01351 was involved in the complex regulation network of mRNAs and miRNAs.

ELK4 was down-regulated by LINC01351 and associated with transcriptional regulation in cancer
To further understand the effect of LINC01351 during carcinogenesis, functional enrichment and GO annotations of both overlapped genes and target genes of LINC01351 were analyzed by Metascape database. It was found that gene signature was enriched in the transcriptional misregulation in cancer (p < 0.001, Fig. 6A). In the enriched functional group ETS transcription factor ELK4, also known as SAP1, was chosen for the following analysis, because it was identified as the direct target negatively regulated by LINC01351 (p = 0.00034). ETS-like transcription factor 4 (ELK4) is a member of the ETS family of transcription factors and of the ternary complex factor (TCF) subfamily [16]. The androgen respondent ELK4 was reported to be correlated with the oncogene MYC [17]. However, the exact role of ELK4 in TNBC was not well understood yet.
To clarify the effect of ELK4 and the inhibition role regulated by LINC01351, the expression of ELK4 was tested via immunohistochemistry (IHC) in breast cancer tissues compared with normal breast tissues. As shown in Fig. 6B, the results of IHC showed that ELK4 was highexpressed in normal breast tissue and weak-expressed in breast cancer tissue, which was exactly opposite to the expression of LINC01351 in breast cancer compared with normal breast tissue. Moreover, the correlation between the expression of ELK4 and LINC01351 was investigated by analyzing the TCGA and TANRIC database. As shown in Fig. 6C, the expression of ELK4 showed negative correlation with the expression of LINC01351 (p = 0.019). All the results demonstrated that LINC01351 down-regulated the expression of ELK4 in breast cancer.

LINC01351/ELK4 axis predicted prognosis and sensitivity of therapy in triple-negative breast cancer
In the study, to explore the regulation of LINC01351/ ELK4 axis on TNBC, the survival analysis was performed by using Kaplan-Meier Plotter database. The high expression of ELK4 was associated with the favorable prognosis of patients with TNBC (p = 0.0035, Fig. 6D). TNBC is defined by lack of expression of ER, PR and Her2 and is still a big challenge for anticancer therapy. To identify the best treatment strategy, 6-subtype-classification of TNBC was reported by Lehmann in 2011 [18]. For each subtype, available therapy was predicted and recommended in the study.  < 0.05, Fig. 6D). The survival data was exactly opposite to the survival data of LINC01351 in TNBC, which suggested that LINC01351 negatively regulated the expression of ELK4 and affected its roles on the overall survival of patients with BL1 and LAR subtypes of TNBC. Besides, LINC01351/ELK4 axis could be a potential prognostic marker of TNBC. Based on the subtypes of Lehmann classification, BL1 and BL2 subtypes preferentially respond to the therapy regulating the cell cycle and DNA damage. The patient with BL1 subtype of TNBC could receive more benefits from chemotherapy. IM subtype is associated with immunotherapy. M and MSL subtypes are related to epithelialmesenchymal transition, growth factor pathways and PI3K/mTOR pathway. The LAR subtype is characterized by androgen receptor (AR) signaling and sensitive to AR antagonist [18]. The above results showed that the higher expression of ELK4 was significantly related to better overall survival in BL1 and LAR subtypes of TNBC. The BL1 or LAR subtype of TNBC with low-expressed LINC01351 and high-expressed ELK4 was considered to be more sensitive to the treatment of chemotherapy (e.g., cisplatin) or AR antagonist drugs (e.g., bicalutamide), separately. It could be an easy approach to test the expression of LINC01351/ELK4 axis to predict the prognosis and sensitivity of anticancer therapy.

Discussion
Long non-coding RNAs (lncRNA) are non-coding transcripts > 200 nucleotides long that have been found to act as master regulators for gene expression and play a critical role in various biological functions and diseases including human cancers. Several studies have identified that lncRNA with predictive properties of cancer progression or diagnostic power in breast cancer [19]. In eukaryotic cells, long intergenic non-coding RNAs (lin-cRNAs) are the largest class of lncRNAs. The transcripts of lincRNAs do not overlap protein-coding sequences [20]. Increasing evidence suggests that lincRNAs play effects on the maintenance of cell integrity, regulation of gene function, and involvement in development and disease. Marked upregulation of lincRNAs such as Linc-ROR was observed in breast cancer [21,22]. However, the function and regulation of long intergenic non-protein coding RNA 1351 (LINC01351) in breast cancer has not been reported yet.
Here, we found that LINC01351 was high-expressed in triple-negative breast cancer tissues compared with normal breast tissues or luminal-subtype breast cancer tissues. The overexpressed LINC01351 was positively correlated with aggressive subtypes, advanced TNM stages, and metastasis of breast cancer. Additionally, the upregulated expression of LINC01351 predicted unfavorable overall survival and relapse-free survival of patients with TNBC. It was the first time that LINC01351 was discovered to be a poor prognostic indicator and an oncogenic marker in triple-negative breast cancer.
In our study, LINC01351 was found to down-regulate the expression of ELK4. ELK4 is a member of the ternary complex factor subfamily of E twenty-six domain transcription genes [16]. ELK4 was reported to be correlated with the oncogene MYC in breast cancer [17]. In colorectal cancer, ELK4 cooperated with BRD2 to regulate the transcription of LAMB [23]. ELK4 was involved in the regulation of the chimeric fusion RNA, SLC45A3-ELK4, and played effects on cell proliferation in androgen-dependent prostate cancer cells [24]. However, the exact role of ELK4 in TNBC was not well understood. In our studies, ELK4 and LINC01351 showed the inverse impact on overall survival of patient with TNBC. Highexpression ELK4 was associated with favorable prognosis of patients. Furthermore, higher expression of ELK4 was significantly related to better overall survival in BL1 and LAR subtypes of TNBC after 10-year-following up. All the above finding showed that LINC01351 negatively modulated ELK4 in triple-negative breast cancer. The upregulation of LINC01351 was a poor prognostic marker for patient with triple-negative breast cancer. In addition, the BL1 or LAR subtype of TNBC with lowexpressed LINC01351 and high-expressed ELK4 could be more sensitive to the treatment of chemotherapy or AR antagonist drugs. The potential relationship is worth to explore in future studies.

Conclusion
In our study, it revealed that highly expressed LINC01351 was significantly related to aggressive subtypes, advanced stages and metastasis of breast cancer. LINC01351 was an oncogenic indicator and a poor prognostic marker in TNBC by negatively regulating ELK4. The high expression of ELK4 showed favorable prognosis of patients with BL1 and LAR subtype of TNBC. A novel regulatory axis was revealed, which may offer new insight into the prognostic evaluation and treatment strategies for patients with triple-negative breast cancer.