Background

Colon adenocarcinoma (COAD) is one of the most common gastrointestinal (GI) cancers and is the second leading cause of cancer-related death, globally [1, 2]. It is demonstrated that COAD occurs in approximately 5% of overall population at any given time in the world [3]. Despite the current screenings and therapies such as endoscopic resection and radical surgery, nearly half of the patients are diagnosed as advanced cases of COAD, experiencing tumor recurrence and relapse. COAD tumorigenesis has complicated multi-step processes including colon epithelial cell proliferation, aberration in differentiation, apoptosis resistance, survival, and invasion mechanisms [4]. Molecular aberrations of tumor suppressors and/or oncogenes are also one of the main contributors in different types of tumors especially COAD tumorigenesis [5]. However, due to complicacy of the underlying molecular pathways, the exact pathogenic contributors of COAD have not yet been clarified. Hence, there is an urgent need to indicate promising diagnostic and prognostic biomarkers for COAD. Recent investigations have highlighted the role of non-coding RNAs in the tumorigenesis of various malignancies. Among different kinds of non-coding RNAs, long non-coding RNA (lncRNA) is a putative class of non-coding RNA with more than 200 nucleotides in length, without any open-reading-frame (ORF) to encode proteins [5, 6]. Interestingly, a large body of evidence indicates that lncRNAs plays critical roles in a variety of biological processes including cell proliferation, cellular development, differentiation, carcinogenesis, and metastasis through modulating gene expression at the transcriptional and posttranscriptional levels directly or by recruiting chromatin remodeling factors [6,7,8]. Aberrant expression of lncRNAs has been well-documented in different sorts of cancers [9]. Dysregulation of lncRNA HOTAIR, H19, MALAT1, SNHG7, GAS8-AS, and NEAT1 were extensively well-studied and have been demonstrated to contribute in tumorigenesis and poor prognosis [5, 9,10,11,12,13]. Numerous investigations have shown that the lncRNAs can exert their function by competing endogenous RNA (ceRNA) crosstalk. For instance, it has been shown that lncRNA SCARNA2 was overexpressed in COAD tissues and it remarkably correlated with chemoresistance. Mechanistically, SCARNA2 via targeting miR-342-3p, upregulates EGFR and BCL2 expression in COAD cells [14]. Furthermore, overexpression of lncRNA SNHG1 has been shown to promote epithelial-mesenchymal transition (EMT) by binding to miR-497/miR-195-5p in COAD cells [15]. Also, lncRNA BDNF-AS was downregulated in COAD patients and served as a tumor suppressor gene. Unsurprisingly, ectopic expression of BDNF-AS suppressed cell proliferation and migration via epigenetically downregulating GSK-3β expression through EZH2 [16]. Moreover, several investigations have considered lncRNAs as therapeutic opportunities in COAD. For instance, it has been demonstrated that overexpression of LINC00152 can promote Fascin actin-bundling protein 1 (FSCN1) expression via sponging miR-632 and miR-185-3p, which consequently leads to proliferation and metastasis in COAD [17]. A recent study has demonstrated that targeting lncRNA FLANC by 1,2-dioleoyl-sn-glycero-3-phosphatidylcholine nanoparticles loaded with a specific small interfering RNA, decreased metastasis without any significant toxicity. They proposed that FLANC may act as a novel therapeutic strategy in COAD [18].

Additionally, many researches have suggested the potency of lncRNAs as biomarkers in the blood and serum. They suggested microvesicles and exosomes as carriers, being protected and stabilized in circulation [19]. In the current study, we comprehensively investigate lncRNAs, miRNAs, and mRNAs expressions from a public database, “Cancer Genome Atlas (TCGA)” and we constructed a ceRNA network in COAD. Also, we proposed novel potential biomarkers for COAD.

Methods

Sample and data collection

Clinical data of COAD were retrieved from the TCGA database (https://portal.gdc.cancer.gov/repository). The inclusion criteria were: (1) the histopathological diagnosis was COAD; (2) having complete demographic data including age, vital status, race, ethnicity, pathological stage, TNM classification, and overall survival time. Totally, 459 COAD were enrolled in this study. Two hundred and thirty participants had age > 68 years and 229 patients had age ≤ 68 and 243 and 216 patients were male and female, respectively. Among 459 patients, only 4 patients were Hispanic or Latino and 271 were non-Hispanic or non-Latino. Two hundred and fourteen patients were white, 29 patients were Black or African American, 11 were Asian and 1 American Indian/Alaska native. Pathological stages of I, II, III, and IV were 76, 178, 129 and 65, respectively. The clinical characteristics are summarized in Table 1.

Table 1 Clinicopathological characteristics of COAD patients

RNA-Seq and miR-Seq data analysis

RNA-Seq and miR-Seq Level 3 data were collected from the TCGA database. The raw count of the reads of RNA-Seq and miR-Seq data was normalized by Voom and TMM normalization methods. All the analyses were conducted in R software. The “limma” package in R software was utilized to indicate the differentially expressed mRNAs (DEmRNAs), lncRNAs (DElncRNAs), and miRNAs (DEmiRNAs) between normal solid tissues and primary tumors. The concluded data were filtered based on the |log2 fold change (FC)| > 1 for DEmRNA, DElncRNA, and DEmiRNA. P-value < 0.05 and false discovery rate (FDR) < 0.05 were considered as significant thresholds.

Functional enrichment analysis and protein-protein interaction (PPI) network

For in silico functional enrichment analysis, gene ontology (GO) in three domains including biological processes, cellular components, and molecular functions, in addition to Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathways were conducted. The GO and KEGG outputs were visualized by R software (ggplot2 package). The PPI network was constructed based on the STRING online database by Cytoscape 3.7.2. Molecular Complex Detection (MCODE) was used to analyze and predict the interactions (score value > 0.4).

LncRNA-miRNA-mRNA ceRNA network construction

LncRNA-miRNA-mRNA ceRNA network was constructed by “GDCRNATools” (http://bioconductor.org/packages/devel/ bioc/html/GDCRNATools.html) package in R software based on starbase database [14]. The nodes and edges were virtualized by Cytoscape 3.7.2.

Statistical analysis

All the differentially expressed data were analyzed by using R software (3.5.2) through the “GDCRNATools” package. Kaplan-Meier survival analysis (log-rank test) was used to indicate the relation between over or downregulation of RNA, based on median expression with patient’s survival time. ROC curve analysis, univariate, and multivariate Cox regression analysis were conducted by SPSS v21. P-value < 0.05 was considered as a significant threshold.

Results

Differentially expressed genes

Our data demonstrated that 2995 mRNAs (1094 up-regulated and 1901 down-regulated) were differentially expressed in COAD. Moreover, 205 lncRNAs (128 up-regulated and 77 down-regulated) were identified that were deferentially expressed in patients. Three hundred and forty-five miRNAs containing 183 up-regulated and 162 down-regulated have been found with differential expression in the COAD samples. The data are presented in Figs. 1, 2 and Tables 2, 3.

Fig. 1
figure 1

Bar graph of the differentially expressed genes in the COAD samples. TEC: To be Experimentally Confirmed; TR: T cell receptor; IG: Immunoglobulin

Fig. 2
figure 2

Volcano plot of the differentially expressed genes and miRNAs. a Volcano plot of differentially expressed lncRNAs and mRNAs. Overexpressed genes are demonstrated in red and down-regulated genes are demonstrated in green. b Volcano plot of differentially expressed miRNA. Overexpressed and down-regulated genes are demonstrated in red and green, respectively

Table 2 Top 20 upregulated mRNAs, lncRNAs, and miRNAs
Table 3 Top 20 downregulated mRNAs, lncRNAs, and miRNAs

GO enrichment and KEGG pathway analysis

GO enrichment analysis demonstrated that the differentially expressed mRNAs were enriched in different biological processes such as leukocyte migration, extracellular matrix organization, T cell activation, mitotic nuclear division, and adaptive immune response. Furthermore, GO analysis in cellular component revealed that the differentially expressed mRNAs predominantly contributed to collagen-containing extracellular matrix, basement membrane, microvillus, apical part of cell, and external side of plasma membrane. GO molecular function domain indicated that the genes were mainly enriched in glycosaminoglycan binding, heparin binding, sulfur compound binding, extracellular matrix structural constituent, and cytokine activity. GO outputs are presented in Fig. 3. In addition, KEGG pathway analysis indicated that the differentially expressed genes in the COAD patients remarkably participated in pathways involving in cancer, cell cycle, PPAR signaling pathway, PI3K-Akt signaling pathway, Wnt signaling pathway, and p53 signaling pathway (Fig. 4).

Fig. 3
figure 3

GO enrichment analysis of the differentially expressed mRNAs in COAD (Top 10 GO enrichment are presented)

Fig. 4
figure 4

KEGG signaling pathway analysis of the differentially expressed mRNAs in COAD (Top 20 KEGG terms are presented)

PPI network construction

The PPI network was constructed based on the STRING database to better understand the roles of the differentially expressed mRNAs. The data demonstrated that CDKN2A, CCND1, MYC, E2F, CDK4, BRCA2, CDC25B, and CDKN1A were the protein-protein interaction (PPI) critical hubs (Fig. 5).

Fig. 5
figure 5

Protein-protein interaction (PPI) network of the differentially mRNAs in COAD (score > 0.4) with Node:118, eadge:1745, MCADE score: 29.82

Kaplan-Meier survival analysis of differentially expressed genes

Kaplan-Meier survival analysis was used to indicate the association of differentially expressed mRNAs, lncRNAs, miRNA, and prognosis of COAD patients. The data showed that 215 mRNAs, 14 lncRNAs, and 39 miRNAs were associated with overall survival time in the patients. The top 10 hits of each group are presented in Table 4.

Table 4 Top 10 mRNAs, lncRNAs, and miRNAs that were associated with overall survival

Diagnostic analysis of differentially expressed lncRNAs

AUC analysis was conducted to demonstrate the diagnostic value of each lncRNAs in the COAD samples. All 205 differentially expressed lncRNAs indicated significant diagnostic values. The top 50 hits of the lncRNAs are summarized in Table 5.

Table 5 Top 50 lncRNAs that had significant diagnostic value

Novel lncRNA biomarkers

After merging the overall survival, and the diagnostic value data, we found that 14 lncRNAs had high ranks in prognostic and diagnostic areas which can be considered as COAD biomarkers. The data are presented in Table 6.

Table 6 the lncRNAs as diagnostic and prognostic biomarkers in COAD

Kaplan-Meier and ROC curve analysis were conducted for the top three lncRNAs (AC087388.1, SLC16A1-AS1, and ELFN1-AS1) from aforementioned analysis shown in Fig. 6. Moreover, univariate and multivariate analysis were conducted to demonstrate the power of the lncRNAs and to diminish the covariate effects. Univariate and multivariate analysis are summarized in Table 7.

Fig. 6
figure 6

Kaplan-Meier and ROC curve analysis of the AC087388.1, SLC16A1-AS1, and ELFN1-AS1). a Kaplan-Meier curve of AC087388.1. b Kaplan-Meier curve of SLC16A1-AS. c Kaplan-Meier curve of ELFN1-AS1. d ROC curve of the lncRNAs

Table 7 Univariate and multivariate survival analyses of AC087388.1, SLC16A1-AS1, and ELFN1-AS1

LncRNA-miRNA-mRNA ceRNA network construction

According to ceRNA hypothesis, which implicates that lncRNAs regulate mRNA expression level by competing the shared miRNAs in cells, a ceRNA network was built based on lncRNAs, mRNAs, and miRNAs expression in the samples based on starbase online tool in R software. The nodes and edges were drawn by Cytoscape 3.7.2. The ceRNA network data demonstrated important lncRNAs including MIR17HG, H19, SNHG1, KCNQ1OT1, MALAT1, GAS5, SNHG20, OR2A1-AS1, and MAGI2-AS3, which have implied in the development of COAD (Fig. 7).

Fig. 7
figure 7

LncRNA-miRNA-mRNA ceRNA network construction of the differentially expressed genes in COAD (Red: LncRNA, Yellow: miRNA, and Green: mRNA)

Discussion

LncRNAs regulate critical and canonical biological functions in different types of normal human cells and in a variety of tumor cells [20]. An escalating number of investigations have reported the function of lncRNAs in tumor proliferation, cell invasion and migration, chemotherapy resistance, and stemness capability in tumorigenesis and progression of COAD [21,22,23]. However, the exact underlying mechanisms of lncRNAs in progression of COAD are still unclear. So far, several different biological regulatory functions have been proposed for lncRNAs. Some previous studies have demonstrated that lncRNAs regulate mRNA expression via binding and sponging miRNA known as competing endogenous RNA theory, which generates a new aspect in the lncRNA regulatory mechanism [24, 25]. To the best of our knowledge, only a few investigations have displayed ceRNA networks between lncRNAs and miRNAs in COAD. Thus, a clear image of lncRNAs-miRNAs links still remains uncharacterized. In the current study, we studied the differentially expressed genes including lncRNAs, miRNAs, and mRNAs in the COAD patients based on TCGA database. Gene set enrichment by GO and KEGG signaling pathway identified the differentially expressed genes which were significantly enriched in cell proliferation, differentiation, protein phosphorylation, and signaling pathways. Furthermore, KEGG signaling pathway analysis demonstrated several canonical signaling pathways including Wnt, PI3K/Akt and PPAR signaling pathways that have been shown to contribute in tumor progression [26, 27]. A mounting of evidence has emphasized on Wnt/ β-catenin signaling pathway, promoting tumor growth, invasion and metastasis, and chemoresistance in COAD [28, 29]. For instance, it has been demonstrated that lncRNA H19 overexpression induces the EMT of colorectal cancer (CRC) cells by sponging miR-29b-3p to directly upregulate PGRN and activate Wnt axis [30]. Moreover, the up-regulation of lncRNA colorectal cancer-associated lncRNA (CCAL) promotes CRC progression through suppressing the activator protein 2α (AP-2α) to initiate Wnt/β-catenin signaling pathway [31]. In the present study, the KEGG analysis indicated that the peroxisome proliferator-activated receptor (PPAR) pathway contributes in Wnt signaling. It has been shown that the PPAR signaling pathway reduces cell proliferation and inhibits tumorigenesis in different types of cancers. Down-regulation of PPAR-α has been correlated with poor clinicopathological features of CRC that was remarkably higher in well to moderately differentiated adenocarcinoma than in mucinous adenocarcinoma [32]. In addition, lncRNA TINCR modulates PPAR signaling pathway through binding to miR-107 to up-regulate CD36 in CRC [33]. Recently, the PPAR aberration expression and its prime roles in gastrointestinal tract has been extensively reviewed [34].

It has been shown that PI3K/Akt signaling pathway had prominent roles in carcinogenesis of a variety of cancers particularly COAD. LncRNA AB073614 can take under control CRC growth and invasion by PI3K/Akt signaling pathway [35]. In addition, lncRNA SNHG7 elevated GALNT7 level and induced PI3K/Akt/mTOR pathway by sponging miR-34a in CRC cells [36]. Our ceRNA network data demonstrated important lncRNAs including MIR17HG, H19, SNHG1, KCNQ1OT1, MALAT1, GAS5, SNHG20, OR2A1-AS1, and MAGI2-AS3 which previously have been highlighted in the development of COAD. LncRNA MAGI2-AS3 have been discovered to play a crucial role as a tumor suppressor in breast cancer by targeting Fas/FasL in tumor cells [37]. Moreover, MAGI2-AS3 hampers hepatocellular carcinoma cell growth and its invasion through sponging miR-374b-5p to up-regulate SMG1 axis [38]. On the other hand, overexpression of MAGI2-AS3 has been explained to promote tumor progression by absorbing miR-141/200a and consequently, up-regulating ZEB1 which is an EMT promoting transcription factor, in gastric cancer cells [39]. MAGI2-AS3 up-regulation has also been shown to induce CRC proliferation and migration by modulating miR-3163 through upregulating TMEM106B [40].

LncRNA SNHG1 is a prominent lncRNA that is involved in a variety of cancers. SNHG1 expression is associated with unfavorable overall survival and tumor recurrence in patients with COAD. Moreover, SNHG1 promote cell growth and cell migration via upregulating EZH2 and miR-154a-5p in COAD [41]. LncRNA KCNQ1OT1 can promote EMT by decreasing miRNA-217 expression to upregulate ZEB1 axis in COAD [42]. Furthermore, KCNQ1OT1 has been demonstrated to promote chemoresistance of oxaliplatin by iR-34a/ATG4B pathway and it is associated with poor prognosis in COAD [43]. A previous study showed that lncRNA MALAT1 was remarkably upregulated in COAD cells. MALAT1 can promote metastasis of COAD via RUNX2 as a survival factor in tumor cells [44]. MALAT1 evokes EMT and angiogenesis via sponging miR-1265p to upregulate VEGFA, SLUG, and TWIST [45]. Several investigations demonstrated that lncRNA GAS5 can act as a tumor suppressor gene by different actions. It has been illustrated that GAS5 inhibited angiogenesis and metastasis via regulating Wnt signaling pathway in COAD cells [46]. Finally, lncRNA SNHG20 has been reported overexpressed prominently in CRC tissues in comparison to normal ones. Overexpression of SNHG20 was correlated with poor prognosis in the patients [47]. Although, there are several similar studies, the novelties of the current study include; an extensive exploration of lncRNA, mRNA and miRNA signatures, revealing the diagnostic and prognostic value of lncRNA, and constructing a COAD lncRNA-miRNA-mRNA ceRNA network.

Hence, our data elucidated that, the suggested lncRNAs can be considered as potential promising biomarkers, which could drive tumorigenesis through hijacking canonical signaling pathways in COAD.

Conclusions

Our data highlighted the importance of lncRNA regulatory networks that might provide a promising therapeutic approach for clinical application by considering lncRNA hubs as potential efficient biomarkers.