Long non-coding RNA LINC01296 is a potential prognostic biomarker in patients with colorectal cancer
- Cite this article as:
- Qiu, J. & Yan, J. Tumor Biol. (2015) 36: 7175. doi:10.1007/s13277-015-3448-5
Colorectal cancer (CRC), one of the most malignant cancers, is currently the fourth leading cause of cancer deaths worldwide. Recent studies indicated that long non-coding RNAs (lncRNAs) could be robust molecular prognostic biomarkers that can refine the conventional tumor-node-metastasis staging system to predict the outcomes of CRC patients. In this study, the lncRNA expression profiles were analyzed in five datasets (GSE24549, GSE24550, GSE35834, GSE50421, and GSE31737) by probe set reannotation and an lncRNA classification pipeline. Twenty-five lncRNAs were differentially expressed between CRC tissue and tumor-adjacent normal tissue samples. Among these 25 lncRNAs, patients with higher expression of LINC01296, LINC00152, and FIRRE showed significantly better overall survival than those with lower expression (P < 0.05), suggesting that these lncRNAs might be associated with prognosis. Multivariate analysis indicated that LINC01296 overexpression was an independent predictor of patients’ prognosis in the test datasets (GSE24549, GSE24550) (P = 0.001) and in an independent validation series (GSE39582) (P = 0.027). Our results suggest that LINC01296 could be a novel prognostic biomarker in patients with CRC.
Keywords: Colorectal cancer, lncRNA, Biomarker, LINC01296
Colorectal cancer (CRC) is currently one of the most common cancers and the fourth leading cause of cancer deaths worldwide. According to an estimate, there are 1.2 million new CRC cases and >600,000 deaths every year, which accounts for ∼8 % of all cancer deaths. The incidence and death rates of CRC have been rapidly increasing over the last few years in Asian countries. In China, the rates are much higher than the worldwide average.
Currently, clinicopathologic tumor staging, which is based on the tumor-node-metastasis (TNM) system, is the commonly used prognostic marker of CRC clinical outcomes. However, the TNM staging system is not a reliable predictor of CRC outcome. Histologically identical CRC patients may have totally different disease progression and clinical outcomes owing to their different genetic and epigenetic backgrounds. For example, although most TNM stage II patients with no lymph node metastasis have a better prognosis, one fourth of these patients may still have a high risk for relapse after surgical resection (classified as high-risk stage II patients) [5, 6].
DNA microsatellite instability (MSI) is a phenomenon displayed in most cancers of the colon and rectum; it refers to a clonal change in the number of repeated DNA nucleotide units in microsatellites caused by deletions or insertions, and it occurs in tumors with deficient mismatch repair. MSI has been systematically analyzed for prognostic potential in CRC. The MSI-high (MSI-H) phenotype, which is present in 15 % of CRC, confers a good prognosis and a less aggressive clinical course than the MSI-low (MSI-L) or microsatellite stable (MSS) phenotype [7, 8]. MSI is the hallmark of Lynch syndrome, which is an autosomal dominant hereditary syndrome caused by germline mutations in the MLH1, MSH2, MSH6, and PMS2 genes, although it is not solely restricted to hereditary CRC. Therefore, MSI is a marker for better clinical outcomes but appears to be more pronounced for Lynch syndrome [6, 8].
The vast majority of the human genome (98 %) does not code for proteins and gives rise to non-protein-coding RNAs (ncRNAs). Long non-coding RNAs (lncRNAs) are RNA polymerase II transcripts of >200 nucleotides that lack an open reading frame. lncRNAs make up the biggest class of ncRNAs, with ∼58,000 human lncRNA genes annotated thus far. Unlike those of the smaller non-coding microRNAs, the functions of the majority of lncRNAs are not fully clear. However, with the improvement of technology and research in transcriptome profiles, increasing evidence shows that some lncRNAs, which can regulate gene expression at transcriptional, post-transcriptional, and epigenetic levels by interacting with DNA, RNA, and protein [10, 12, 13], play important roles in serial steps of cancer development. These lncRNAs are involved in both oncogenic and tumor-suppressive pathways [14, 15]. Epigenetic studies have shown that lncRNAs can predict cancer outcomes and further identify those patients who require more aggressive treatments. The aberrant expression patterns of lncRNAs can also be used to diagnose cancer or reflect disease prognosis and serve as predictors of patient outcomes. For instance, HOTAIR, an lncRNA located in HOX loci, is highly expressed in human cervical cancer and primary breast tumors, and its high expression level in tumors is a powerful biomarker of poor prognosis and metastases [17, 18].
To identify possible biomarkers for predicting CRC outcomes, we analyzed a cohort of published datasets from the Gene Expression Omnibus (GEO) and investigated the correlation between the expression of specific lncRNAs and clinical prognostic variables.
Materials and methods
CRC gene expression data from GEO
CRC expression data were obtained from GEO. The datasets were selected using the following criteria: (a) patients had CRC, (b) CRC tissue and tumor-adjacent normal tissue samples were available for comparison, (c) data were obtained from the same platform, and (d) more than three samples existed.
Five datasets included in this study
Individual data processing
The raw CEL files of the five datasets were quantile-normalized and background-adjusted using robust multichip average (RMA), which is an effective tool for computing lncRNA profiling data with AltAnalyze software [24, 25]. The normalized data were analyzed with Linear Models for Microarray Data, a modified t test that incorporates the Benjamini–Hochberg multiple-hypotheses correction technique, through R 3.1.1. The probe sets for which the adjusted P-value was below 0.01 and the expression level differed by a fold change of ≥ 2 between two comparison groups were defined as significantly different probe sets.
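The selection rule described above can be illustrated with a minimal Python sketch. It is not the authors' R workflow and does not reproduce the moderated t test; it only shows the Benjamini–Hochberg adjustment and the two thresholds. The function names and the input layout are hypothetical, and the sketch assumes the fold changes are on the log2 scale (as RMA-normalized values usually are), so that a fold change of ≥ 2 corresponds to an absolute log2 fold change of ≥ 1.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that each adjusted value is the
    # minimum of p * m / rank over the current rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_probe_sets(probe_ids, log2_fold_changes, p_values,
                           alpha=0.01, min_fold_change=2.0):
    """Keep probe sets with adjusted P < alpha and fold change >= threshold.

    Assumption: log2_fold_changes holds log2(tumor / normal) per probe set.
    """
    min_abs_log2fc = math.log2(min_fold_change)
    adjusted = benjamini_hochberg(p_values)
    return [
        pid
        for pid, lfc, adj in zip(probe_ids, log2_fold_changes, adjusted)
        if adj < alpha and abs(lfc) >= min_abs_log2fc
    ]
```

Both upregulated and downregulated probe sets pass the filter because the fold change is compared by absolute value.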
Identification of differentially expressed probe sets
Probe set reannotation and lncRNA classification pipeline
To reannotate the probe sets, the sequences of probes were downloaded from the official website of Affymetrix (http://www.affymetrix.com/Auth/analysis/downloads/na25/wtexon/HuEx-1_0-st-v2.probe.fa.zip), and sequence alignment was performed between all of the probes obtained from the four meta-analysis scenarios (as presented in Fig. 1) and the Refseq database by NCBI Blast-2.2.30+. Because each probe set comprises four probes with 25 nucleotides in Affymetrix Human Exon 1.0 ST, probe sets were filtered and reannotated by the following criteria: (a) the probe should perfectly hit its target gene (E-value = 2e-6, Query cover = 100 %, Ident = 100 %, antisense), (b) the probe should not hit more than one target perfectly, (c) four probes in one probe set should hit the same gene, and (d) the accession number of the hit gene should be “NR_” (NR indicates non-coding RNA in the Refseq database). According to the above process, the differentially expressed ncRNA probe sets were identified. Then, to obtain the lncRNA probe sets, only the probe sets whose genes were defined as lncRNAs by NCBI were retained.
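As an illustration only, filter criteria (a) to (d) might be applied to parsed alignment results roughly as follows. The hit dictionaries, their keys, and the function names are hypothetical and do not correspond to any BLAST output format or to the authors' scripts. The sketch also assumes that a "target" means a gene, so several transcript accessions of the same gene count as one target.

```python
def perfect_hits(hits):
    """Criterion (a): keep antisense hits with 100 % identity and query cover."""
    return [
        h for h in hits
        if h["identity"] == 100.0
        and h["query_cover"] == 100.0
        and h["strand"] == "antisense"
    ]


def reannotate_probe_set(probe_hits, probes_per_set=4):
    """Return the ncRNA gene a probe set maps to, or None if it is filtered out.

    probe_hits maps each probe ID of one probe set to its list of hits.
    """
    if len(probe_hits) != probes_per_set:
        return None
    genes = set()
    for hits in probe_hits.values():
        perfect = perfect_hits(hits)
        # Criteria (a) and (b): exactly one gene is hit perfectly by the probe.
        hit_genes = {h["gene"] for h in perfect}
        if len(hit_genes) != 1:
            return None
        # Criterion (d): RefSeq non-coding RNA accessions start with "NR_".
        if not all(h["accession"].startswith("NR_") for h in perfect):
            return None
        genes.update(hit_genes)
    # Criterion (c): all probes of the probe set must hit the same gene.
    if len(genes) != 1:
        return None
    return next(iter(genes))
```

Under these assumptions, a probe that matches both a non-coding gene and an mRNA perfectly removes the whole probe set, which is the strict behavior the criteria describe.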
To inspect the “leave one dataset out” cross-validation result visually, hierarchical clustering analysis (HCA) was performed on the differentially expressed lncRNAs obtained from each scenario and an independent dataset (GSE31737) with Cluster&TreeView. Principal component analysis (PCA) was conducted with the Bioconductor package pcaMethods. The samples were grouped by HCA according to similarities in their gene expression profiles, whereas the PCA summarized the most important variables in a dataset as principal components and classified the samples using as few variables as possible. In HCA cluster analysis, the Euclidean distance method was used to cluster arrays.
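To make the clustering step concrete, the following is a small Python sketch of agglomerative clustering of samples with Euclidean distance. It is not the Cluster&TreeView implementation. The linkage rule is not stated in the text, so average linkage is assumed here, and the function names and the sample dictionary are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clusters(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    profiles maps a sample name to its list of lncRNA expression values.
    Assumption: average linkage (mean pairwise distance between clusters).
    """
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Mean distance over all sample pairs across the two clusters.
                dists = [euclidean(profiles[a], profiles[b])
                         for a in clusters[i] for b in clusters[j]]
                d = sum(dists) / len(dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```

If the differentially expressed lncRNAs separate the two tissue types well, cutting the tree at two clusters should place tumor and normal samples in different groups.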
The test series (GSE24549, GSE24550) and an independent validation series (GSE39582) were used to identify the correlation between expression levels of specific lncRNAs and CRC prognosis. Dataset GSE39582, which is based on the Affymetrix HG-U133 plus 2 platform, contained a total of 566 samples with clinical data, and 541 samples remained after filtering out those with incomplete clinical data. To eliminate individual differences, the expression value of each lncRNA was normalized by dividing the average expression value of all of its probe sets by that of GAPDH. The probe sets hitting GAPDH were obtained from NetAffx and filtered by NCBI Blast-2.2.30+. Receiver operating characteristic curves were used to determine the cutoff value separating two groups by the expression level of a specific lncRNA. The method of Kaplan and Meier was used to construct survival curves for CRC patients grouped by lncRNA expression status, and the survival curves were compared by log-rank test. To evaluate the association between lncRNA expression and survival, Cox proportional hazards analysis was performed to calculate the hazard ratio and the 95 % confidence interval (CI). In addition, a multivariate Cox regression was performed to identify independent prognostic factors of significance [30, 31]. A two-tailed P-value of 0.05 or less was considered statistically significant.
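Two of the steps above, the GAPDH normalization and the Kaplan–Meier estimate, can be sketched in a few lines of Python. This is an illustrative sketch under simple assumptions, not the authors' code: the function names are hypothetical, follow-up times are plain numbers, and events are coded 1 for death and 0 for censoring. The ROC cutoff, the log-rank test, and the Cox regression are not shown.

```python
def normalize_to_gapdh(lncrna_probe_values, gapdh_probe_values):
    """Mean expression of the lncRNA probe sets divided by the GAPDH mean."""
    lnc_mean = sum(lncrna_probe_values) / len(lncrna_probe_values)
    gapdh_mean = sum(gapdh_probe_values) / len(gapdh_probe_values)
    return lnc_mean / gapdh_mean


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with a death.

    times are follow-up durations; events are 1 for death, 0 for censoring.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        removed = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= removed
    return curve
```

In practice, one curve would be computed for the high-expression group and one for the low-expression group, and the two curves would then be compared.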
Distinctive lncRNA expression pattern between tumor tissue and tumor-adjacent normal tissue samples
Number of probe sets differentially expressed in each dataset (table columns: upregulated probe sets, downregulated probe sets, differential probe sets)
Number of distinctive lncRNAs in four scenarios (table column: probe sets of differentially expressed lncRNAs)
Summary of differentially expressed lncRNAs
Bladder cancer-associated transcript 1
Cancer susceptibility candidate 19
Cancer susceptibility candidate 21
CDKN2B antisense RNA 1
Colon cancer-associated transcript 1
Colorectal neoplasia differentially expressed
Deleted in lymphocytic leukemia 1
ELFN1 antisense RNA 1
FAM83H antisense RNA 1 (head to head)
FIRRE intergenic repeating RNA element
FOXP1 antisense RNA 1
FOXP4 antisense RNA 1
HOXD antisense growth-associated long non-coding RNA
KBTBD11 overlapping transcript 1
Long intergenic non-protein coding RNA 1234
Long intergenic non-protein coding RNA 1296
Long intergenic non-protein coding RNA 152
Long intergenic non-protein coding RNA 858
MIR4435-1 host gene
Urothelial cancer-associated 1
ZNFX1 antisense RNA 1
Identification of prognostic lncRNAs from 25 lncRNAs through the test series
Characteristics of the two independent colorectal cancer sample series
Test seriesa (n = 160)
Validation seriesb (n = 541)
Age at diagnosis
67.0 ± 13.2
Mean follow-up, years (minimum; maximum)
4.58 (0.17; 10)
4.19 (0; 16.75)
Univariate and multivariate analysis of overall survival in colorectal cancer patients (n = 160) (table rows: LINC01296 expression (low/high); FIRRE expression (low/high); LINC00152 expression (low/high); TNM stage (II/III); MSI status (MSI-H/MSI-L MSS))
Analysis of LINC01296 signature for survival prediction in an independent validation series
Univariate and multivariate Cox regression analysis in the validation series (n = 541) (table rows: age at diagnosis; TNM stage (I/II/III/IV); tumor location (distal/proximal); adjuvant chemotherapy (yes/no))
CRC, one of the most malignant cancers, is currently the fourth leading cause of cancer deaths worldwide. Treatment choices of CRC are currently influenced by the TNM staging system of the Union for International Cancer Control. Current TNM criteria, however, cause substantial under- and over-treatment of CRC patients. For instance, adjuvant chemotherapy for node-negative (stage II) colon cancer has been controversial because over-treatment will increase the pain experienced by patients, whereas high-risk stage II patients might benefit from adjuvant therapy. Consequently, there is growing need for new and efficient biomarkers to ensure optimal treatment allocation. Tests for MSI status, a kind of biomarker, might contribute to the risk–benefit assessment of treatment in stage II disease. However, MSI status, which appears to be more pronounced for Lynch syndrome, has some limitations.
Recent epigenetic studies revealed that ncRNAs can distinguish advanced adenoma from normal control tissue, whereas their expression levels do not correlate with TNM stages. Several lncRNAs such as HOTAIR, MALAT1, GAS5, and HULC can predict cancer prognosis when taking the epigenetic background of patients into consideration [17, 18, 34, 35, 36]. Therefore, it is a reasonable hypothesis that lncRNAs can be used as biomarkers to predict cancer prognosis, especially for some high-mortality cancers such as CRC.
A search of PubMed suggested that there have been only eight published studies on predicting outcomes of CRC with lncRNAs [34, 37, 38, 39, 40, 41, 42, 43]. In most of these studies, differentially expressed lncRNAs were validated from reported cases, which involved other cancers. However, this method was inefficient for selecting target lncRNAs because of the need for a large number of validation experiments, and it was difficult to identify new and specific lncRNAs for CRC prognosis [34, 39, 40, 41, 43]. In another study reported by Debing Shi et al., microarrays were used to investigate diagnostic markers, but only six pairs of samples (tumors and controls) were analyzed, which reduced the accuracy of the microarray results. Ye Hu et al. recently investigated a diagnostic marker of CRC using three datasets from GEO, which were based on the Affymetrix HG-U133 plus 2 platform. Most of the drawbacks in previous reports can be overcome; however, the Affymetrix HG-U133 plus 2 is a 3′ IVT (in vitro transcription) expression array, which covers only 47,000 transcripts, so a great number of lncRNAs might be missed using this platform. In our current study, we used five datasets based on Affymetrix Human Exon 1.0 ST, which contains 1.4 million probe sets. In total, 355 samples were chosen to identify differentially expressed lncRNAs and their diagnostic potential for CRC.
Nevertheless, some unexpected situations might have occurred during our analysis. Because annotation of the human genome is refreshed frequently, some probe sets may not hit the identical targets that they were originally designed for. For example, according to the NetAffx Annotation Files, probe set 2322902 cannot hit any gene in the Affymetrix Human Exon 1.0 ST array series. However, it can perfectly hit PADI6 (E-value = 2e-6, Query cover = 100 %, Ident = 100 %, antisense) through blasting with the Refseq database. Therefore, probe set reannotation and the lncRNA classification pipeline is a suitable method to extract expression data for lncRNAs.
According to the above analysis flow, 25 differentially expressed lncRNAs were ultimately identified in CRC tissues via probe set reannotation. Among them, CRNDE, CCAT1, and UCA1, which have been reported to be associated with CRC, were validated in our study [44, 45, 46, 47]. Nevertheless, some known CRC-related lncRNAs were not identified in this study, which may have been caused by the strict filter criteria of probe set reannotation. For example, the hypomethylation of lncRNA H19 may result in the loss of IGF2 imprinting in CRC patients. In fact, seven differentially expressed probe sets of H19 were found in CRC, but all of them were filtered out from our final result because these probe sets also hit the mRNA of spidroin-1-like.
In addition, most of the differentially expressed lncRNAs that we identified were reported to be associated with other cancers (Supplementary Table 3), although the relationship between these lncRNAs and CRC is not clear. For example, LINC00152, an lncRNA located in chromosome 2q11.2, is hypermethylated and downregulated in human hepatocellular carcinoma, whereas CDKN2B-AS1 (ANRIL), an lncRNA located in the chromosome 9p21.3, promotes tumor growth by epigenetic silencing of miR-99a/miR-449a and indicates a poor prognosis of gastric cancer. These results suggest that these lncRNAs may play important roles in CRC through similar molecular mechanisms.
Of the 25 differentially expressed lncRNAs that we identified, LINC01296 was shown to be significantly associated with the overall survival of patients with CRC. Kaplan–Meier analysis of overall survival showed that high expression of LINC01296 in tumor tissues could predict a good prognosis in two datasets. The Cox proportional hazards model was adjusted for known prognostic variables such as TNM stage, MSI status, and other clinical characteristics, and the results indicated that LINC01296 was an independent prognostic marker for CRC. These results suggest that LINC01296 is a possible prognostic factor of patients with CRC. It may also be a potential diagnostic marker in patients with CRC. The molecular mechanism of LINC01296 involvement in CRC should be investigated further.
In summary, our results show that the reannotation method is useful for data mining in lncRNA research. A total of 25 differentially expressed lncRNAs were found in CRC samples. Some of them were previously reported to be involved in CRC, and the rest have been implicated in other cancers. Importantly, LINC01296 was identified to be a possible independent diagnostic marker in patients with CRC.
This work was supported by grants from the National Basic Research Project of China (2010CB529502 and 2007CB511904), National Natural Science Foundation of China (81471485), and the Key Program for the Fundamental Research of the Science and Technology Commission of Shanghai (11JC1411000).
Conflicts of interest