Tumor Biology, Volume 36, Issue 9, pp 7175–7183

Long non-coding RNA LINC01296 is a potential prognostic biomarker in patients with colorectal cancer

Research Article

DOI: 10.1007/s13277-015-3448-5

Cite this article as:
Qiu, J. & Yan, J. Tumor Biol. (2015) 36: 7175. doi:10.1007/s13277-015-3448-5

Abstract

Colorectal cancer (CRC), one of the most malignant cancers, is currently the fourth leading cause of cancer deaths worldwide. Recent studies indicated that long non-coding RNAs (lncRNAs) could be robust molecular prognostic biomarkers that can refine the conventional tumor-node-metastasis staging system to predict the outcomes of CRC patients. In this study, the lncRNA expression profiles were analyzed in five datasets (GSE24549, GSE24550, GSE35834, GSE50421, and GSE31737) by probe set reannotation and an lncRNA classification pipeline. Twenty-five lncRNAs were differentially expressed between CRC tissue and tumor-adjacent normal tissue samples. Among these 25 lncRNAs, patients with higher expression of LINC01296, LINC00152, and FIRRE showed significantly better overall survival than those with lower expression (P < 0.05), suggesting that these lncRNAs might be associated with prognosis. Multivariate analysis indicated that LINC01296 overexpression was an independent predictor of patients’ prognosis in the test datasets (GSE24549, GSE24550) (P = 0.001) and in an independent validation series (GSE39582) (P = 0.027). Our results suggest that LINC01296 could be a novel prognostic biomarker for CRC.

Keywords

Colorectal cancer; lncRNA; Biomarker; LINC01296

Introduction

Colorectal cancer (CRC) is currently one of the most common cancers and the fourth leading cause of cancer deaths worldwide [1]. According to an estimate, there are 1.2 million new CRC cases and >600,000 deaths every year, which accounts for ∼8 % of all cancer deaths [2]. The incidence and death rates of CRC have been rapidly increasing over the last few years in Asian countries [3]. In China, the rates are much higher than the worldwide average [4].

Currently, clinicopathologic tumor staging, which is based on the tumor-node-metastasis (TNM) system, is the commonly used prognostic marker of CRC clinical outcomes. However, the TNM staging system is not a reliable predictor of CRC outcome. Histologically identical CRC patients may have totally different disease progression and clinical outcomes owing to their different genetic and epigenetic backgrounds. For example, although most TNM stage II patients with no lymph node metastasis have a better prognosis, one fourth of these patients may still have a high risk for relapse after surgical resection (classified as high-risk stage II patients) [5, 6].

DNA microsatellite instability (MSI) is a phenomenon displayed in most cancers of the colon and rectum; it refers to a clonal change in the number of repeated DNA nucleotide units in microsatellites caused by deletions or insertions, and it occurs in tumors with deficient mismatch repair [6]. MSI has been systematically analyzed for prognostic potential in CRC. The MSI-high (MSI-H) phenotype, which is present in 15 % of CRC, confers a good prognosis and a less aggressive clinical course than the MSI-low (MSI-L) or microsatellite stable (MSS) phenotype [7, 8]. MSI is the hallmark of Lynch syndrome which is an autosomal dominant hereditary syndrome caused by germline mutations in the MLH1, MSH2, MSH6, and PMS2 genes, although it is not solely restricted to hereditary CRC. Therefore, MSI is a marker for better clinical outcomes but appears to be more pronounced for Lynch syndrome [6, 8].

The vast majority of the human genome (98 %) does not code for proteins and gives rise to non-protein-coding RNAs (ncRNAs) [9]. Long non-coding RNAs (lncRNAs) are RNA polymerase II transcripts of >200 nucleotides that lack an open reading frame [10]. lncRNAs make up the biggest class of ncRNAs, with ∼58,000 human lncRNA genes annotated thus far [11]. Unlike those of the smaller non-coding microRNAs, the functions of the majority of lncRNAs are not fully clear. However, with the improvement of technology and research in transcriptome profiles, increasing evidence shows that some lncRNAs, which can regulate gene expression at transcriptional, post-transcriptional, and epigenetic levels by interacting with DNA, RNA, and protein [10, 12, 13], play important roles in serial steps of cancer development [12]. These lncRNAs are involved in both oncogenic and tumor-suppressive pathways [14, 15]. Epigenetic studies have shown that lncRNAs can predict cancer outcomes and further identify those patients who require more aggressive treatments [16]. The aberrant expression patterns of lncRNAs can also be used to diagnose cancer or reflect disease prognosis and serve as predictors of patient outcomes. For instance, HOTAIR, an lncRNA located in the HOX loci, is highly expressed in human cervical cancer and primary breast tumors, and its high expression level in tumors is a powerful biomarker of poor prognosis and metastases [17, 18].

To identify possible biomarkers for predicting CRC outcomes, we analyzed a set of published datasets from the Gene Expression Omnibus (GEO) and investigated the correlation between the expression of specific lncRNAs and clinical prognostic variables.

Materials and methods

CRC gene expression data from GEO

CRC expression data were obtained from GEO. The datasets were selected using the following criteria: (a) patients had CRC, (b) CRC tissue and tumor-adjacent normal tissue samples were available for comparison, (c) data were obtained from the same platform, and (d) more than three samples existed.

According to these criteria, five datasets (GSE24549, GSE24550, GSE35834, GSE50421, and GSE31737) were chosen (Table 1). These data were obtained from the Affymetrix Human Exon 1.0 ST platform. In the next step, four datasets (GSE24549, GSE24550, GSE35834, and GSE50421) were used in the “leave one dataset out” process [23], and GSE31737 served as an independent dataset to validate the gene signatures derived from the meta-analysis. Among them, information on disease-free survival (DFS), MSI status, and TNM stage of samples was available only for GSE24549 and GSE24550. Therefore, these two datasets were used to investigate the correlation between the lncRNA expression profiles and CRC prognosis.
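Table 1

Five datasets included in this study

| Dataset | No. of tumor | No. of control | Reference | Year published | Platform |
| --- | --- | --- | --- | --- | --- |
| GSE31737 | 40 | 40 | Loo LW et al. [19] | 2012 | Affymetrix Human Exon 1.0 ST Array |
| GSE24549 | 83 | 13 | Sveen A et al. [20] | 2011 | Affymetrix Human Exon 1.0 ST Array |
| GSE24550 | 77 | 13 | Sveen A et al. [20] | 2011 | Affymetrix Human Exon 1.0 ST Array |
| GSE35834 | 30 | 23 | Pizzini S et al. [21] | 2014 | Affymetrix Human Exon 1.0 ST Array |
| GSE50421 | 24 | 25 | Aziz MA et al. [22] | 2014 | Affymetrix Human Exon 1.0 ST Array |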

Individual data processing

The raw CEL files of the five datasets were background-adjusted and quantile-normalized using robust multichip average (RMA), an effective method for processing lncRNA profiling data, with AltAnalyze software [24, 25]. The normalized data were analyzed with Linear Models for Microarray Data (limma), a moderated t test combined with the Benjamini–Hochberg multiple-hypothesis correction, in R 3.1.1 [26]. Probe sets with an adjusted P-value below 0.01 and a fold change of ≥ 2 between the two comparison groups were defined as significantly different probe sets.
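
The selection rule in the last sentence can be sketched in a few lines. This is only an illustration under stated assumptions: the raw P-values and log2 fold changes are taken as given (the moderated t test itself is not reimplemented here), and the function names `benjamini_hochberg` and `select_probe_sets` as well as the toy numbers are hypothetical, not part of the original analysis.

```python
import math

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_probe_sets(probe_ids, p_values, log2_fold_changes,
                      alpha=0.01, min_fold_change=2.0):
    """Keep probe sets with adjusted P < alpha and fold change >= threshold."""
    adjusted = benjamini_hochberg(p_values)
    min_log2_fc = math.log2(min_fold_change)
    selected = []
    for pid, adj_p, lfc in zip(probe_ids, adjusted, log2_fold_changes):
        if adj_p < alpha and abs(lfc) >= min_log2_fc:
            selected.append((pid, "up" if lfc > 0 else "down"))
    return selected

# Toy input (hypothetical probe set IDs and statistics).
ids = ["A", "B", "C", "D", "E"]
raw_p = [0.0001, 0.0004, 0.001, 0.03, 0.5]
log2_fc = [1.5, -1.2, 0.3, 2.0, 3.0]
selected = select_probe_sets(ids, raw_p, log2_fc)
```

Probe set C passes the adjusted P-value threshold but not the fold-change threshold, and D and E fail the P-value threshold, so only A and B are kept.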

Identification of differentially expressed probe sets

A workflow (Fig. 1) for the consistency-based meta-analysis approach was designed to identify the distinctive probe sets as described by Yang et al. [23]. A “leave one dataset out” validation process was used in this approach, and the result was validated in an independent dataset (GSE31737). First, the meta-analysis was replicated four times. One of the four datasets (GSE24549, GSE24550, GSE35834, and GSE50421) was left out each time, and the analysis was performed using the remaining three datasets as one scenario. In each scenario, after aggregating the distinctive probe sets in three datasets, the probe sets differentially expressed in at least two datasets were selected as the probe set signatures of this scenario. The differentially expressed probe sets were then validated in the fourth dataset. Second, the probe sets differentially expressed in at least two scenarios were picked out as the final probe set signatures. An independent dataset (GSE31737) was used for validation.
Fig. 1

Consistency-based meta-analysis process. Four datasets (GSE24549, GSE24550, GSE35834, GSE50421) were shuffled to create four scenarios. In each scenario, three of the datasets were used to generate signatures, and the result was validated in an independent dataset in each scenario. The signatures produced from each scenario were confirmed by the fifth dataset (GSE31737)
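
The voting logic of the “leave one dataset out” process can be sketched as follows. This is a minimal illustration, assuming each dataset has already been reduced to a set of differentially expressed probe set IDs; the helper names `consensus` and `leave_one_dataset_out` and the toy probe set IDs `p1` to `p6` are hypothetical, and the validation step on the left-out dataset (done visually by HCA and PCA in the paper) is not modeled.

```python
from collections import Counter

def consensus(sets, min_count=2):
    """Items present in at least `min_count` of the given sets."""
    counts = Counter(item for s in sets for item in s)
    return {item for item, n in counts.items() if n >= min_count}

def leave_one_dataset_out(de_by_dataset, min_datasets=2, min_scenarios=2):
    """Return per-scenario signatures and the final signature."""
    scenarios = {}
    for left_out in de_by_dataset:
        # Each scenario uses every dataset except the one left out.
        training = [s for name, s in de_by_dataset.items() if name != left_out]
        scenarios[left_out] = consensus(training, min_datasets)
    # Final signature: probe sets selected in at least two scenarios.
    final = consensus(scenarios.values(), min_scenarios)
    return scenarios, final

# Toy differential-expression calls (hypothetical probe set IDs).
toy = {
    "GSE24549": {"p1", "p2", "p3"},
    "GSE24550": {"p1", "p2", "p4"},
    "GSE35834": {"p1", "p3", "p5"},
    "GSE50421": {"p1", "p6"},
}
scenarios, final_signature = leave_one_dataset_out(toy)
```

In this toy input, `p1` is called in every dataset and survives all four scenarios, whereas `p4`, `p5`, and `p6` each appear in a single dataset and are never selected.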

Probe set reannotation and lncRNA classification pipeline

To reannotate the probe sets, the sequences of probes were downloaded from the official website of Affymetrix (http://www.affymetrix.com/Auth/analysis/downloads/na25/wtexon/HuEx-1_0-st-v2.probe.fa.zip), and sequence alignment was performed between all of the probes obtained from the four meta-analysis scenarios (as presented in Fig. 1) and the RefSeq database by NCBI Blast-2.2.30+. Because each probe set comprises four probes of 25 nucleotides in Affymetrix Human Exon 1.0 ST, probe sets were filtered and reannotated by the following criteria: (a) the probe should perfectly hit its target gene (E-value = 2e-6, Query cover = 100 %, Ident = 100 %, antisense), (b) the probe should not hit more than one target perfectly, (c) the four probes in one probe set should hit the same gene, and (d) the accession number of the hit gene should begin with “NR_” (NR indicates non-coding RNA in the RefSeq database). According to the above process, the differentially expressed ncRNA probe sets were identified. Finally, to obtain the lncRNA probe sets, only the probe sets whose genes were defined as lncRNAs by NCBI were retained.
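
Criteria (a) to (d) amount to a filter over parsed alignment hits, which might look like the sketch below. Everything about the data layout is an assumption made for illustration: the dictionary keys, the helper names `make_hit`, `is_perfect`, and `reannotate`, and the placeholder genes and accessions (`LNC_A`, `NR_TOY1`, and so on) are hypothetical. A "target" in criterion (b) is interpreted here as a gene, and the E-value is not checked separately because it is determined by a full-length perfect match of a 25-nucleotide probe.

```python
from collections import defaultdict

def make_hit(probe_set_id, probe_id, gene, accession,
             identity=100.0, query_cover=100.0, strand="antisense"):
    """Build one parsed alignment record (hypothetical layout)."""
    return {"probe_set_id": probe_set_id, "probe_id": probe_id,
            "gene": gene, "accession": accession, "identity": identity,
            "query_cover": query_cover, "strand": strand}

def is_perfect(hit):
    """Criterion (a): full-length, mismatch-free, antisense alignment."""
    return (hit["identity"] == 100.0 and hit["query_cover"] == 100.0
            and hit["strand"] == "antisense")

def reannotate(hits, probes_per_set=4):
    """Map probe_set_id -> gene for probe sets passing criteria (a)-(d)."""
    targets = defaultdict(set)   # probe_id -> {(gene, accession)}, perfect hits
    members = defaultdict(set)   # probe_set_id -> {probe_id}
    for hit in hits:
        members[hit["probe_set_id"]].add(hit["probe_id"])
        if is_perfect(hit):
            targets[hit["probe_id"]].add((hit["gene"], hit["accession"]))
    annotation = {}
    for probe_set_id, probe_ids in members.items():
        if len(probe_ids) != probes_per_set:
            continue
        genes = set()
        for probe_id in probe_ids:
            perfect = targets.get(probe_id, set())
            probe_genes = {gene for gene, _ in perfect}
            non_coding = all(acc.startswith("NR_") for _, acc in perfect)
            # (a)+(b): exactly one perfectly hit gene; (d): NR_ accessions only.
            if len(probe_genes) != 1 or not non_coding:
                genes = set()
                break
            genes |= probe_genes
        # (c): all probes of the set agree on a single gene.
        if len(genes) == 1:
            annotation[probe_set_id] = next(iter(genes))
    return annotation

toy_hits = []
for i in range(4):
    toy_hits.append(make_hit("PS1", f"PS1_{i}", "LNC_A", "NR_TOY1"))
    toy_hits.append(make_hit("PS2", f"PS2_{i}", "LNC_B", "NR_TOY2"))
    toy_hits.append(make_hit("PS3", f"PS3_{i}", "GENE_C", "NM_TOY3"))
# One PS2 probe also matches a second gene perfectly, so PS2 fails (b).
toy_hits.append(make_hit("PS2", "PS2_0", "GENE_D", "NM_TOY4"))
annotation = reannotate(toy_hits)
```

Only `PS1` survives: `PS2` is dropped because one of its probes has two perfect targets, and `PS3` is dropped because its target is not an `NR_` accession.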

Data analysis

To inspect the “leave one dataset out” cross-validation result visually, hierarchical clustering analysis (HCA) was performed on the differentially expressed lncRNAs obtained from each scenario and an independent dataset (GSE31737) with Cluster&TreeView [27]. Principal component analysis (PCA) was conducted with the Bioconductor package pcaMethods [28]. The samples were grouped by HCA according to similarities in their gene expression profiles, whereas PCA summarized the most important variables in a dataset as principal components and classified the samples using as few variables as possible [13]. In HCA, the Euclidean distance method was used to cluster arrays.
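
To make the idea of separating samples along the first principal component concrete, the sketch below computes PC1 scores for a small matrix. It is not the pcaMethods implementation used in the study: it swaps in plain power iteration on mean-centered data, the function name `first_principal_component` is hypothetical, and the expression values are invented toy numbers. The sign of a principal component is arbitrary, so only the separation between groups is meaningful, not which group is positive.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) of PC1 for a samples x features matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Non-uniform start vector; power iteration repeatedly applies X^T X.
    v = [1.0 / (j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        scores = [sum(x[j] * v[j] for j in range(d)) for x in centered]
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(x[j] * v[j] for j in range(d)) for x in centered]
    return v, scores

# Toy expression matrix: three tumor samples, then three normal samples,
# measured on three hypothetical lncRNAs.
toy_matrix = [
    [8.0, 7.5, 3.0], [8.2, 7.9, 2.8], [7.8, 7.7, 3.1],   # tumor
    [5.0, 5.2, 6.0], [5.1, 4.9, 6.2], [4.9, 5.0, 5.9],   # normal
]
loadings, pc1_scores = first_principal_component(toy_matrix)
tumor_scores, normal_scores = pc1_scores[:3], pc1_scores[3:]
```

In this toy case the two groups fall on opposite sides of PC1 = 0, which is the kind of split reported for the scenarios in the Results.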

Survival analysis

The test series (GSE24549, GSE24550) and an independent validation series (GSE39582) were used to identify the correlation between expression levels of specific lncRNAs and CRC prognosis. Dataset GSE39582, which is based on the Affymetrix HG-U133 plus 2 platform, contained a total of 566 samples with clinical data, and 541 samples remained after filtering out those with incomplete clinical data [29]. To eliminate individual differences, the expression value of each lncRNA was normalized by dividing the average expression value of all of its probe sets by that of GAPDH. The probe sets hitting GAPDH were obtained from NetAffx and filtered by NCBI Blast-2.2.30+. Receiver operating characteristic curves were used to determine the cutoff value separating the two groups distinguished by the expression level of a specific lncRNA [30]. The method of Kaplan and Meier was used to construct survival curves for CRC patients grouped by lncRNA expression status, and survival curves were compared by log-rank test. To evaluate the association between expression and survival, Cox proportional hazards analysis was performed to calculate the hazard ratio and the 95 % confidence interval (CI). In addition, a multivariate Cox regression was performed to identify independent prognostic factors of significance [30, 31]. A two-tailed P-value of 0.05 or less was considered statistically significant.
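
The Kaplan–Meier estimate and the two-group log-rank comparison described above can be sketched in plain Python. This is a textbook-style illustration, not the software used in the study (which is not named in the text); the function names and the toy follow-up times are hypothetical. An event flag of 1 means the event was observed and 0 means the observation was censored.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving   # events and censored subjects leave the risk set
    return curve

def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, P-value) with 1 d.f."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti in times_a if ti >= t)
        n_b = sum(1 for ti in times_b if ti >= t)
        d_a = sum(1 for ti, e in zip(times_a, events_a) if ti == t and e == 1)
        d_b = sum(1 for ti, e in zip(times_b, events_b) if ti == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Toy follow-up data (hypothetical, in years).
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
chi2, p_value = log_rank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
```

For the toy curve, the survival estimate steps down to 0.8, 0.6, and 0.3 at times 1, 2, and 3; the subject censored at time 2 reduces the risk set without causing a step.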

Results

Distinctive lncRNA expression pattern between tumor tissue and tumor-adjacent normal tissue samples

Through individual data processing, differentially expressed probe sets of each dataset were obtained (Table 2). After the “leave one dataset out” validation process, distinctive lncRNAs were identified from four scenarios through the probe set reannotation and lncRNA classification pipeline (Table 3). In each scenario, the lncRNA signatures aggregated from any three datasets were validated by an independent dataset. HCA clustering data revealed clear distinctions between CRC tissue and tumor-adjacent normal tissue samples (Supplementary Figs. 1A, 2A, 3A, 4A), which could also be distinguished by PCA (Supplementary Figs. 1B, 2B, 3B, 4B). In scenario I, all samples were perfectly distributed into the CRC tissue and tumor-adjacent normal tissue groups by PC1 = 0 (Supplementary Fig. 1B). In scenario II, 29 of 30 tumor-adjacent normal tissue samples and 29 of 30 tumor tissue samples were distributed by PC1 = 0 (Supplementary Fig. 2B). In scenario III, all samples were perfectly distributed into two groups by PC1 = 7 (Supplementary Fig. 3B). In scenario IV, 76 of 77 tumor tissue samples and all tumor-adjacent normal tissue samples were distributed by PC1 = −6 (Supplementary Fig. 4B). Through aggregation of the four scenarios of lncRNA signatures, 25 lncRNAs were identified, which included 20 upregulated and five downregulated lncRNAs (Table 4). HCA and PCA of all samples in the independent dataset (GSE31737) using the final 25 differentially expressed lncRNAs indicated that most samples could be distributed by PC1: PC1 > −2 in 35 of 40 CRC tissue samples and PC1 < −2 in 36 of 40 tumor-adjacent normal tissue samples. A total of 71 samples could be correctly distributed by PC1. Thus, CRC tissue could be discriminated from tumor-adjacent normal tissue using these 25 identified lncRNAs with an overall accuracy of 88.8 % (Fig. 2). A total of nine of 80 samples were incorrectly classified based on their lncRNA expression profiles, with an error rate of 11.2 %.
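
The accuracy figures for GSE31737 follow directly from the counts given above, as the short check below shows. The function name `classification_summary` is hypothetical, and treating tumor tissue as the positive class (so that the per-group rates can be read as sensitivity and specificity) is an assumption of this sketch rather than terminology used in the text.

```python
def classification_summary(tumor_correct, tumor_total,
                           normal_correct, normal_total):
    """Summarize a two-group classification from per-group counts."""
    total = tumor_total + normal_total
    correct = tumor_correct + normal_correct
    return {
        "sensitivity": tumor_correct / tumor_total,    # tumor samples on the tumor side
        "specificity": normal_correct / normal_total,  # normal samples on the normal side
        "accuracy": correct / total,
        "error_rate": (total - correct) / total,
    }

# Counts reported for GSE31737: 35 of 40 tumor and 36 of 40 normal samples.
summary = classification_summary(35, 40, 36, 40)
```

The exact accuracy is 71/80 = 88.75 % and the exact error rate is 9/80 = 11.25 %; the reported 88.8 % and 11.2 % are these values rounded to one decimal place.
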
Table 2

Number of probe sets differentially expressed in each dataset

| Dataset | Upregulated probe sets | Downregulated probe sets | Differential probe sets |
| --- | --- | --- | --- |
| GSE31737 | 5271 | 4893 | 10,164 |
| GSE24549 | 5923 | 51,266 | 57,189 |
| GSE24550 | 11,185 | 37,423 | 48,608 |
| GSE35834 | 3633 | 12,105 | 15,738 |
| GSE50421 | 16,884 | 11,210 | 28,094 |

Table 3

Number of distinctive lncRNAs in four scenarios

| Scenario | Upregulated lncRNAs | Downregulated lncRNAs | Differential lncRNAs | Probe sets of differentially expressed lncRNAs |
| --- | --- | --- | --- | --- |
| I | 23 | 22 | 45 | 81 |
| II | 20 | 18 | 38 | 74 |
| III | 22 | 26 | 48 | 87 |
| IV | 21 | 24 | 45 | 79 |

Table 4

Summary of differentially expressed lncRNAs

| lncRNA | Expression status | Chromosome | Start | End | Description |
| --- | --- | --- | --- | --- | --- |
| BLACAT1 | Upregulated | 1q32.1 | 205434886 | 205456086 | Bladder cancer-associated transcript 1 |
| CASC19 | Upregulated | 8q24.21 | | | Cancer susceptibility candidate 19 |
| CASC21 | Upregulated | 8q24.21 | | | Cancer susceptibility candidate 21 |
| CDKN2B-AS1 | Downregulated | 9p21.3 | 21994791 | 22121097 | CDKN2B antisense RNA 1 |
| CCAT1 | Upregulated | 8q24.21 | 127207382 | 127219268 | Colon cancer-associated transcript 1 |
| CRNDE | Upregulated | 16q12.2 | 54918863 | 54929189 | Colorectal neoplasia differentially expressed |
| DLEU1 | Upregulated | 13q14.3 | 50082169 | 50528643 | Deleted in lymphocytic leukemia 1 |
| ELFN1-AS1 | Upregulated | 7p22.3 | 1738630 | 17442310 | ELFN1 antisense RNA 1 |
| FAM83H-AS1 | Upregulated | 8q24.3 | 143734140 | 143746337 | FAM83H antisense RNA 1 (head to head) |
| FIRRE | Upregulated | Xq26.2 | 131702650 | 131830643 | FIRRE intergenic repeating RNA element |
| FOXP1-AS1 | Downregulated | 3p13 | | | FOXP1 antisense RNA 1 |
| FOXP4-AS1 | Upregulated | 6p21.1 | 41523895 | 41546255 | FOXP4 antisense RNA 1 |
| HAGLR | Downregulated | 2q31.2 | 176173195 | 176188958 | HOXD antisense growth-associated long non-coding RNA |
| KBTBD11-OT1 | Downregulated | 8p23.3 | | | KBTBD11 overlapping transcript 1 |
| LINC01234 | Upregulated | 12 | 113744577 | 113773683 | Long intergenic non-protein coding RNA 1234 |
| LINC01296 | Upregulated | 14q11.2 | 19076244 | 19096796 | Long intergenic non-protein coding RNA 1296 |
| LINC00152 | Upregulated | 2p11.2 | 87455455 | 87521518 | Long intergenic non-protein coding RNA 152 |
| LINC00858 | Upregulated | 10q23.1 | 84279980 | 84294659 | Long intergenic non-protein coding RNA 858 |
| MIR4435-1HG | Upregulated | 2q13 | 111367002 | 111495115 | MIR4435-1 host gene |
| LOC100190940 | Upregulated | 12q24.33 | 130033812 | 130042342 | Uncharacterized LOC100190940 |
| LOC100505938 | Upregulated | 7p21.3 | 8237045 | 8262821 | Uncharacterized LOC100505938 |
| LOC145837 | Downregulated | 15q23 | 69561720 | 69571440 | Uncharacterized LOC145837 |
| LOC400706 | Upregulated | 19q13.32 | 46563469 | 46580982 | Uncharacterized LOC400706 |
| UCA1 | Upregulated | 19p13.12 | 15828947 | 15836321 | Urothelial cancer-associated 1 |
| ZFAS1 | Upregulated | 20q13.13 | 49278178 | 49289260 | ZNFX1 antisense RNA 1 |

Fig. 2

Validation of 25 differentially expressed lncRNAs using an independent dataset (GSE31737). In HCA (a), the x axis represents the samples, and lncRNAs are shown on the y axis. Red spots represent upregulated genes, and green spots represent downregulated genes. The sample types are shown with bar colors in the dendrogram; blue stripes represent tumor-adjacent normal samples, and red stripes are tumor samples. In PCA (b), green dots represent tumor-adjacent normal tissue samples, and red dots represent tumor tissue samples. In GSE31737, CRC samples can be distinguished from tumor-adjacent normal tissues by HCA and PCA

Identification of prognostic lncRNAs from 25 lncRNAs through the test series

The correlation between the expression levels of differentially expressed lncRNAs and CRC prognosis was analyzed in the test series (GSE24549, GSE24550) (Table 5). The probe sets hitting GAPDH and the 25 lncRNAs are shown in Table S1. Survival curves were drawn for the 25 lncRNAs, and significant differences were identified for three lncRNAs (LINC01296, LINC00152, and FIRRE) (P < 0.05, log-rank test) (Fig. 3). Of the three, LINC01296 showed the most significant difference: overall survival was higher for patients with high LINC01296 expression than for those with low expression (P < 0.01), and the difference was validated in GSE21510. Univariate and multivariate analyses of the expression levels of the three lncRNAs (LINC01296, LINC00152, and FIRRE) together with TNM stage and MSI status (Table 5) indicated that LINC01296 was a significant predictor of survival in CRC (P = 0.001) in addition to TNM stage (P = 0.001) and MSI status (P = 0.006) (Table 6).
Table 5

Characteristics of the two independent colorectal cancer sample series

| Characteristic | Test series (a) (n = 160) | Validation series (b) (n = 541) |
| --- | --- | --- |
| Age at diagnosis | NA | 67.0 ± 13.2 |
| Gender | NA | |
| Male | | 300 |
| Female | | 241 |
| TNM stage | | |
| I | | 36 |
| II | 90 | 259 |
| III | 70 | 201 |
| IV | | 45 |
| Mean follow-up, years (minimum; maximum) | 4.58 (0.17; 10) | 4.19 (0; 16.75) |
| Tumor location | NA | |
| Distal | | 327 |
| Proximal | | 214 |
| MSI | | NA |
| MSI-H | 21 | |
| MSI-L | 16 | |
| MSS | 100 | |
| Adjuvant chemotherapy | | |
| Yes | NA | 232 |
| No | 83 | 309 |

(a) GEO accession numbers GSE24549 and GSE24550

(b) GEO accession number GSE39582

Fig. 3

Kaplan–Meier survival curves for three lncRNAs in CRC samples (n = 160) from GSE24549 and GSE24550. Overall survival was significantly different in patients (P < 0.05, log-rank test)

Table 6

Univariate and multivariate analysis of overall survival in colorectal cancer patients (n = 160)

| Variables | Univariate HR | Univariate 95 % CI | Univariate P value | Multivariate HR | Multivariate 95 % CI | Multivariate P value |
| --- | --- | --- | --- | --- | --- | --- |
| LINC01296 expression (low/high) | 0.512 | 0.289–0.907 | 0.022* | 0.413 | 0.244–0.697 | 0.001* |
| FIRRE expression (low/high) | 0.598 | 0.319–1.118 | 0.107 | | | 0.077 |
| LINC00152 expression (low/high) | 0.811 | 0.476–1.382 | 0.441 | | | 0.307 |
| TNM stage (II/III) | 2.249 | 1.314–3.849 | 0.003* | 2.498 | 1.471–4.244 | 0.001* |
| MSI status (MSI-H/MSI-L MSS) | 5.775 | 1.759–18.955 | 0.004* | 5.182 | 1.608–16.701 | 0.006* |

HR hazard ratio, CI confidence interval

*P < 0.05
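
A reader can check that a reported hazard ratio, confidence interval, and P-value are mutually consistent. The sketch below does this under the assumption that the intervals are Wald intervals, symmetric on the log scale; the text does not state how the intervals were computed, so this is a plausibility check rather than a reproduction of the analysis. The function name `wald_from_hr` is hypothetical.

```python
import math

def wald_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover log(HR), its standard error, Wald z, and two-sided P."""
    log_hr = math.log(hr)
    # A 95 % Wald interval spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail
    return log_hr, se, z, p

# LINC01296 rows of Table 6.
_, _, z_uni, p_uni = wald_from_hr(0.512, 0.289, 0.907)
_, _, z_multi, p_multi = wald_from_hr(0.413, 0.244, 0.697)
```

Under this assumption the recovered P-values round to the tabulated 0.022 (univariate) and 0.001 (multivariate), so the three reported quantities agree with each other for LINC01296.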

Analysis of LINC01296 signature for survival prediction in an independent validation series

To confirm the above conclusion, the LINC01296 signature was analyzed in an independent validation series (GSE39582) (Table 5). The raw CEL files were also normalized using RMA, and the probe sets hitting GAPDH and LINC01296 are shown in Table S2. Multivariate Cox regression analysis was performed with the expression level of LINC01296, age, gender, and other important clinical characteristics such as TNM stage and postoperative treatment as covariables; the expression level of LINC01296 was evaluated as a continuous variable. The result showed that the expression level of LINC01296 remained significantly associated with DFS after adjustment for TNM stage, postoperative treatment, and other variables (P = 0.027) (Table 7).
Table 7

Univariate and multivariate Cox regression analysis in the validation series (n = 541)

| Variables | Univariate HR | Univariate 95 % CI | Univariate P value | Multivariate HR | Multivariate 95 % CI | Multivariate P value |
| --- | --- | --- | --- | --- | --- | --- |
| LINC01296 expression | 0.127 | 0.022–0.738 | 0.022* | 0.136 | 0.023–0.798 | 0.027* |
| Gender (male/female) | 0.682 | 0.493–0.943 | 0.021* | 0.660 | 0.480–0.909 | 0.011* |
| Age at diagnosis | 1.009 | 0.996–1.022 | 0.169 | | | 0.286 |
| TNM stage (I/II/III/IV) | 2.384 | 1.844–3.081 | 0.000* | 2.379 | 1.889–2.996 | 0.000* |
| Tumor location (distal/proximal) | 1.338 | 0.958–1.867 | 0.087 | | | 0.117 |
| Adjuvant chemotherapy (yes/no) | 0.938 | 0.655–1.341 | 0.724 | | | 0.807 |

HR hazard ratio, CI confidence interval

*P < 0.05

Discussion

CRC, one of the most malignant cancers, is currently the fourth leading cause of cancer deaths worldwide. Treatment choices of CRC are currently influenced by the TNM staging system of the Union for International Cancer Control. Current TNM criteria, however, cause substantial under- and over-treatment of CRC patients [6]. For instance, adjuvant chemotherapy for node-negative (stage II) colon cancer has been controversial because over-treatment will increase the pain experienced by patients, whereas high-risk stage II patients might benefit from adjuvant therapy. Consequently, there is growing need for new and efficient biomarkers to ensure optimal treatment allocation [6]. Tests for MSI status, a kind of biomarker, might contribute to the risk–benefit assessment of treatment in stage II disease [32]. However, MSI status, which appears to be more pronounced for Lynch syndrome, has some limitations.

Recent epigenetic studies revealed that ncRNAs can distinguish advanced adenoma from normal control tissue, whereas their expression levels do not correlate with TNM stages [33]. Several lncRNAs such as HOTAIR, MALAT1, GAS5, and HULC can predict cancer prognosis when taking the epigenetic background of patients into consideration [17, 18, 34, 35, 36]. Therefore, it is a reasonable hypothesis that lncRNAs can be used as biomarkers to predict cancer prognosis, especially for some high-mortality cancers such as CRC.

A search of PubMed suggested that only eight published studies have addressed the prediction of CRC outcomes with lncRNAs [34, 37, 38, 39, 40, 41, 42, 43]. In most of these studies, the candidate lncRNAs were taken from reports on other cancers and then validated in CRC. However, this method was inefficient for selecting target lncRNAs because of the need for a large number of validation experiments, and it was difficult to identify new and specific lncRNAs for CRC prognosis [34, 39, 40, 41, 43]. In another study, reported by Debing Shi et al., microarrays were used to investigate diagnostic markers, but only six pairs of samples (tumors and controls) were analyzed, which reduced the accuracy of the microarray results [42]. Ye Hu et al. recently investigated a diagnostic marker of CRC using three datasets from GEO, which were based on the Affymetrix HG-U133 plus 2 platform [37]. That approach overcomes most of the drawbacks of the previous reports; however, the Affymetrix HG-U133 plus 2 is a 3′ IVT (in vitro transcription) expression array, which covers only 47,000 transcripts, so a great number of lncRNAs might be missed using this platform. In our current study, we used five datasets based on Affymetrix Human Exon 1.0 ST, which contains 1.4 million probe sets. In total, 355 samples were chosen to identify differentially expressed lncRNAs and to assess their diagnostic potential for CRC.

Nevertheless, some unexpected situations might have occurred during our analysis. Because annotation of the human genome is updated frequently, some probe sets may no longer hit the targets that they were originally designed for. For example, according to the NetAffx Annotation Files, probe set 2322902 does not hit any gene in the Affymetrix Human Exon 1.0 ST array series. However, it perfectly hits PADI6 (E-value = 2e-6, Query cover = 100 %, Ident = 100 %, antisense) when aligned against the RefSeq database with BLAST. Therefore, probe set reannotation combined with the lncRNA classification pipeline is a suitable method to extract expression data for lncRNAs.

According to the above analysis flow, 25 differentially expressed lncRNAs were ultimately identified in CRC tissues via probe set reannotation. Among them, CRNDE, CCAT1, and UCA1, which have been reported to be associated with CRC, were validated in our study [44, 45, 46, 47]. Nevertheless, some known CRC-related lncRNAs were not identified in this study, which may have been caused by the strict filter criteria of probe set reannotation. For example, the hypomethylation of lncRNA H19 may result in the loss of IGF2 imprinting in CRC patients [48]. In fact, seven differentially expressed probe sets of H19 were found in CRC, but all of them were filtered out from our final result because these probe sets also hit the mRNA of spidroin-1-like.

In addition, most of the differentially expressed lncRNAs that we identified were reported to be associated with other cancers (Supplementary Table 3), although the relationship between these lncRNAs and CRC is not clear. For example, LINC00152, an lncRNA located on chromosome 2p11.2, is hypermethylated and downregulated in human hepatocellular carcinoma [49], whereas CDKN2B-AS1 (ANRIL), an lncRNA located on chromosome 9p21.3, promotes tumor growth by epigenetic silencing of miR-99a/miR-449a and indicates a poor prognosis of gastric cancer [50]. These results suggest that these lncRNAs may play important roles in CRC through similar molecular mechanisms.

Of the 25 differentially expressed lncRNAs that we identified, LINC01296 was shown to be significantly associated with the overall survival of patients with CRC. Kaplan–Meier analysis of overall survival showed that high expression of LINC01296 in tumor tissues could predict a good prognosis in two datasets. The Cox proportional hazards model was adjusted for known prognostic variables such as TNM stage, MSI status, and other clinical characteristics, and the results indicated that LINC01296 was an independent prognostic marker for CRC. These results suggest that LINC01296 is a possible prognostic factor of patients with CRC. It may also be a potential diagnostic marker in patients with CRC. The molecular mechanism of LINC01296 involvement in CRC should be investigated further.

In summary, our results show that the reannotation method is useful for data mining in lncRNA research. A total of 25 differentially expressed lncRNAs were found in CRC samples. Some of them were previously reported to be involved in CRC, and the rest have been implicated in other cancers. Importantly, LINC01296 was identified as a possible independent prognostic marker in patients with CRC.

Acknowledgments

This work was supported by grants from the National Basic Research Project of China (2010CB529502 and 2007CB511904), National Natural Science Foundation of China (81471485), and the Key Program for the Fundamental Research of the Science and Technology Commission of Shanghai (11JC1411000).

Conflicts of interest

None

Supplementary material

Fig. S1

Validation of 45 differentially expressed lncRNAs using an independent dataset (GSE50421). In HCA (A), the x axis represents the samples, and lncRNAs are shown on the y axis. Red spots represent upregulated genes, and green spots represent downregulated genes. The sample types are shown with bar colors in the dendrogram; blue stripes represent tumor-adjacent normal samples, and red stripes are tumor samples. In PCA (B), green dots represent tumor-adjacent normal tissue samples, and red dots represent tumor tissue samples. In GSE50421, CRC samples can be distinguished from tumor-adjacent normal tissues by HCA and PCA

Fig. S2

Validation of 38 differentially expressed lncRNAs using an independent dataset (GSE35834). In HCA (A), the x axis represents the samples, and lncRNAs are shown on the y axis. Red spots represent upregulated genes, and green spots represent downregulated genes. The sample types are shown with bar colors in the dendrogram; blue stripes represent tumor-adjacent normal samples, and red stripes are tumor samples. In PCA (B), green dots represent tumor-adjacent normal tissue samples, and red dots represent tumor tissue samples. In GSE35834, CRC samples can be distinguished from tumor-adjacent normal tissues by HCA and PCA

Fig. S3

Validation of 48 differentially expressed lncRNAs using an independent dataset (GSE24549). In HCA (A), the x axis represents the samples, and lncRNAs are shown on the y axis. Red spots represent upregulated genes, and green spots represent downregulated genes. The sample types are shown with bar colors in the dendrogram; blue stripes represent tumor-adjacent normal samples, and red stripes are tumor samples. In PCA (B), green dots represent tumor-adjacent normal tissue samples, and red dots represent tumor tissue samples. In GSE24549, CRC samples can be distinguished from tumor-adjacent normal tissues by HCA and PCA

Fig. S4

Validation of 45 differentially expressed lncRNAs using an independent dataset (GSE24550). In HCA (A), the x axis represents the samples, and lncRNAs are shown on the y axis. Red spots represent upregulated genes, and green spots represent downregulated genes. The sample types are shown with bar colors in the dendrogram; blue stripes represent tumor-adjacent normal samples, and red stripes are tumor samples. In PCA (B), green dots represent tumor-adjacent normal tissue samples, and red dots represent tumor tissue samples. In GSE24550, CRC samples can be distinguished from tumor-adjacent normal tissues by HCA and PCA


Copyright information

© International Society of Oncology and BioMarkers (ISOBM) 2015

Authors and Affiliations

  1. Shanghai Children’s Hospital, Shanghai Institute of Medical Genetics, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  2. Key Laboratory of Embryo Molecular Biology, Ministry of Health of China and Shanghai Key Laboratory of Embryo and Reproduction Engineering, Shanghai, China
  3. State Key Laboratory of Biotherapy/Collaborative Innovation Center for Biotherapy, West China Hospital, Sichuan University, Chengdu, China
