Background

The role of the genome in biological processes has become better understood in recent decades, particularly as researchers have come to recognize the roles of individual transcripts. High-throughput sequencing technologies, with their increased sensitivity, have enabled the detection of novel transcripts and have facilitated more comprehensive research into transcription and translation [1,2,3]. Much is now known about messenger RNAs and other RNA classes, including transfer RNAs, small nuclear RNAs, small nucleolar RNAs, and microRNAs, but the types, roles, and biological significance of long non-coding RNAs (lncRNAs) have yet to be fully elucidated [4,5,6].

Kidney cancer is one of the most prevalent urinary tract cancers in adults. In the United States, 63,990 new cases of kidney and renal pelvis cancer were projected for 2017 (40,610 in males and 23,380 in females), with an estimated 14,400 deaths (9470 in males and 4930 in females) [7]. The reported mortality rate is approximately 3% across all cases and continues to rise [7]. In China, 66,800 new cases of kidney cancer were diagnosed in 2015, with a mortality rate of 2.34% [8]. Histologically, clear cell renal cell carcinoma (ccRCC) is the most common kidney cancer subtype, constituting about 70% of kidney cancers, followed by papillary renal cell carcinoma (10%) and chromophobe renal cell carcinoma (5%) [9,10,11].

Recently, lncRNAs have been shown to participate in tumorigenesis, disease development, and metastasis in ccRCC, acting in both oncogenic and tumor-suppressive roles that modulate a number of biological and pathological processes [12,13,14,15,16,17]. Nevertheless, little prognosis-related research has been conducted on lncRNAs in ccRCC, and additional lncRNAs are likely to influence ccRCC progression through their own molecular mechanisms. Thus, the present study aimed to investigate the prognostic significance of differentially expressed lncRNAs by mining high-throughput RNA-sequencing data from The Cancer Genome Atlas (TCGA). A risk score based on 6 novel lncRNAs showed superior prognostic value for ccRCC outcomes.

Methods

Patient cohort from TCGA dataset

RNA sequencing (RNA-Seq) raw count data (level 3) from ccRCC patients, generated on the Illumina HiSeq RNASeq platform, were obtained from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga/). These data comprised 539 ccRCC tissues and 72 adjacent non-tumorous renal tissue samples deposited on or before May 31, 2017. Patient outcome was recorded as overall survival (OS), and the mean follow-up period was 44.9 months. Because TCGA is a community resource project that offers data for research, and the current study complied with TCGA publication guidelines and data use policies, approval from a local ethics committee was not required.

Assessment of differentially expressed lncRNAs

The ccRCC RNA-Seq data contained 60,483 annotated genes, including 13,198 lncRNAs annotated in the NCBI (https://www.ncbi.nlm.nih.gov/) or GENCODE (http://www.gencodegenes.org/) databases. Differentially expressed lncRNAs were identified with the edgeR and DESeq packages for the R statistical computing environment, using thresholds of adjusted P < 0.05 and |log2FC| > 2 for both packages [18, 19]. The expression level of each lncRNA was quantified using DESeq, and expression values were log2-transformed. The final candidate lncRNAs were those identified as differentially expressed by both R packages. Student’s t-tests (SPSS 22.0, IBM Corp., Armonk, NY) were used to assess whether the 6 candidate lncRNAs discriminated between ccRCC and non-cancerous kidney tissues.
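The overlap rule described above (an lncRNA is kept only if both edgeR and DESeq call it, with adjusted P < 0.05 and |log2FC| > 2) can be sketched as below. This is a minimal Python illustration, not the authors' pipeline: it assumes each package's results have already been exported as a mapping from lncRNA ID to (log2 fold change, adjusted P), and the IDs and values in the example are hypothetical.

```python
def significant(results, padj_cutoff=0.05, lfc_cutoff=2.0):
    """Return IDs passing both the adjusted-P and |log2FC| thresholds.

    `results` maps lncRNA ID -> (log2_fold_change, adjusted_p).
    """
    return {
        gene
        for gene, (log2fc, padj) in results.items()
        if padj < padj_cutoff and abs(log2fc) > lfc_cutoff
    }


def consensus_dels(edger_results, deseq_results, **cutoffs):
    """Keep only lncRNAs called significant by BOTH tools (the overlap)."""
    return significant(edger_results, **cutoffs) & significant(deseq_results, **cutoffs)


def split_by_direction(ids, results):
    """Partition IDs into upregulated and downregulated sets by fold-change sign."""
    up = {gene for gene in ids if results[gene][0] > 0}
    down = set(ids) - up
    return up, down


# Hypothetical toy output: lncRNA ID -> (log2FC, adjusted P).
edger = {"lnc_A": (3.1, 1e-6), "lnc_B": (-2.5, 0.01), "lnc_C": (1.2, 1e-9)}
deseq = {"lnc_A": (2.9, 1e-5), "lnc_B": (-2.2, 0.20), "lnc_C": (2.4, 1e-8)}

shared = consensus_dels(edger, deseq)
print(sorted(shared))  # ['lnc_A']
```

Here lnc_B fails the DESeq adjusted-P cutoff and lnc_C fails the edgeR fold-change cutoff, so only lnc_A survives the intersection.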

ccRCC prognosis capabilities based on differentially expressed lncRNAs

Differentially expressed lncRNAs with relative expression levels below 1 in more than 10% of subjects were excluded from subsequent analyses, as were samples lacking adequate clinical information. The final prognostic analysis included 530 samples with expression data for 370 lncRNAs. To verify the prognostic risk model, the 530 tumor samples from the TCGA dataset were randomly divided into training and validation sets.
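The low-expression filter and the random training/validation split described above might look like the following minimal Python sketch. It assumes expression is stored as a mapping from lncRNA ID to per-sample values and that the split is an even random halving (consistent with the 265/265 sets shown in Fig. 6); the seed and sample IDs are hypothetical choices for illustration, not values reported by the authors.

```python
import random


def filter_low_expression(expr, min_level=1.0, max_fraction=0.10):
    """Drop lncRNAs whose expression is below `min_level` in more than
    `max_fraction` of samples. `expr` maps lncRNA ID -> per-sample values."""
    kept = {}
    for gene, values in expr.items():
        n_low = sum(1 for v in values if v < min_level)
        if n_low / len(values) <= max_fraction:
            kept[gene] = values
    return kept


def random_half_split(sample_ids, seed=2017):
    """Shuffle samples and split them into equal training/validation halves.

    The seed is a hypothetical value chosen only to make the split reproducible.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical example with 530 tumor samples.
training, validation = random_half_split(f"TCGA_{i:03d}" for i in range(530))
print(len(training), len(validation))  # 265 265
```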

The prognostic significance of each lncRNA was first assessed by univariate Cox proportional hazards regression (P < 0.01). Significant lncRNAs were then further examined by multivariate Cox stepwise regression. In addition, relationships between the expression of the 6 resulting lncRNAs and various clinicopathological features were assessed using Student’s t-tests and Spearman correlation analysis.

Clinical role of the risk score generated by the key lncRNAs

An lncRNA-based prognosis risk score was generated from a linear combination of the expression level multiplied by the regression coefficient acquired from the multivariate Cox regression model (β) with the following formula as previously reported [20, 21]:

$${\text{Risk score}} = \sum\limits_{n = 1}^{N} \left( e_{n} \times \beta_{n} \right)$$

where N is the number of lncRNAs in the signature (6 in this study), β_n is the estimated regression coefficient of lncRNA n derived from the multivariate Cox stepwise regression analysis, and e_n is its expression level.
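As a concrete check of the formula, the following Python sketch computes the linear risk score using the six coefficients reported in the Results. It assumes that e_n is the same normalized (log2) expression value used to fit the Cox model; applying the coefficients to expression on a different scale would not give comparable scores. The example patient is hypothetical.

```python
# Multivariate Cox coefficients (beta) as reported in the Results.
COEFFICIENTS = {
    "CTA-384D8.35": 0.527,
    "CTD-2263F21.1": 0.238,
    "LINC01510": -0.304,
    "RP11-352G9.1": 0.459,
    "RP11-395B7.2": 0.25,
    "RP11-426C22.4": -0.309,
}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Return the sum over n of e_n * beta_n for one patient.

    `expression` maps lncRNA name -> expression value e_n (assumed to be on
    the same normalized scale that was used to fit the Cox model).
    """
    missing = sorted(set(coefficients) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(expression[name] * beta for name, beta in coefficients.items())


# Hypothetical patient with every lncRNA at expression 1.0:
print(round(risk_score({name: 1.0 for name in COEFFICIENTS}), 3))  # 0.861
```

Because LINC01510 and RP11-426C22.4 carry negative coefficients, higher expression of those two lowers the score, whereas higher expression of the other four raises it.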

Based on the inflection point of the cumulative distribution curve of the risk score, ccRCC patients were divided into high- and low-score groups. Univariate and multivariate Cox proportional hazards regression analyses were conducted to further assess the efficacy of this prognostic risk score, with risk score, race, sex, age, tumor stage, distant metastasis, lymph node metastasis, neoplasm cancer status, clinical stage, and tumor grade included as covariates. Hazard ratios (HRs) with 95% confidence intervals (CIs) were calculated. A time-dependent receiver operating characteristic (ROC) curve analysis within 5 years was also performed with the R package survivalROC to estimate the prognostic accuracy of the model for time-dependent outcomes. Kaplan–Meier (K–M) survival curves were used to assess associations between each parameter (clinical features and the six-lncRNA-based risk score) and OS in ccRCC patients. A concordance index (C-index) was used to measure the predictive accuracy and discriminative ability of the prognostic model.
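The C-index mentioned above can be illustrated with Harrell's estimator. The sketch below is a minimal Python version under simplifying assumptions: a pair of patients is counted as comparable only when the patient with the shorter follow-up time had an event, pairs with tied times are skipped, and tied risk scores count as half-concordant. It is not necessarily the exact estimator implemented in the software the authors used, and the example data are hypothetical.

```python
def harrell_c_index(risk, time, event):
    """Harrell's C: among comparable pairs, the fraction in which the patient
    who had the event earlier also had the higher risk score.

    A pair (i, j) is comparable when patient i had an event and time[i] < time[j].
    Pairs with tied times are skipped; tied risk scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(risk)):
        if not event[i]:
            continue
        for j in range(len(risk)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical example: the highest score dies first; the middle patient is censored.
print(harrell_c_index([3.0, 2.0, 1.0], [1, 2, 3], [1, 0, 1]))  # 1.0
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of patients by risk.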

ROC curves were used to assess how well the six-lncRNA-based risk score predicted clinical progression in ccRCC patients. A two-sided P < 0.05 was considered statistically significant. These statistical analyses were performed with SPSS 22.0 (IBM Corp.).
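For the binary ROC analyses of clinical features (for example, advanced vs. early clinical stage), the area under the curve equals the probability that a randomly chosen positive case has a higher score than a randomly chosen negative case. The minimal Python sketch below computes the AUC this way; it does not compute the P-values or the time-dependent (survivalROC) curves described above, and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney formulation: the probability that a random
    positive case scores higher than a random negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores; label 1 = advanced clinical stage (III-IV).
scores = [0.9, 0.3, 0.8, 0.2]
labels = [1, 1, 0, 0]
print(roc_auc(scores, labels))  # 0.75
```

The pairwise loop is quadratic in sample size, which is acceptable for a few hundred patients; a rank-based formula gives the same value more efficiently.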

Different signaling pathways between high- and low-risk groups

Gene set enrichment analysis (GSEA) was carried out using GSEA software (http://www.broadinstitute.org/gsea) with the MSigDB C2 CP canonical pathways gene set collection [22,23,24,25,26,27]. A total of 60,483 genes were imported for GSEA. Gene sets with a nominal P < 0.05 and a false discovery rate (FDR) < 0.25 were considered significantly enriched. For the most important pathways, protein–protein interaction (PPI) networks were constructed using the Search Tool for the Retrieval of Interacting Genes (STRING) database (http://www.string-db.org/) [28, 29]. Differentially expressed genes (DEGs) between the ccRCC samples in each risk-score group (high or low) and normal kidney samples were identified using the edgeR package with Padj < 0.01 and |log2FC| > 3 [30,31,32,33], and the results were visualized as volcano plots and heatmaps. The identified DEGs were subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses using the DAVID online tool (http://david.abcc.ncifcrf.gov/) [28, 29].
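To make the GSEA enrichment statistic concrete, the sketch below computes the running-sum enrichment score (ES) for one gene set against a ranked gene list, weighting hits by |correlation| raised to a power (weight = 1 is assumed here to correspond to the standard weighted GSEA statistic). It is an illustration only: it omits the permutation testing that produces the nominal P-values and FDR values used as thresholds above, and the gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: list of (gene, correlation) ordered from most positively to
    most negatively correlated with the phenotype (e.g., high vs. low risk).
    Hits step up by |correlation|**weight (normalized over all hits); misses
    step down by 1 / (number of misses). Returns the running sum's maximum
    deviation from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(abs(r) ** weight for (_, r), h in zip(ranked_genes, hits) if h)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must overlap, but not cover, the ranked list")
    running = best = 0.0
    for (_, r), h in zip(ranked_genes, hits):
        running += (abs(r) ** weight / hit_norm) if h else (-1.0 / n_miss)
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list and gene set.
ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0), ("g4", -1.0), ("g5", -2.0)]
print(round(enrichment_score(ranked, {"g1", "g2"}), 3))  # 1.0
```

A positive ES means the set concentrates at the top of the ranking (the first phenotype); a negative ES means it concentrates at the bottom.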

Validation by Gene Expression Omnibus DataSets and International Cancer Genome Consortium database

To validate the clinical roles of the six lncRNAs, we collected relevant microarray data from the Gene Expression Omnibus (GEO) DataSets using the following search terms: (kidney OR nephridium OR renal) AND (“clear cell”) AND (cancer OR carcinoma OR tumor OR neoplas* OR malignan* OR adenocarcinoma OR ccRCC) [28, 34]. Differences in lncRNA expression levels between groups were assessed using Student’s t-tests. Furthermore, we searched the International Cancer Genome Consortium (ICGC) database (https://icgc.org/) for ccRCC datasets to verify the effectiveness of the prognostic model.

Results

Differentially expressed ccRCC lncRNAs

Of the 13,198 annotated lncRNAs among the 60,483 genes in the TCGA data, 869 were identified as significantly differentially expressed by both edgeR and DESeq and were retained for subsequent prognostic analysis (Fig. 1). Among these 869 lncRNAs, 555 were upregulated and 314 were downregulated.

Fig. 1

Analysis of differentially expressed lncRNAs (DELs). a DELs identified using the edgeR package. Red and green points indicate upregulated and downregulated DELs, respectively (|log2FC| > 2). b DELs identified using the DESeq package; point colors are as in a (|log2FC| > 2). c Overlap of the DELs identified by the two packages

Assessment of prognosis based on differentially-expressed lncRNAs

After excluding samples without adequate survival data, 530 cases remained for prognostic assessment. In addition, lncRNAs with relative expression below 1 in more than 10% of samples were excluded. Univariate Cox regression identified 107 lncRNAs with prognostic value for ccRCC outcomes (P < 0.01). Multivariate Cox regression then confirmed CTA-384D8.35, CTD-2263F21.1, LINC01510, RP11-352G9.1, RP11-395B7.2, and RP11-426C22.4 as independent prognostic biomarkers for ccRCC (Table 1 and Additional file 1: Table S1), and Kaplan–Meier survival curves for these 6 lncRNAs are shown in Fig. 2. Compared with non-cancerous kidney tissues, expression of CTA-384D8.35, CTD-2263F21.1, RP11-352G9.1, RP11-395B7.2, and RP11-426C22.4 was significantly higher, whereas expression of LINC01510 was significantly lower in ccRCC samples (Fig. 3). The associations between the expression of the 6 lncRNAs and clinicopathological features were further analyzed by t-tests. CTA-384D8.35 expression was related to tumor stage, metastasis, cancer status, clinical stage, and grade; CTD-2263F21.1 expression to tumor stage, clinical stage, and grade; LINC01510 expression to tumor stage, metastasis, cancer status, clinical stage, and grade; RP11-352G9.1 expression to tumor stage, cancer status, clinical stage, and grade; RP11-395B7.2 expression to tumor stage, metastasis, cancer status, and clinical stage; and RP11-426C22.4 expression to tumor stage, cancer status, clinical stage, and grade (all P < 0.05). More importantly, as shown in Table 2 and Figs. 4 and 5, the expression levels of these 6 lncRNAs predicted the clinical progression of ccRCC.

Table 1 Detailed summary of six prognostic lncRNAs in clear cell renal cell carcinoma (ccRCC)
Fig. 2

The independent prognostic features of these 6 lncRNAs. Kaplan–Meier survival curves for each of the 6 lncRNAs

Fig. 3

Differential expression of the six key lncRNAs between clear cell renal cell carcinoma (ccRCC) and para-tumorous (pT) renal tissues. *P < 0.05; **P < 0.01; ***P < 0.001

Table 2 Association between six lncRNAs and clinical features of clear cell renal cell carcinoma (ccRCC) patients
Fig. 4

Association between the expression of key lncRNAs and clinicopathological features in clear cell renal cell carcinoma (ccRCC). Statistically significant differences in the expression of these key lncRNAs were associated with various clinicopathological features: tumor stage (T1/T2 vs. T3/T4), distant metastasis (M0 vs. M1–X), cancer status (tumor free vs. with tumor), clinical stage (I/II vs. III/IV), and grade. The different lncRNAs are arrayed along the x-axis, while the y-axis indicates normalized expression (log2). *P < 0.05; **P < 0.01; ***P < 0.001

Fig. 5

Predictive power of six key lncRNAs for clinical progression of clear cell renal cell carcinoma (ccRCC) using receiver operating characteristic (ROC) curves. ROC curves were constructed to evaluate the predictive value of each key lncRNA for cancer progression, including advanced tumor stage (T3–4), lymph node metastasis, distant metastasis, cancer status (with tumor), advanced clinical stage (III–IV), and grade (G3–4). The x-axis shows the false positive rate, presented as “100%-Specificity,” while the y-axis indicates the true positive rate, shown as “Sensitivity.” *P < 0.05 for AUC of each lncRNA

Clinical role of the six-lncRNA-based risk score

Next, the six-lncRNA-based risk score for predicting OS was calculated using a formula consisting of the expression level multiplied by the regression coefficient derived from the multivariate Cox regression model (β) values:

$$\begin{aligned} {\text{Risk}}\;{\text{score}} & = 0.527 \times e_{\text{CTA-384D8.35}} + 0.238 \times e_{\text{CTD-2263F21.1}} - 0.304 \times e_{\text{LINC01510}} \\ & \quad + 0.459 \times e_{\text{RP11-352G9.1}} + 0.25 \times e_{\text{RP11-395B7.2}} - 0.309 \times e_{\text{RP11-426C22.4}} \end{aligned}$$

The ccRCC patients were classified into high- and low-risk groups according to the inflection point of the cumulative distribution curve of the six-lncRNA-based risk score (Fig. 6). We then compared the expression levels of the 6 lncRNAs between the two groups: compared with the low-risk group, LINC01510 expression was lower and the expression of the other 5 lncRNAs was higher in the high-risk group (Fig. 6). K–M curves showed that the median survival time of the high-risk group (73.5 months) was much shorter than that of the low-risk group (112.6 months, P < 0.05; Fig. 7a). Furthermore, the risk score predicted 5-year survival in the entire set (AUC at 5 years, 0.683; C-index, 0.853; 95% CI 0.817–0.889), and the training and validation sets showed similar performance (AUC at 5 years, 0.649 and 0.680, respectively; C-index, 0.822 and 0.891; 95% CI 0.774–0.870 and 0.844–0.938) (Fig. 7). Additionally, univariate Cox regression yielded an HR of 2.372 for the risk score (95% CI 1.712–3.288, P < 0.001), and multivariate Cox proportional hazards regression yielded a consistent HR of 1.693 (95% CI 1.181–2.425, P = 0.004), confirming that the six-lncRNA-based risk score was an independent indicator of ccRCC patient survival (Table 3).
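The median survival times above are read from Kaplan–Meier curves. A minimal Python sketch of the product-limit estimator and of the usual median definition (the earliest event time at which estimated survival falls to 0.5 or below) is shown below; it uses hypothetical inputs and does not reproduce the log-rank test behind the reported P-value.

```python
def kaplan_meier(time, event):
    """Product-limit estimate; returns [(t, S(t))] at each distinct event time."""
    data = sorted(zip(time, event))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            surv *= (at_risk - deaths) / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # events and censorings both leave the risk set
    return curve


def median_survival(time, event):
    """Earliest event time with S(t) <= 0.5, or None if never reached."""
    for t, s in kaplan_meier(time, event):
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up (months) and death indicators (1 = died, 0 = censored).
months = [12, 30, 45, 60, 80]
died = [1, 0, 1, 1, 0]
print(kaplan_meier(months, died))
print(median_survival(months, died))  # 60
```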

Fig. 6

Analysis of lncRNA risk score in clear cell renal cell carcinoma (ccRCC) patients. a The entire set (530 tumor samples). b The training set (265 tumor samples). c The validation set (265 tumor samples). Each panel consists of three rows: top row, the low- and high-score groups for the lncRNA signature in ccRCC patients; middle row, the survival status and duration of ccRCC cases; bottom row, heatmap of the expression of the six key lncRNAs, with colors from blue to red indicating low to high expression

Fig. 7

Time-dependent ROC (survivalROC) curves and Kaplan–Meier curves for the six-lncRNA signature in the entire, training, and validation sets. a Kaplan–Meier survival curves showing overall survival outcomes for the high- and low-risk patients. b Time-dependent ROC curve analysis for survival prediction using the six-lncRNA signature

Table 3 Univariate and multivariate Cox analyses for the prognostic value of clinical features in clear cell renal cell carcinoma (ccRCC) patients

The prognostic value of a range of clinicopathological parameters was also explored. K–M analysis showed that age, tumor stage, distant metastasis, cancer status, clinical stage, and grade predicted outcome (Fig. 8). Although several parameters showed prognostic value in univariate analysis, among the clinical parameters only age, metastasis, cancer status, and grade remained statistically significant in multivariate analysis (Table 3).

Fig. 8

Kaplan–Meier survival curves in subgroup analyses according to different clinical factors. a Age (HR = 1.739, P < 0.01); b tumor stage (HR = 3.152, P < 0.001); c metastasis (HR = 3.736, P < 0.001); d cancer status (HR = 5.008, P < 0.001); e clinical stage (HR = 3.85, P < 0.001); f grade (HR = 2.668, P < 0.001)

ROC analysis showed that the six-lncRNA-based risk score could significantly predict tumor progression, including tumor stage (AUC = 0.669, P < 0.001), distant metastasis (AUC = 0.664, P < 0.001), cancer status (AUC = 0.658, P < 0.001), advanced clinical stage (AUC = 0.685, P < 0.001), and grade (AUC = 0.614, P < 0.001). Additionally, associations between risk score and different clinical features were also found (Figs. 9 and 10 and Table 4).

Fig. 9

Predictive value of the risk scores for clinical features by receiver operating characteristic (ROC) curves. a Tumor stage (AUC = 0.669, P < 0.001); b distant metastasis (AUC = 0.664, P < 0.001); c cancer status (AUC = 0.658, P < 0.001); d advanced clinical stage (AUC = 0.685, P < 0.001), and e grade (AUC = 0.614, P < 0.001)

Fig. 10

Association between the risk score and clinicopathological features in clear cell renal cell carcinoma (ccRCC). Statistically significant differences in risk score are noted for various clinicopathological features: tumor stage, metastasis, cancer status, clinical stage, and grade. *P < 0.05; **P < 0.01; ***P < 0.001

Table 4 Association of the risk score of the six-lncRNA signature with clinical features in clear cell renal cell carcinoma (ccRCC) patients

Functional evaluation of the differentially expressed genes in high- and low-risk groups

Volcano plots and heatmaps of the DEGs between ccRCC samples in each risk-score group and normal kidney samples are shown in Figs. 11 and 12. The enriched GO terms and KEGG pathways, listed in Additional file 2: Table S2, Additional file 3: Table S3, Additional file 4: Table S4, and Additional file 5: Table S5, suggest that different pathways were enriched in the high- and low-risk groups.

Fig. 11

Volcano plots of differentially expressed genes (DEGs) in high- and low-risk groups. Volcano plots of DEGs were generated using the edgeR package in R with Padj < 0.01 and |log2FC| > 3. a High-risk score group. b Low-risk score group

Fig. 12

Heatmaps of differentially expressed genes (DEGs) in high- and low-risk groups. Heatmaps of DEGs were generated using the edgeR package in R with Padj < 0.01 and |log2FC| > 3. a High-risk score group. b Low-risk score group

GSEA was also performed to investigate related biological processes and signaling pathways [12]. We compared the gene expression profiles of ccRCC patients in the high- and low-risk groups defined by the six-lncRNA-based risk score. Gene sets with FDR < 0.25 and nominal P < 0.005 were considered significantly enriched. In total, 6 pathways were significantly enriched in the high-risk group, including primary immunodeficiency, olfactory transduction, allograft rejection, autoimmune thyroid disease, and immune network for IgA production. By contrast, gene sets in the low-risk group were enriched in 152 pathways, including several cancer-related pathways such as the ERBB signaling pathway, the WNT signaling pathway, and pathways in cancer (Fig. 13). The associated biological pathways identified by GSEA are listed in Tables 5 and 6 and in Additional file 2: Table S2, Additional file 3: Table S3, Additional file 4: Table S4, and Additional file 5: Table S5. PPI networks were also constructed for the genes in the ‘Renal cell carcinoma pathway,’ revealing several hub genes, such as PIK3CA, VEGFA, and PIK3CB (Additional file 6: Fig. S1).

Fig. 13

Gene set enrichment analysis (GSEA) identifies cancer-related KEGG pathways associated with risk score. GSEA validated the enhanced activity of a the ERBB signaling pathway, b the WNT signaling pathway, and c pathways in cancer

Table 5 Pathways enriched in the high-risk group according to gene set enrichment analysis (GSEA)
Table 6 Pathways enriched in the low-risk group according to gene set enrichment analysis (GSEA)

Validation of these lncRNAs using Gene Expression Omnibus DataSets and the International Cancer Genome Consortium (ICGC) database

In total, 4030 items (GSE = 248, GPL = 96) were identified from the GEO DataSets using our search strategy. The retrieval and inclusion process is shown in Additional file 7: Fig. S2. Annotations for some of the 6 lncRNAs were found on the following GEO platforms: GPL19615, GPL8841, GPL19197, GPL1707, GPL570, GPL5175, GPL15096, GPL97, and GPL96. Ultimately, only GPL19615 (GSE96574, containing LINC01510), GPL570 (GSE53757, GSE66272, GSE36895, GSE46699, and GSE22541, containing CTA-384D8.35), and GPL96 (GSE781, containing RP11-395B7.2) were included in subsequent analyses. The expression levels of CTA-384D8.35 and LINC01510 in these 6 microarrays were significantly higher in ccRCC than in normal controls (CTA-384D8.35: GSE53757 [P < 0.0001], GSE66272 [P = 0.0483], GSE36895 [P = 0.0007], GSE46699 [P = 0.0021]; LINC01510: GSE96574 [P < 0.005]), and RP11-395B7.2 showed a similar but non-significant trend (P = 0.183). The AUC of CTA-384D8.35 for predicting advanced tumor stage was 0.655, and CTA-384D8.35 had prognostic value for patients with ccRCC (P = 0.033). These results were consistent with our previous results based on TCGA data (Table 7, Fig. 14).

Table 7 Validation of lncRNA expression in clear cell renal cell carcinoma (ccRCC) based on Gene Expression Omnibus (GEO) data
Fig. 14

Validation of lncRNAs in clear cell renal cell carcinoma (ccRCC) based on Gene Expression Omnibus (GEO) data. a Boxplot showing expression of CTA-384D8.35 (GSE53757) in normal and ccRCC tissues. b The association of CTA-384D8.35 expression level with tumor (T) stage was also considered. c ROC curve of CTA-384D8.35 (GSE53757). d Boxplot showing expression of CTA-384D8.35 (GSE66272). e Boxplot showing expression of CTA-384D8.35 (GSE36895). f Boxplot showing expression of CTA-384D8.35 (GSE46699). g Kaplan–Meier survival curves of CTA-384D8.35 (GSE66272). h Boxplot showing expression of RP11-395B7.2 (GSE781). i Boxplot showing expression of LINC01510 (GSE96574)

Renal Cell Cancer (RECA-EU) data, comprising 91 ccRCC tissues and 45 adjacent non-tumorous renal tissue samples, were obtained from the International Cancer Genome Consortium (ICGC) database. Three of the six lncRNAs (CTD-2263F21.1, LINC01510, and RP11-426C22.4) were matched in this dataset, and their differential expression and prognostic value were analyzed. All three lncRNAs were significantly differentially expressed (P < 0.05), consistent with the TCGA results. Kaplan–Meier survival curves for CTD-2263F21.1 and RP11-426C22.4 also indicated prognostic value (P < 0.05) (Fig. 15).

Fig. 15

a Differential expression of CTD-2263F21.1 between clear cell renal cell carcinoma (ccRCC) and para-tumorous (pT) renal tissues (P < 0.05). b Differential expression of LINC01510 between clear cell renal cell carcinoma (ccRCC) and para-tumorous (pT) renal tissues (P < 0.001). c Differential expression of RP11-426C22.4 between clear cell renal cell carcinoma (ccRCC) and para-tumorous (pT) renal tissues (P < 0.05). d Kaplan–Meier survival curve of CTD-2263F21.1 (P = 0.042). e Kaplan–Meier survival curve of LINC01510 (P = 0.743). f Kaplan–Meier survival curve of RP11-426C22.4 (P = 0.038)

Discussion

This study analyzed TCGA sequencing data to discover effective prognostic biomarkers for ccRCC that have the potential to guide future clinical and basic research. First, we identified differentially expressed lncRNAs in ccRCC patients using the R packages edgeR and DESeq and systematically assessed their prognostic value. The best prognostic value was achieved with a set of 6 lncRNAs (CTA-384D8.35, CTD-2263F21.1, LINC01510, RP11-352G9.1, RP11-395B7.2, and RP11-426C22.4) obtained via multivariate Cox regression. The resulting six-lncRNA-based risk score predicted the progression and prognosis of ccRCC. When ccRCC patients were classified into high- and low-risk groups, both the differentially expressed genes and the key signaling pathways differed between the two groups (Additional file 8: Fig. S3).

Several ccRCC studies have already used lncRNA expression profiling, and studies of lncRNA interactions with other molecules have increased in recent years. The techniques most frequently used to assess lncRNA expression profiles in renal cell carcinoma (RCC) include microarray assays and ChIP-Seq experiments [16, 35,36,37,38]. However, these studies were limited by small sample sizes and the limited number of lncRNAs examined. In 2018, Liu et al. reported a novel lncRNA profile that revealed potential prognostic biomarkers in clear cell renal cell carcinoma, obtaining the expression profiles of 1801 lncRNAs in ccRCC patients from the TCGA RNASeqV2 system [39]. To enable a more comprehensive understanding of lncRNAs in ccRCC, the present study mined high-throughput TCGA data from 530 patients and analyzed 13,198 lncRNAs. A total of 869 differentially expressed lncRNAs were identified using the edgeR and DESeq packages and used for subsequent analysis, whereas the studies of Qu et al. and Liu et al. identified only 51 and 247 differentially expressed lncRNAs, respectively [39, 40].

Several studies have revealed that abnormal lncRNA expression is correlated with OS, 5-year survival, disease-free survival, disease grade and stage, recurrence, and metastasis. However, most previous studies focused on a single lncRNA. For example, decreased expression of the lncRNAs NONHSAT123350, CADM1-AS1, TCL6, and lnc-ZNF180-2 was associated with poor prognosis in RCC patients [41], and increased expression of SPRY4-IT1, RCCRT1, MALAT1, LINC00152, and PVT1 also indicated unfavorable outcomes [41]. Given the wide availability of high-throughput TCGA data, sequencing data offer an ideal means of discovering novel lncRNAs. Using multiple statistical methods for prognostic analysis, we found that CTA-384D8.35, CTD-2263F21.1, LINC01510, RP11-352G9.1, RP11-395B7.2, and RP11-426C22.4 had significant prognostic value. More importantly, these 6 lncRNAs formed the basis of a risk score that provided a superior means of predicting disease progression and prognosis.

Very recently, Shi et al. [42] used TCGA reads per kilobase of exon model per million mapped reads (RPKM) data covering 9669 lncRNAs and divided 440 kidney cancer patients into a training set (n = 220) and a testing set (n = 220). Using a univariate Cox regression model in the training set, they found that a five-lncRNA signature (AC069513.4, AC003092.1, CTC-205M6.2, RP11-507K2.3, and U91328.21) was closely associated with patient OS, and the five-lncRNA-based risk score was confirmed in both the testing set and the entire set. The five lncRNAs in their study did not overlap with the six lncRNAs identified here. Compared with that analysis, ours has the following advantages. First, more samples were included in our study (n = 539). Second, more lncRNAs were annotated (n = 13,198). Third, we analyzed only differentially expressed lncRNAs for their prognostic value, because an lncRNA with little influence on tumorigenesis would be expected to have limited prognostic value. Two of the five lncRNAs reported by Shi et al. [42] (U91328.21 and CTC-205M6.2) showed no significant differences in expression between ccRCC and non-cancerous renal tissues (Additional file 9: Fig. S4). This may be because RPKM values are not suitable inputs for differential expression analysis with edgeR, which models raw read counts [43]. We investigated the prognostic significance of the six lncRNAs identified in the present study on the premise that their expression differed markedly between cancerous and non-cancerous tissues; consequently, these six lncRNAs (i.e., CTA-384D8.35, CTD-2263F21.1, LINC01510, RP11-352G9.1, RP11-395B7.2, and RP11-426C22.4) may function not only at the outset of tumorigenesis but also in tumor progression. Fourth, we applied multivariate Cox proportional hazards regression analysis to account for other factors when identifying novel prognostic biomarkers, which provides more robust and comprehensive results. Fifth, the ccRCC dataset was divided into training and validation sets to verify the prognostic efficacy of the six-lncRNA-based signature. Sixth, we performed external validation using the GEO and ICGC databases, in which our search identified 248 ccRCC-related series in the GEO DataSets. CTA-384D8.35, CTD-2263F21.1, and RP11-426C22.4 had prognostic value for patients with ccRCC, and the clinical value of three lncRNAs (CTA-384D8.35, RP11-395B7.2, and LINC01510) was partly verified by six microarrays. Lastly, the 530 ccRCC cases were divided into high- and low-risk groups, and differences in pathways between the two groups were investigated to explore the potential signaling pathways and molecular mechanisms that may influence prognosis in ccRCC.

GSEA suggested that the six novel lncRNAs may play distinct roles in ccRCC through specific signaling pathways. The ‘Pathway in cancer’ gene set (321 genes) encompasses multiple pathways, including the ‘Renal cell carcinoma pathway’ (49 genes). PPI analysis identified hub genes in the ‘Renal cell carcinoma pathway,’ such as PIK3CA, VEGFA, and PIK3CB, which have also been reported to play vital roles in ccRCC [44,45,46,47,48,49]. Interestingly, PIK3CA has been identified as a direct target of miR-490-5p and miR-19a in renal carcinoma [44, 45]. VEGFA is a major driver of angiogenesis [46] and is targeted by miR-185, which acts as a tumor suppressor in ccRCC [47]; VEGFA has also been reported to stimulate ccRCC cell migration, invasion, and angiogenesis [48]. Thus, these six novel lncRNAs may exert their functions in part by activating genes in the ‘Renal cell carcinoma pathway.’ Beyond this pathway, the lncRNA CCAT2 and the protein Kindlin-2 appear to promote clear cell renal cell carcinoma progression by modulating the ‘Wnt signaling pathway’ [50, 51]. We also found that the top three KEGG pathways for DEGs in the high-risk group were KEGG_PRIMARY_IMMUNODEFICIENCY, KEGG_OLFACTORY_TRANSDUCTION, and KEGG_ALLOGRAFT_REJECTION, whereas the three most dominant pathways in the low-risk group were KEGG_UBIQUITIN_MEDIATED_PROTEOLYSIS, KEGG_OXIDATIVE_PHOSPHORYLATION, and KEGG_PEROXISOME, indicating that the enriched pathways differed between the high- and low-risk groups. Because the six lncRNAs we detected are novel and their functions have not yet been studied, this pathway analysis provides directions for future research on their molecular mechanisms.

In many cancers, gene expression signatures and prognostic models have proven to be useful tools for predicting clinical outcomes on the basis of the molecular characteristics that drive pathogenesis. For example, Brooks et al. [52] developed a 34-gene subtype predictor that classifies ccRCC tumors into good-risk (ccA) and poor-risk (ccB) subtypes and built a subtype-inclusive model to predict patient survival. Their model provides prognostic stratification and improves on established algorithms for assessing the risk of recurrence and death in patients with non-metastatic ccRCC. However, measuring 34 indicators imposes a substantial clinical burden. Additionally, a 16-gene recurrence score (RS) assay was previously developed and validated to predict the risk of disease recurrence in patients with stage I–III RCC after nephrectomy [53]. That study used data from the phase III adjuvant sunitinib (S-TRAC) trial in high-risk stage III RCC to provide additional validation of the 16-gene RS assay; its strong prognostic performance was confirmed, and the RS assay is now supported by level IB evidence. However, the primary analysis focused on patients with T3 RCC, and additional studies are needed to determine whether the RS predicts benefit from adjuvant treatment. The cell cycle progression (CCP) score, based on the levels of 31 cell cycle genes and 15 control genes in the tumor, had prognostic value for predicting metastatic progression after resection of organ-confined ccRCC in univariate analysis and multivariate logistic regression modeling [54]. The CCP score also had prognostic utility in a second TCGA renal cancer cohort with M1 metastasis at the time of surgery. However, because that study cohort was relatively small, genes other than the CCP genes may still provide meaningful prognostic information, and because the CCP assay was originally developed for prostate cancer, the ideal ccRCC gene set may differ from the genes evaluated in that study.

Conclusion

In conclusion, by using TCGA data to evaluate lncRNAs from 530 ccRCC patients, we developed an effective six-lncRNA-based risk score, which has potential as a novel prognostic biomarker for ccRCC. However, this clinical finding needs further confirmation. Additionally, the function and molecular mechanisms of these novel lncRNAs also require in vitro and in vivo exploration.