Introduction

Despite the great efforts and progress in imaging techniques and therapeutical options in the last decades, renal cell carcinoma (RCC) still remains one of the deadliest urogenital cancers worldwide, being the ninth most common neoplasm in the United States and accounting for 2% of global cancer cases (Padala et al. 2020). The incidence of RCC rises especially in higher income regions where accessibility of primary health care and imaging techniques enables early identification of RCC cases (Capitanio et al. 2019). However, more than 50% of cases are discovered incidentally (Escudier et al. 2019); as only a small fraction of patients present with typical symptoms of RCC, and no systemic screening programe has been developed so far (Decastro and McKiernan 2008; Padala et al. 2020). And still, even the growing accessibility of imaging does not cover the diagnostic need, as the locally advanced disease is already present in a notable proportion of patients, with almost 17% harboring distant metastasis (Capitanio et al. 2019). Moreover, a significant portion of patients after successful nephrectomy experience relapse relatively early, up to 2 years after the surgery. Currently recognized prognostic models, namely stage, size, grade, and necrosis (SSIGN) score, the University of California Los Angeles Integrated Staging System (UISS), and molecular models such as ClearCode34 can provide only limited prognostic value. No preference is placed on any of them by official oncology guidelines (Escudier et al. 2019). Assessment of prognosis is currently based mainly on histological and clinical factors, and the pressing clinical need to predict early relapse after nephrectomy stays unmet as no reliable biomarker has been discovered and implemented yet.

Long non-coding RNAs are such potential biomarkers, as their expression is dysregulated in association with many human diseases not only as a secondary effect but also as a causative event (Chi et al. 2019). As potent regulators of gene expression on all levels from chromatin organization all the way to the protein folding and beyond, lncRNAs have a vital impact on cell functioning (Quinn and Chang 2016; Chi et al. 2019). Dysregulation of lncRNA levels affects other cellular components and contributes to development of pathologic conditions but can also serve as a unique snapshot of current situation in any given cell (Chandra Gupta and Nandan Tripathi 2017; Bohosova et al. 2021). In the recent decade, there was a plethora of publications focused on dysregulation of lncRNA in many human diseases, although, predominantly in cancer, including renal cell carcinoma (Chandra Gupta and Nandan Tripathi 2017; Bhan et al. 2017).

In a present study, we aimed to identify diagnostic and prognostic lncRNAs, specifically dysregulated in patients with early relapse after nephrectomy and thus provide potential new tool for diagnosis and prognostication of patients in higher risk of disease progression.

Materials and methods

Samples and patients

Patients diagnosed with renal cell carcinoma were enrolled during their treatments at the Masaryk Memorial Cancer Institute, Brno, the Czech Republic, between 2014 and 2018. Enrollment, sample, and clinical data management were handled according to the Declaration of Helsinki. Signed informed consent has been collected from all the patients before the study enrollment. The study design was approved by the Ethics Committee of the Masaryk Memorial Cancer Institute and Ethics Committee of Masaryk University. From all 259 patients, tumor renal parenchyma and adjacent non-tumor tissue have been taken during the nephrectomy. All tissue was first submerged in RNA-later and then stored at -80 °C until further processing. In Table 1 relevant clinical characteristics for the select cohorts of patients used in sequencing and validation part of the study are shown.

Table 1 Select clinical characteristics of the patient cohorts

Out of the 259 patients enrolled in the study, several (N = 13) patients had to be excluded due to extremely short overall survival after nephrectomy, and death unrelated to the disease relapse (up to ten months from surgery), or invalid clinical data. The remaining 246 patients were stratified into 2 independent cohorts for the sequencing and validation part of the study. The sequencing cohort of 48 patients consisted of two groups which were later compared with each other—early relapse and late-relapse, both groups contained 24 patients (together 96 samples, as we processed both the tumor and non-tumor tissue from each patient). Early-relapse patients were defined as relapses up to 25 months after nephrectomy (median 10,58 months, range 2.41–21.97). Late-relapse group consisted of patients without relapse events for the period of more than 50 months after the nephrectomy. For validation, we used all the remaining patients a divided them again into two groups—early and late-relapse patients. Early-relapse group consisted of all the remaining patients (N = 15) who experienced relapse before the 25-month mark. In the late-relapse group were all the remaining patients (N = 183) who relapsed after 25 months or had not relapsed at all during the 5-year follow-up.

Isolation and quality control

Frozen tissue samples were thawed at the processing site and from each specimen, a 0.5 × 0.5 cm piece was cut and further processed. Total RNA enriched for small RNAs was extracted using mirVana™ miRNA isolation kit (Invitrogen, Waltham, MA, USA) with a minor alteration of the manufacturer’s protocol. Specimens were first homogenized using ceramic beads (Qiagen, Hilden, Germany) along with the Lysis/Binding buffer from the mirVana™ miRNA isolation kit (Invitrogen, Waltham, MA, USA). We then proceeded to the RNA extraction as suggested by the manufacturer. The concentration of the extracted RNA was measured using NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). Extracted RNA was stored at − 80 °C until further processing.

Library preparation and transcriptome sequencing

Prior to the library preparation, the concentration of the samples chosen for the sequencing analysis was measured again for better precision of input amounts of RNA using Qubit™ 2.0 (Invitrogen, Thermo Fisher Scientific, Waltham, MA, USA) fluorometer. The integrity of RNA was determined using Agilent 2200 TapeStation system and RNA ScreenTape (Agilent, Santa Clara, CA, USA). As some samples still contained some residual genomic DNA which would interfere with the sequencing of the desired fragments, we removed it using the DNA-free™ DNA Removal Kit (Invitrogen, Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer's protocol. The concentration of the pure RNA was measured again using Qubit™ 2.0 (Invitrogen, Thermo Fisher Scientific, Waltham, MA, USA) fluorometer and then we proceeded to the ribosomal RNA depletion. First, the samples were diluted in 26 ul of nuclease-free water (Qiagen, Hilden, Germany) to achieve 500 ng of total RNA input. For the RNA depletion, we used the RiboCop rRNA Depletion Kit V1.2 (Lexogen, Vienna, Austria) according to the manufacturer's protocol in order to eliminate the ribosomal RNA which would overwhelm the sequencing capacity. The concentration of the purified RNA was measured again using Qubit™ 2.0 (Invitrogen, Thermo Fisher Scientific, Waltham, MA, USA) fluorometer.

We then proceeded to the sequencing library preparation using Ultra™ II Directional RNA Library Prep Kit for Illumina® (New England Biolabs, Ipswich, MA, USA), AM-Pure® XP Beads (Beckman Coulter, Brea, CA, USA), and NEBNext® Multiplex Oligos for Illumina® (Dual Index Primers Set 1) (New England Biolabs, Ipswich, MA, USA). Minor adjustments have been made to the manufacturer's protocol: RNA has been fragmented for a longer time (8 min despite higher RNA quality); incubation with the USER Enzyme has been carried out as a first step in the PCR Enrichment of Adaptor Ligated DNA reaction; this reaction has been also run in the Biometra Optical Thermocycler® (Analytik Jena, Jena, Germany) which allows following the amplification curve real-time and stopping the reaction when the desired amplification signal is achieved. To visualize the amplification, we added 2 ul of EvaGreen® Dye, 20X in Water (Biotium, Fremont, CA, USA) into the PCR reaction mix, and individual tubes were taken out of the machine when the fluorescence reached 5000. Purifying beads volume had to be adjusted as the volume of the mixture was lower before the PCR and higher after the PCR. Final DNA libraries were stored at − 20 °C until further processing.

Quality and quantity of the libraries was measured using the Qubit™ 2.0 (Invitrogen, Thermo Fisher Scientific, Waltham, MA, USA) fluorometer and Agilent 2200 TapeStation system and High Sensitivity D1000 ScreenTape (Agilent, Santa Clara, CA, USA) according to the manufacturer's protocol. All 96 samples have been divided into 8 groups, so 12 patients have been sequenced in each run. Also, we stratified samples semi-randomly, as to have tumor and non-tumor tissue and early as well as late-relapse patients in each run to minimize batch effect. Samples have been pooled equimolar at the 4 nM concentration, which has been rechecked using the Qubit™ 2.0 (Invitrogen, Thermo Fisher Scientific, Waltham, MA, USA) fluorometer. The size of each pool was checked using Agilent 2200 TapeStation system and High Sensitivity D1000 ScreenTape (Agilent, Santa Clara, CA, USA) according to the manufacturer's protocol. Pools containing fragments of undesirable length have been purified using Agencourt AMPure XP magnetic beads (Beckman Coulter, Brea, CA, USA).

Following the Illumina Denature and Dilute protocol, the polls have been denatured and diluted to 1,8 pM final concentration and loaded onto a sequencing cassette from NextSeq 500/550 High Output v2 kit, 75 cycles (Illumina, San Diego, CA, USA) and run according to the manufacturer's protocol.

Data analysis

Raw data from the Illumina NextSeq 500 was converted to fastq using bcl2fastq2 Con-version software (version 2.20.0), and read quality was checked using FastQC (version 0.11.9) (Andrews 2010). Adapter sequences were identified using the Kraken system (version 16-098) (Davis et al. 2013), and poor read ends were removed using Trimmomatic (version 0.39). The 3 'ends with a threshold value less than five and reads shorter than 35 bp have been considered poor and were removed. The modified libraries were mapped with the STAR tool (version 2.7.0d) (Dobin et al. 2013) to the human genome, the sequence of which was downloaded from the GENCODE (version 37) database. During mapping, each reading was allowed to map to up to 20 different locations. Genes were quantified using RSEM software (version 1.3.1) (Li and Dewey 2011), and differentially expressed lncRNAs were identified using the DESeq2 tool (version 1.18.1) (Love et al. 2014) and edgeR package (version 3.30.3). Six samples have failed library preparation resulting in almost no reads, and thus were removed from any subsequent analysis. Another 24 samples were removed due to low number of gene-aligned reads (less than 10 million). Further, 14 samples were removed due to failed DESeq2 normalization and 4 samples were removed due to more issues. Remaining 56 samples were analyzed and the results are shown at the Results section.

PCR validation of the results

Based on the results of sequencing, 20 candidate lncRNAs were selected. Ten candidates were chosen based on the comparison of early and late-relapse patients (prognostic lncRNAs), other ten were chosen based on the comparison of tumor and non-tumor tissue (diagnostic lncRNAs). In diagnostic lncRNAs, we simply took first ten lncRNAs with assays available in our supplier, Thermo Fisher. In prognostic lncRNAs, we took lncRNAs with base mean > 20 (except TSPEAR due to its low padj value we decided to test it as well). When choosing specific assays from our supplier, we decided to take only assays which spanned exons, had best coverage and possibly the highest RefSeq number.

For the validation of the sequencing results, we used the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Thermo Fisher Scientific, Waltham, MA, USA) for reverse transcription. TaqMan™ Gene Expression Master Mix (Applied Biosystems, Thermo Fisher Scientific, Waltham, MA, USA) was used in qPCR run on QuantStudio 12 K Flex Real-Time PCR System (Applied Biosystems, Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer's protocol. TaqMan® Gene Expression Assays used for qPCR along with the catalogue number of product (Assay ID) (Applied Biosystems, Thermo Fisher Scientific, Waltham, MA, USA) are listed in Table 2 below. The expression of PPIA was used as a normalization standard and reference gene based on literature search and our previous experience with the expression of PPIA in RCC tissue (Bohosova et al. 2022).

Table 2 Gene Expression Assays used for qPCR along with the catalogue number of product (Assay ID) (Applied Biosystems, Thermo Fisher Scientific, Waltham, MA, USA)

Statistical analysis

For the statistical analysis of the validation phase results, we used normalized expression data which were evaluated using Mann–Whitney U Test, Wilcoxon test, ROC analysis, and Kaplan–Meier analysis (GraphPad Prism 8, GraphPad Software, La Jolla, CA, USA). P values lower than 0,05 were considered statistically significant. Combined ROC curves were prepared using both GraphPad Prism 8 and JMP software (SAS-JMP software, SAS Institute, Cary, North Carolina, USA). All statistically significant lncRNAs in stage and relapse comparison and all diagnostic lncRNAs with AUC > 0.9200 were analyzed in different combinations of the lncRNAs in the panels. Based on the results of combined ROC analysis for several lncRNAs, a diagnostic Dx score has been developed:

$${\text{DxScore }} = {\text{ Intercept }} - \, \left( {{\text{estimate}}_{{{\text{lncRNA1}}}} *{\text{lncRNA1}}} \right) \, - \ldots . - \left( {{\text{estimate}}_{{{\text{lncRNAx}}}} *{\text{lncRNAx}}} \right)$$

Results

lncRNA expression profiles in RCC patients

Next-generation sequencing results were analyzed using the DESeq2 tool. Comparing the tumor and paired non-tumor tissue, we identified 2800 dysregulated lncRNAs (Fig. 1a, b). Twenty most dysregulated lncRNAs are listed in Table 3. However, when comparing early and late responders, we found only three significantly dysregulated lncRNAs in patients with early relapse of the disease (Table 4). In Tables 5 and 6 are listed all the lncRNAs chosen for validation along with their specific performance in sequencing.

Fig. 1
figure 1

a Clastrogram and the heatmap showing 2800 significantly dysregulated lncRNAs in 28 samples of tumor tissue (pink) and s28 non-tumor tissue samples (blue). In the heatmap, red shows higher expression, and blue shows lower expression. lncRNAs with foldchange > 2 and p < 0.05 were considered significant. b Clastrogram and the heatmap showing showing 20 most significantly dysregulated lncRNAs in 28 samples of tumor tissue (pink) and 28 non-tumor tissue samples (blue). In the heatmap, red shows higher expression, and blue shows lower expression. lncRNAs with foldchange > 2 and p < 0.05 were considered significant

Table 3 Twenty most significantly dysregulated lncRNAs in tumor and non-tumor tissue of RCC patients according to the p value and adjusted p value; FC Fold Change in relation to the non-tumor tissue
Table 4 Twenty most significantly dysregulated lncRNAs in early and late-relapse RCC patients according to the p value and adjusted p value; FC fold change in relation to the late-relapse tissue
Table 5 List of candidate lncRNAs chosen for validation based on the comparison of tumor and non-tumor RCC tissue (diagnostic lncRNAs)
Table 6 List of candidate lncRNAs chosen for validation based on the comparison of early- and late-relapsing RCC patients (prognostic lncRNAs)

qPCR validation of sequencing results

Twenty selected candidate lncRNAs were analyzed using qPCR. Their expression levels have been measured in tumor and paired non-tumor tissue of all patients in the validation cohort and Cq values were normalized to the expression of PPIA. Fourteen lncRNAs have been successfully validated as significantly dysregulated in tumor tissue compared to the paired non-tumor tissue. Four lncRNAs (ZNF667-AS1, RMST, COLCA1, and SNHG3) were significantly dysregulated also in patients with early relapse compared to the late-relapsing patients. We also compared patients in lower stages (stage 1 and 2) with patients in higher stages (stage 3 and 4). Six lncRNAs also shown association with stage of the patients. Results of all lncRNAs with statistically significant dysregulation of expression (in any comparison) are shown in Fig. 2. In all validated lncRNAs we also did ROC analysis (Figs. 3 and 4). Understandably, lncRNAs differed in the ability to distinguish a tumor and non-tumor tissue, however, ten of them showed satisfactory AUC above 0.75. All results are summarized in Table 7.

Fig. 2
figure 2

Significantly dysregulated lncRNAs, each graph shows all comparisons analyzed for each lncRNA (tumor vs. non-tumor, stage I + II vs. stage III + IV, early relapse vs. late relapse)

Fig. 3
figure 3

Graph showing ROC curves of candidate lncRNAs dysregulated in tumor tissue compared to non-tumor tissue of RCC patients

Fig. 4
figure 4

Graph showing ROC curves of candidate lncRNAs dysregulated in tumor tissue of patients with or without early relapse (labeled Relapse) of with different stage of the disease (labeled Stage)

Table 7 Significantly dysregulated lncRNAs and their statistical performance in different comparisons

In order to gain better discriminatory results, we combined statistically significantly dysregulated lncRNAs into a diagnostic, and prognostic models. Logistic regression analysis showed that linear combination of AZGP1P1, CDKN2B-AS1, COL18A1, and RMST provided best diagnostic discrimination (AUC 0.9808, sensitivity 95.96%, specificity 90.4%) outperforming other combinations of lncRNAs (Table 8). In stage comparison, the combination of ZNF667-AS1, PVT1, RMST, LINC00955, and TCL6 yielded best results with AUC 0,812, sensitivity 85.71% and specificity 69,41%. And although even if the combination of COLCA1, RMST, SNHG3, and ZNF667-AS1 in relapse comparison did bring higher AUC than RMST, SNHG3, and ZNF667-AS1 alone, it did not outperformed AUC of COLCA1 alone, which makes this lncRNA a superior prognostic biomarker (Fig. 5).

Table 8 Results of combined ROC analysis for lncRNA panels achieving the best diagnostic and prognostic values
Fig. 5
figure 5

Graph showing combined ROC curves of the most significantly dysregulated lncRNAs diagnostic of prognostic setting

Discussion

In this study, we provide valuable results showing set of exceptional diagnostic and prognostic biomarkers feasible in RCC therapy. Our main aim was to sequence transcriptome of a large set of patients and provide lncRNA expression profiles of a novel and unique patient cohort. We identified more than 2800 significantly dysregulated lncRNAs, out of which we chose twenty for further validation on an independent cohort. Moreover, we aimed to validate diagnostic and prognostic value of selected candidate lncRNAs and their association with early relapse and/or stage, which was successful as 14 out of 20 lncRNAs did perform statistically significantly (p < 0.0001 in diagnostic setting, p < 0.05 in prognostic setting). To achieve better discriminatory results, we tested several combinations of lncRNAs and identified panels with the best diagnostic and prognostic value. However, in case of early relapse prediction, even the combination achieving the highest AUC did not outperform lncRNA COLCA1. To our best knowledge, we are the first to show that this lncRNA holds such a prognostic power and is significant not only regarding the early relapse after nephrectomy but in RCC in general, as there is only one other work mentioning COLCA1 in relation to RCC as one of nine redox-related lncRNA associated with overall survival of RCC patients (Qi-Dong et al. 2020).

There are however, several other lncRNAs in our panel, which are not frequent in similar RCC-focused studies. For example AZGP1P1 has been associated only with prognosis in breast cancer (Liu et al. 2019), LINC00462, which has been associated with ferroptosis according to one recent work (Wu et al. 2022b), LINC00955 without any other evidence, RMST, which is known to be involved in the development of other tumors, but not RCC (Chen et al. 2022) and similarly ZNF667-AS1, also known as MORT (Vrba and Futscher 2018).

On the other hand, we also identified several notoriously known lncRNAs typically associated with development of RCC. There is CDKN2B-AS1, known also as ANRIL, is up-regulated in RCC tissue which corresponds with our results (Angenard et al. 2019; Dasgupta et al. 2020; Xie et al. 2021) and there are even some single nucleotide polymorphisms identified which are typically associated with RCC (Li et al. 2014). COL18A1 is downregulated as in our study also in other works (Yang et al. 2018; Angenard et al. 2019; Liu et al. 2022) and its expression seems to be specific for chromophobe RCC (He et al. 2016). Dysregulation of LINC00472 along with PVT1 and LUCAT1 has been described in our previous work (Bohosova et al. 2022). LINC00472 has been mentioned also in other works as downregulated in RCC and associated with progression of the disease. Even some mechanisms of LINC00472 functioning have been elucidated (Gao and Wang 2021; Xiang et al. 2021; Wang et al. 2022). PVT1 and LUCAT1 are very well-described lncRNAs in RCC as their impact on overall survival has been shown among several other works also in meta-analysis from Wang et al. (Wang et al. 2019). Involvement of PVT1 into the development of RCC has been thoroughly reviewed also in Bohosova et al. (2021). Similarly, TCL6 has also been shown to be involved in development of RCC as a tumor suppressor (Rysz et al. 2022) and interestingly, artificial increase of TCL6 in cancer cells sensitizes them to paclitaxel, which could pose a novel therapeutical opportunity (Chen et al. 2020).

Although most results correspond with other works, there were some discrepancies in the direction of change in expression, namely in SNHG3 and LINC00887. While SNHG3, an autophagy-related lncRNA (Xuan et al. 2021), was downregulated in our study, other works show its upregulation in RCC (Zhang et al. 2019; Yang et al. 2020; Xu et al. 2021). Similarly, LINC00887 was downregulated in our study and up-regulated in study of Wu et al. (Wu et al. 2022a). However, the main drawback of current state of knowledge regarding the lncRNA dysregulation in RCC lies in the sequencing data, as in majority of works the same set of TCGA sequencing data was analyzed. Therefore, high degree of inter-rater reliability is expected, while some discrepancies between TCGA and our independent cohort are inevitable.

In conclusion, we identified distinct expression profiles of lncRNA in renal tumor tissue and based on our exploratory data, we successfully validated several highly reliable diagnostic and prognostic tissue biomarkers. Moreover, combination of lncRNAs with the best statistical values provided even better discriminatory values. Our study shows that besides diagnosis and overall survival prognostication, lncRNAs can discern also between patients with different length of relapse-free survival and thus should be considered for further independent validation in larger cohorts.