An atlas of gastric PIWI-interacting RNA transcriptomes and their utility for identifying signatures of gastric cancer recurrence
The poor survival and high recurrence rate in gastric adenocarcinoma highlight the need for cancer gene discovery. Towards this end, we globally assessed the expression of an emerging class of small non-coding RNAs, called PIWI-interacting RNAs (piRNAs). We analysed the transcriptomes of 358 non-malignant stomach tissue and gastric adenocarcinoma samples, and found that nearly half of the expressed piRNAs were overexpressed in tumours. Our gastric piRNA atlas showed that most piRNAs were embedded in protein-coding sequences rather than known piRNA clusters. Furthermore, we identified a three-piRNA signature associated with recurrence-free survival. In this proof-of-principle study, we demonstrate the potential clinical utility of piRNAs in gastric cancer.
Keywords: PIWI-interacting RNAs; Epigenetics; PIWI; Gastric adenocarcinoma; Patient outcome
Gastric adenocarcinoma (GA) has a poor 5-year survival, with high rates of relapse, posing an urgent need for biomarker discovery. Small non-coding RNAs, such as microRNAs, have proven clinical utility, owing to stability in biofluids and formalin-fixed paraffin-embedded material. Recent studies have demonstrated the deregulation of two members of an emerging class of small non-coding RNA, PIWI-interacting RNAs (piRNAs), in a small cohort of GA [3, 4, 5].
GA is one of the cancer types selected for profiling by The Cancer Genome Atlas (TCGA), providing a valuable resource for discovery of new cancer genes. Although piRNAs were not one of the dimensions analysed by TCGA, we were able to generate expression profiles for 38 non-malignant stomach tissue samples and 320 GA samples from raw sequencing data using a custom analysis pipeline. We performed an unbiased, global analysis of the 20,821 piRNAs in the human genome to deduce the relationship of deregulated piRNAs with clinicopathological features, and to evaluate a possible role for piRNAs as prognostic biomarkers.
Materials and methods
A total of 320 GA and 38 non-malignant small RNA sequencing libraries (for processing, see Fig. S1) were obtained from the Cancer Genomics Hub data repository (dbGaP project ID 6208). SNP 6.0 copy number profiles were downloaded from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm). An additional cohort of 25 GA small RNA sequencing libraries was downloaded from Gene Expression Omnibus (series GSE36968). Rank-normalized expression of the recurrence-free survival (RFS) signature piRNAs (described later) was extracted from nine additional cancer types with available small RNA sequencing data and RFS follow-up.
Rank-normalized piRNA reads per kilobase of exon per million mapped reads were clustered, using hierarchical and consensus approaches, in GENE-E (http://www.broadinstitute.org/cancer/software/GENE-E/index.html) and GenePattern [8, 9]. Hierarchical clustering was performed using Euclidean distance with average linkage. Consensus clustering analysis was performed using the following parameters: k max = 5; clustering algorithm = hierarchical; distance = Euclidean; resampling iterations = 20.
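The hierarchical clustering settings described above (Euclidean distance, average linkage) can be illustrated with a minimal Python sketch. This is not the GENE-E or GenePattern implementation, and it does not cover the consensus (resampling) step. The function names, the toy expression values and the choice to rank-normalize within each sample are assumptions made purely for illustration.

```python
import math


def rank_normalize(values):
    """Replace each value by its rank scaled to (0, 1]; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / len(values)
        i = j + 1
    return ranks


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest mean pairwise Euclidean distance."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(
                    euclidean(profiles[i], profiles[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy data (hypothetical): rows are samples, columns are three piRNAs.
samples = [
    [10.0, 200.0, 30.0],
    [12.0, 180.0, 25.0],
    [300.0, 5.0, 40.0],
    [250.0, 8.0, 60.0],
]
normalized = [rank_normalize(s) for s in samples]
clusters = average_linkage(normalized, n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

The quadratic search over cluster pairs is adequate for a toy example; a cohort of several hundred samples would normally be clustered with an optimized library routine.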
Differential expression analysis
Differentially expressed piRNAs (non-malignant tissues vs. tumours) were identified using the “Comparative Marker Selection” module implemented in GenePattern [9, 10]. Differential expression was assessed through a signal-to-noise ratio test. The nominal p value was estimated using a permutation test (100,000 permutations), and was corrected using the procedure of Benjamini and Hochberg. The expression fold change was calculated by dividing the mean expression value of tumours by the mean expression value in non-malignant tissues.
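The steps above can be sketched for a single piRNA as follows. This is an illustrative Python approximation, not the GenePattern module itself: the signal-to-noise statistic is assumed to take the common form (mean of tumours minus mean of non-malignant) divided by the sum of the two standard deviations, the permutation p value uses an add-one estimator, and the expression values, function names and reduced permutation count are hypothetical.

```python
import math
import random
import statistics


def signal_to_noise(tumour, normal):
    """Assumed form of the statistic: (mean_T - mean_N) / (sd_T + sd_N)."""
    diff = statistics.mean(tumour) - statistics.mean(normal)
    noise = statistics.stdev(tumour) + statistics.stdev(normal)
    if noise == 0:
        # Degenerate case: no within-group variation in either class.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / noise


def permutation_p(tumour, normal, n_perm=2000, seed=0):
    """Two-sided nominal p value obtained by shuffling class labels."""
    rng = random.Random(seed)
    observed = abs(signal_to_noise(tumour, normal))
    pooled = list(tumour) + list(normal)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_t = pooled[:len(tumour)]
        perm_n = pooled[len(tumour):]
        if abs(signal_to_noise(perm_t, perm_n)) >= observed:
            hits += 1
    # Add-one estimator so that the p value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fold_change(tumour, normal):
    """Mean tumour expression divided by mean non-malignant expression."""
    return statistics.mean(tumour) / statistics.mean(normal)


# Hypothetical expression values for one piRNA.
tumour = [8.0, 9.0, 10.0, 11.0, 12.0]
normal = [1.0, 2.0, 3.0]
snr = signal_to_noise(tumour, normal)
p_nominal = permutation_p(tumour, normal)
fc = fold_change(tumour, normal)
print(snr, p_nominal, fc)
```

In a full analysis, the nominal p values of all tested piRNAs would be collected into one list and passed to `benjamini_hochberg` together, since the correction depends on the total number of tests.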
Clinical information was obtained from the TCGA data portal. Overall survival (OS) data with at least 1-day follow-up were available for 282 patients. Sixteen patients died of causes other than GA, and were removed from analysis. Log-rank survival analysis was performed on piRNAs expressed in at least two thirds of samples (n = 59 piRNAs) in MATLAB (The MathWorks, Natick, MA, USA); high- and low-expression tertiles were compared. RFS data were available for 240 GA patients (information for the nine additional cancer types assessed for RFS can be found in Table S2). Log-rank survival analysis was performed as for OS. Cox proportional hazard models were evaluated in R (‘survival’ package; R version 3.1.0). Expression values of the piRNAs in the model with the best performance (lowest p value) were transformed into a risk score by multiplying the expression value of each piRNA by its respective Cox proportional hazard coefficient, and then summing the products. Risk scores were ranked, and high- and low-risk tertiles were compared by Kaplan–Meier analysis. In all cases, a p value below 0.05 was considered significant.
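The risk score construction described above can be illustrated with a short Python sketch. The piRNA labels, coefficients and expression values below are invented placeholders, not results from this study, and the helper names are hypothetical; the Cox model fitting and the Kaplan–Meier comparison are not reproduced here.

```python
def risk_score(expression, coefficients):
    """Sum over piRNAs of (Cox coefficient x expression value)."""
    return sum(coefficients[name] * expression[name] for name in coefficients)


def tertile_groups(scores):
    """Label the lowest-ranked third 'low', the highest-ranked third 'high',
    and the remainder 'mid'."""
    n = len(scores)
    cut = n // 3
    order = sorted(range(n), key=lambda i: scores[i])
    labels = ["mid"] * n
    for i in order[:cut]:
        labels[i] = "low"
    for i in order[n - cut:]:
        labels[i] = "high"
    return labels


# Placeholder coefficients for a three-piRNA model (NOT values from the study).
coefficients = {"piRNA_A": 1.0, "piRNA_B": -0.5, "piRNA_C": 0.25}

# Placeholder rank-normalized expression values for six patients.
patients = [
    {"piRNA_A": 0.2, "piRNA_B": 0.8, "piRNA_C": 0.4},
    {"piRNA_A": 0.9, "piRNA_B": 0.1, "piRNA_C": 0.8},
    {"piRNA_A": 0.5, "piRNA_B": 0.5, "piRNA_C": 0.5},
    {"piRNA_A": 0.1, "piRNA_B": 0.9, "piRNA_C": 0.2},
    {"piRNA_A": 0.7, "piRNA_B": 0.3, "piRNA_C": 0.6},
    {"piRNA_A": 0.4, "piRNA_B": 0.6, "piRNA_C": 0.3},
]

scores = [risk_score(p, coefficients) for p in patients]
groups = tertile_groups(scores)
print(groups)  # ['low', 'high', 'mid', 'low', 'high', 'mid']
```

Only the 'low' and 'high' groups would then enter the Kaplan–Meier comparison, mirroring the tertile contrast used in the text.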
Remarkably, half (n = 156) of the expressed piRNAs were significantly differentially expressed in GA as compared with non-malignant stomach tissue. Of these, 45 displayed GA-specific expression, and 18 were exclusively expressed in non-malignant stomach tissue. Most of the remaining 93 deregulated piRNAs were overexpressed, with only seven underexpressed. We further investigated these differentially expressed piRNAs regarding their association with OS or RFS of GA patients.
Recent studies have expanded the function of piRNAs from germline cells to somatic tissues and cancer, including GA [3, 4]. Although efforts have been made to study deregulation of piRNAs in GA, an unbiased analysis of global piRNA expression in gastric tissue was warranted. In this study, we took advantage of the massive sequencing data generated by TCGA by applying a custom analysis pipeline to deduce the piRNA expression patterns in one of the largest cohorts of GA to date.
We detected expression of 312 piRNAs, and remarkably, found that half of these were significantly deregulated in GA. Most of these piRNAs were overexpressed in GA compared with non-malignant stomach tissue, suggesting their importance in GA. Since the function of most piRNAs has not yet been characterized in humans, it is difficult to speculate how their deregulation is mechanistically influencing GA. However, we observed that 70.9 % of these piRNAs were located within protein-coding sequences. Localization of piRNAs within protein-coding sequences has been associated with cis- and trans-regulatory effects on protein-coding transcripts in diverse species [13, 14, 15].
We have demonstrated that piRNAs, like other non-coding RNAs [16, 17, 18, 19], are associated with GA patient outcome. FR222326 was significantly associated with OS, and perhaps more impressively, a three-piRNA signature (FR290353, FR064000, FR387750/FR157678) effectively stratified GA patients into groups at low and high risk of recurrence. When tested in other cancer types, the RFS signature performed well in colon cancer, suggesting conserved importance in digestive tract malignancies. We did not detect mutations in the RFS-associated piRNA genes; however, we show that DNA copy number is likely one of the genetic mechanisms of deregulation for FR381169 and for the RFS-signature piRNAs FR290353 and FR064000. (The Illumina HumanMethylation450 BeadChip platform is uninformative for these genes, as they were not covered by any probes.) Although expression of FR064000 did not provide validation in the independent cohort, expression of the remaining RFS-associated piRNAs was able to significantly predict RFS in the TCGA cohort. Although the clinical utility of piRNAs has not yet been defined, it is highly feasible owing to their small size. Other small RNAs, such as microRNAs, are stable in biofluids, circulating tumour cells, and formalin-fixed paraffin-embedded materials. Considering there are 10–25 times more piRNA species (20,000–50,000) than microRNAs (approximately 2,000), their deregulation is likely at least as relevant. Therefore, piRNAs hold great promise as potential biomarkers.
In summary, we have identified transcribed piRNA loci in non-malignant and malignant stomach tissues, and have characterized malignancy-associated expression patterns of GA. In doing so, we have generated a piRNA transcription atlas of the gastric cancer genome. Furthermore, we use this study as a proof of principle to demonstrate the potential clinical utility of piRNAs in GA patient stratification. We have made the data derived from our analysis publicly available to encourage further investigations of piRNAs in GA.
K.S.S.E. is supported by a Charles Best Canada Graduate Scholarship from the Canadian Institutes of Health Research. Funding support for this work was through research grants to W.L.L. from the Canadian Institutes of Health Research, the Canadian Cancer Society, the Terry Fox Foundation (Canada) and the National Institutes of Health (USA).
Conflict of interest
The authors declare that they have no conflict of interest.
6. The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of gastric adenocarcinoma. Nature. 2014;513(7517):202–9. doi:10.1038/nature13480.