Background

In developed countries, prostate cancer (PCa) is the most commonly diagnosed non-skin cancer in men and a leading cause of cancer death [1, 2]. Data compiled in 2010 for Canadian men, for example, showed a 14.1% lifetime risk for developing PCa, and a 3.5% probability of dying from PCa [3]. As such, PCa is second only to lung and bronchus for cancer-related mortality in men.

In PCa patients who have had their prostate removed, biochemical recurrence (BCR) of blood prostate-specific antigen (PSA) occurs in ~ 20–40% of cases within 10–15 years [4,5,6]. BCR is commonly used as a surrogate for metastatic PCa, since it is a necessary antecedent [7]. However, BCR can occur without radiological evidence of metastasis, and the reported PCa-specific mortality in patients with BCR is only 19–45% by 10–15 years [4, 6, 8]. Prostate tissue is androgen-sensitive, and treatments for metastatic PCa typically include androgen-deprivation. While initially successful, within 2 years ~ 70% of metastatic PCa becomes resistant to androgen deprivation (castration-resistant PCa [CRPC]), leading, almost invariably, to PCa-specific mortality (reviewed in [9]). Given the heterogeneous outcome in patients diagnosed with localized PCa, risk stratification of newly diagnosed patients with non-metastatic disease at presentation is critical to inform clinical management, and treatment options include observation and one or more of radical prostatectomy, radiotherapy, and androgen deprivation therapy [10, 11].

Twenty years ago, D’Amico and colleagues [12] proposed a classification scheme assessing risk for BCR following radical prostatectomy based on PSA, Gleason score and tumour category. Low risk was defined as PSA < 10 ng/ml, Gleason score < 6, and tumour T1-T2a; intermediate risk as PSA 10–20 ng/ml, and/or Gleason score 7, and/or T2b; and high risk as PSA > 20 ng/ml and/or Gleason score 8–10 and/or > T2c. Tumour sub-staging has since been abolished by the AJCC Eighth Edition and by the International Society for Urologic Pathology (https://www.ncbi.nlm.nih.gov/pubmed/27251951). However, tumours can still be stratified within groups; for example, patients with Gleason score 7 tumours are frequently sub-classified into 3 + 4 = 7 (3, primary, 4; secondary, patterns), or 4 + 3 = 7, since 3 + 4 = 7 has a 3-fold lower risk of lethal outcome than 4 + 3 = 7 [13]. Despite these refinements, significant heterogeneity is apparent in the observed incidence of BCR and the more clinically relevant PCa-specific mortality. Heterogeneity is particularly evident for intermediate risk patients, who comprise the largest group [14, 15]. Enhanced discrimination amongst these patients would facilitate more tailored management. As such, new biomarkers are required to accurately identify patients who are at a higher risk for aggressive disease and therefore merit more aggressive treatment, while at the same time allowing for patients with more indolent disease to be observed. Of particular interest are biomolecule markers isolated from tumours, blood, or urine, including proteins, coding and non-coding RNA, and genetic and epigenetic modifications. Unbiased co-discovery of molecules that act in shared cellular pathways might also suggest particularly vulnerable cell signaling pathways to target for therapeutic intervention.

Next generation sequencing with public sharing of data has yielded an enormous amount of genomic and transcriptomic information that is now available to PCa researchers worldwide [16, 17]. An unprecedented and still emerging picture of genome alterations, epigenetic landscape, and gene and protein expression in tumours and normal tissue has enabled the development of a number of PCa prognostic systems (reviewed in [18,19,20]). Interestingly, there is not much overlap between most previously identified biomarkers [20]. This may reflect, in part, different strategies for biomarker discovery. For example, some studies have queried restricted groups of genes (e.g., a subset of cell cycle genes [21], annotated PCa-associated genes [22], miRNA-regulated transcripts [23]), or different cell types (e.g. stromal cells [24]).

In this study, we exploit recently developed web-based tools and publicly available PCa transcriptome data to perform an unbiased query of multiple datasets, focusing on Gleason score 7 PCa. We identify 12 transcripts that predicted BCR (1 downregulated, 11 upregulated in tumours). Ten of these transcripts remained prognostic for poor outcome in patients with Gleason score 3 + 4 = 7 PCa. Importantly, this suggests a possible strategy to identify higher risk patients within a group that is, overall, considered at a low to intermediate risk for poor outcome. Multivariable logistic regression analysis within the 3 + 4 = 7 group showed combining either UBE2C or CCNB1, which are elevated in PCa, with SRD5A2, which is decreased in PCa, increased the prognostic power over and above any of the 3 transcripts alone. Ten of the 11 Gleason 7 (and all of the Gleason 3 + 4 = 7 sub-group) upregulated biomarkers are mitosis and cell cycle related genes that are also part of an E2F1 transcription-factor regulated cell cycle network that predicts lethal metastatic CRPC [25]. Therefore, in addition to predicting BCR, they are prognostic for PCa-specific mortality. Future studies comparing the performance of these transcripts with other biomarkers in new datasets will help determine their robustness in predicting, and potentially treating, non-indolent PCa.

Methods

A set of test transcripts was identified in an unbiased, stepwise fashion using 3 free, publicly available web-based data visualization and analysis tools that enable rapid genome-wide screening (Fig. 1a). In step 1, Cancer RNA-Seq Nexus (CRN) TCGA-PRAD data (http://syslab4.nchu.edu.tw/) [26] was used to identify genes that had at least one isoform with altered transcript abundance in tumours, compared to normal controls (p < 1.0E-04). DAVID 6.8 Gene Name Batch Viewer (https://david.ncifcrf.gov/list.jsp) [27, 28] was used to remove duplicates. In step 2, genes identified in step 1 were queried for patient BCR status, tumour transcript abundance (z-score threshold of 2.0), and tumour Gleason score using cBioPortal, (http://www.cbioportal.org/) [29, 30] MSKCC 2010 data [31]. MSKCC data was used since it is the only cBioPortal prostate cancer dataset in which tumour transcript abundance is compared to normal tissue (other datasets compare patient tumour sample transcript abundance to all tumours that are diploid). In this cohort, 131 patients had primary tumour transcript data available, with 27 of these patients experiencing BCR. Of the 27, 14 had Gleason score 7 tumours and 4 had PSA 10–20 ng/ml. Taking into account the limited number of patients who experienced BCR, we chose to screen transcript abundance in patients with Gleason score 7 tumours. Batches of oncoprints for 50 genes were examined manually to identify transcripts that were increased or decreased in Gleason score 7 patients with BCR, compared to Gleason score 7 patients who remained free of BCR (Fig. 1b shows MELK as an example). In step 3, transcripts identified in Step 2 were then examined using camcAPP (http://bioinformatics.cruk.cam.ac.uk/apps/camcAPP/) [32]. Kaplan-Meier plots examining the relationship between transcript abundance and time to BCR were generated for each transcript using Cambridge, Stockholm, [33] and MSKCC [31] data (Fig. 1c shows MELK as an example). Genes for which the time to BCR could be separated by recursive partitioning [34] into distinct transcript abundance groups for all three datasets were designated the test set.

Fig. 1
figure 1

Rapid generation of a transcript test set using 3 web-based tools: CRN, cBioPortal, and camcAPP. a) Outline of steps used to generate the transcript test set. b) cBioPortal oncoprint (step 2) example (MELK) indicates 36% of patients with Gleason score 7 tumours who experienced biochemical recurrence (BCR) had increased MELK transcript. In contrast, only 7% of those who remained BCR-free had increased transcript. Patients (individual bars) are aligned within the three rows. nc- no difference in tumour transcript abundance compared with normal tissue. c) camcAPP Kaplan-Meier plots (step 3) example (MELK) shows higher MELK expression is associated with shorter time to BCR in all three datasets. Grouping of samples determined by recursive partitioning (Hothorn et al., 2006). b, c) MSKCC data from Taylor et al., 2010. c) Cambridge and Stockholm data from Ross-Adams et al., 2015

Expression (transcript per million; TPM) of test genes was examined in TCGA-PRAD normal tissues (n = 52), and tumours for which transcript abundance, BCR status, and reviewed Gleason score were available (n = 285). Details of sample acquisition and processing following radical prostatectomy at authorized tissue source sites are provided by TCGA-PRAD [35]. Reviewed Gleason score (assigned by TCGA genitourinary pathologists on prostatectomy specimens) was used to avoid possible collection site variability in scoring. Samples with zero or near zero (TPM < 1.6E-07) transcript abundance were removed from the analysis; these included 1 normal for DLGAP5 and CDK1, 3 normals for E2F2 and MELK, and 5 normals and 6 tumours for CDKN3. Univariate logistic regression (SAS 9.4) was performed to determine odds ratios (ORs), and ROC-AUC was calculated for each of the test transcripts, as well as for Gleason score and pre-operative PSA using easyROC http://www.biosoft.hacettepe.edu.tr/easyROC/ [36]. Multivariable logistic regression was performed on transcripts with increased abundance and ORs having p values < 0.05, and SRD5A2, which had decreased abundance, in Gleason score 7 (n = 158), Gleason score 4 + 3 = 7 (n = 70) and Gleason score 3 + 4 = 7 (n = 88) tumours. The multivariable model was restricted to include only 2 independent variables due to the limited number (n=13) of BCR events in patients with Gleason score 7 tumours. Pre-operative PSA concentration was only available for a subset of patients (88/156) that included only 4 patients with BCR, so was not included in the multivariable analysis. Future studies that include more samples with longer term follow-up (and hence, more BCR) will be helpful to test the independent contributions of PSA in patients with Gleason score 7 tumours.

Transcripts identified in the current study were examined for overlap with previously reported prognostic transcript signatures [21,22,23, 25, 33, 37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56]. Gene names were standardized using GeneCards http://www.genecards.org/ v4.5.0 Build 38, except for TSBP, for which 2 aliases were listed in GeneCards (CTR9, c6orf10), and Gene Ontology (GO terms) were determined using DAVID 6.8 https://david.ncifcrf.gov/home.jsp [27, 28]. Proportional Venn diagrams were generated using eulerAPE http://www.eulerdiagrams.org/eulerAPE [57].

Results

A comparison of prostate cancer primary tumours (n = 497) vs. normal prostatic tissue samples (n = 52) revealed altered transcript abundance of at least one isoform of 8187 genes (Additional file 1: Table S1). Manual examination of these 8187 genes in MSKCC Gleason score 7 patients using cBioPortal oncoprints identified 1816 genes with differences in expression between patients without BCR (n = 60) and those with BCR (n = 14) (Additional file 1: Table S2). For each of the 1816 genes identified by the above algorithm, the Cambridge (n = 111), Stockholm (n = 92), and MSKCC (n = 140) datasets were interrogated for any correlation between transcript abundance and BCR-free survival across all Gleason scores. The level of expression of twenty-two individual genes predicted BCR-free survival in all three of the patient cohorts (Table 1). Two of these genes were eliminated from the test set due to discordance in the expression differential between data sets. In particular, for ANKMY1, higher BCR-free survival was observed in association with lower expression in the Cambridge and MSKCC datasets, but with higher expression in the Stockholm dataset. In the case of ALDH1A2, there were discordant results between the MSKCC cohort of Gleason score 7 patients queried using cBioPortal, and the group of patients in the Cambridge, Stockholm, and MSKCC datasets examined using camcAPP. The final test set of 20 transcripts included 13 with higher expression in tumour vs. normal tissue that were positively associated with BCR, and 7 with decreased expression in tumours that were negatively associated with BCR (Table 1).

Table 1 Test set of 20 transcripts that predicted time until BCR

When examined across Gleason scores 6 through 10, 14 of 20 test set genes yielded an OR, and 17 an ROC-AUC, that was prognostic (p < .05) for BCR in the TCGA-PRAD validation dataset (n = 285) (Additional file 1: Table S3). For half of the genes, the AUC was greater than for Gleason score or pre-operative PSA, examined in the same group of patients (Additional file 2). Within the group of Gleason score 7 patients in the TCGA-PRAD dataset (n = 158), 12 transcripts remained prognostic (Table 2, Fig. 2), with 1 negatively associated with BCR (SRD5A2) and 11 positively associated with BCR (BUB1, TPX2, NCAPG, UBE2C, MELK, CCNA2, CCNB1,CDK1, E2F2, DLGAP5, TMEM206,). Ten of the 11 transcripts positively associated with BCR encode proteins that are functionally annotated as related to mitotic and cell cycle (GO:0000082, GO:0051726), cell division (GO:0051301), or cell proliferation (GO:0008283). The proteins they encode also participate in an E2F1-dependent cell cycle network associated with CRPC, and 4 of the 11 (BUB1, NCAPG, CDK1, MELK) are implicated in miRNA (MIR145-3P)-dependent CRPC (Fig. 3). The 11th transcript encodes a membrane protein (TMEM206) that has recently been identified as important to control cell proliferation in colorectal cancer cells [58].

Table 2 Univariate logistic regression and ROC-AUC analyses for predicting BCR in patients with Gleason score 7 tumours
Fig. 2
figure 2

Receiver Operating Characteristic (ROC) curves for transcripts predicting biochemical recurrence (BCR) in patients with Gleason score 7 tumours. (n = 158). BCR predicted by increased (BUB1, NCAPG, TPX2, UBE2C, CCNA2, CCNB1, CDK1, MELK, DLGAP5, E2F2, TMEM206), or decreased (SRD5A2) abundance of transcripts. See Table 2 for details. AUC-Area under the curve. Data from TCGA-PRAD (TCGA, 2015)

Fig. 3
figure 3

The positive predictors of BCR in Gleason score 7 tumours also predict castration-resistant prostate cancer (CRPC). Ten of 11 positive predictors identified in this study encode cell cycle and mitosis proteins that are part of a HES6-associated E2F1-dependent signature of castration-resistant prostate cancer (CRPC). Four of the 10 are targeted by the miRNA miR-145-3p in CRPC. Total of numbers within or adjacent to each circle are the number of prognostic transcripts in each dataset

Ten of 12 transcripts that were prognostic in Gleason score 7 tumours remained prognostic in patients with Gleason 3 + 4 = 7 tumours (Table 3), and multivariable logistic regression analysis showed that combining CCNB1 or UBE2C with SRD5A2 slightly improved the predictive power of the model (Additional file 1: Table S4). In low risk (Gleason score 6) tumours, 9 transcripts remained prognostic, although the limited number of cases of BCR (n=3) in this group may limit its predictive power (Additional file 1: Table S5). Interestingly, in Gleason 4 + 3 = 7, only TMEM206 was prognostic (Additional file 1: Table S6). The reason for the discrepancy between results for Gleason 3 + 4 = 7 and 4 + 3 = 7 is unclear. One possibility is that, with the exception of MELK, transcript abundance of positive prognosticators was higher in Gleason 3 + 4 = 7 than in Gleason 4 + 3 = 7 in patients with BCR (Additional file 1: Table S7). Future studies examining transcript abundance within regions of different pattern within the tumour may be helpful to explain these findings.

Table 3 Univariate logistic regression and ROC-AUC analyses for predicting BCR in patients with Gleason score 3 + 4 = 7 tumours

Discussion

PCa is a common disease with a heterogeneous clinical outcome that is difficult to predict using available risk stratification tools. Treatments such as prostatectomy, radiotherapy and androgen deprivation extend lifespan and improve the quality of life in patients with aggressive disease, but also cause unnecessary morbidity and loss of quality of life in those with indolent disease. Thus, a key goal in disease management is to develop biomarkers that accurately predict outcome. Variable outcome is especially evident in “intermediate risk” patients, making clinical decision-making particularly difficult within this group. To identify biomarkers that may be helpful for predicting outcome in this group, we took the approach of broadly querying publicly available transcriptome data using multiple distinct visualization and analysis data portals. We identify 12 genes, the expression of which predicted poor outcome (BCR) in patients with Gleason score 7 tumours. Eleven of the twelve genes have been identified as prognostic for PCa outcome in at least one other study (Additional file 1: Table S8), portending robustness. Importantly, 10/12 of the genes remained prognostic in Gleason score 3 + 4 = 7 tumours. Since Gleason 3 + 4 = 7 tumours are often projected to have a more favourable prognosis [59], the identification of transcripts that may stratify this group into higher and lower risk of poor outcome may be particularly helpful for clinical decision-making.

The top performer we identified in Gleason score 7 tumours, including Gleason 3 + 4 = 7, was BUB1. BUB1 encodes a mitotic checkpoint Ser/Thr kinase that has recently been implicated as a key regulator of prostate cancer progression [60]. TPX2 was also a top prognosticator and is a microtubule-associated protein that stimulates Ran-GTP-dependent microtubule nucleation and regulates Aurora A kinase during mitosis and cell cycle progression [61, 62]. Elevated expression of TPX2 is a common finding in human cancers, including prostate, and overexpression in vitro increases invasion of multiple cancer cell lines [63,64,65]. Like BUB1 and TPX2, the other transcripts positively associated with BCR function in the cell cycle and/or mitosis. Elevated abundance of cell cycle and mitosis transcripts has previously been shown to predict PCa outcome, and is the basis of the Prolaris commercial test for predicting PCa aggressiveness [21].

The cell cycle and mitosis genes identified here are associated with HES6-dependent E2F1 transcription factor-mediated CRPC [25]. HES6 is a transcription cofactor that physically interacts with E2F1 as well as the androgen receptor [25]. During the G1/S transition, cyclin-dependent kinases and cyclins phosphorylate the tumour suppressor retinoblastoma, resulting in a weakening of its interaction with E2F proteins [66]. E2F2 (as well as E2F1 and E2F3a) are then free to activate genes that promote S phase entry and cell cycle progression. In CRPC, HES6 is able to maintain androgen receptor activity in the absence of testosterone. In the current study E2F2, but not E2F1, was found to be prognostic for BCR, indicating E2F2 may be a more sensitive or earlier indicator of poor clinical outcome than the related family member, E2F1.

E2F2 transcription factor has been shown previously to be negatively regulated by the Let-7 family miRNA, MIRLET7A, leading to suppression of growth of PCa cells [67]. miRNAs are small RNA molecules that are bound by Argonaute proteins to form an RNA-induced silencing complex that targets specific mRNA(s), typically leading to translational repression [68]. The abundance of many miRNA is altered in PCa, and is prognostic for clinical outcome [69]. Some miRNA, such as MIRLET7A, act as tumour suppressors by targeting genes (e.g. E2F2), that enhance tumour growth. In contrast, others act as oncogenes by targeting tumour suppressors or cell death and differentiation pathway mRNAs [70]. MIR145-3P, a tumour suppressive miRNA, is decreased modestly in hormone-sensitive PCa, and decreased severely in CRPC, and low levels are associated with a shorter time to BCR [71]. Examining MIR145-3P in detail, Goto et al. identified four key MIR145-3P targets (BUB1, NCAPG, CDK1, MELK) [23]. All four are upregulated in hormone-sensitive PCa, and elevated even more in CRPC, and all are part of the group of cell cycle and mitosis genes identified in the current study, both in Gleason score 7, and in the Gleason score 3 + 4 = 7 subgroup. Given that thousands of proteins participate in the cell division cycle [72], the specific identification of BUB1, NCAPG, CDK1, and MELK as predictors of BCR, as well as key targets of MIR145-3P that are associated with CRPC, suggests a particularly important role for these cell cycle/mitosis genes in aggressive PCa.

Like other cancers, PCa frequently exhibits defects in cell cycle regulation and cell cycle progression and cell cycle proteins have been explored as therapeutic targets [73]. For example, prostate cancer Phase IB/II trials are ongoing for the CDK4/6 inhibitors ribociclib (ClinicalTrials.gov identifier: NCT02555189, NCT02494921) and palbociclib (NCT02905318, NCT02059213), and the CHEK1 inhibitor LY2606368 (NCT02203513). The cell cycle and mitosis proteins we identified in the present study do not include CDK4/6 or CHEK1. However, almost half are targets of MIRLET7A or MIR145-3P tumour suppressor miRNA. Pre-clinical and clinical trials are currently underway to determine the feasibility of using synthetic miRNA mimics (to compensate for decreased abundance of endogenous tumour suppressive miRNA) or antimiRNA (to block endogenous oncogenic miRNA) as therapeutic agents [70]. Diseases currently being targeted in clinical trials include hepatitis (NCT02508090, NCT02452814), lymphoma (NCT0250552) and mesothelioma (NCT02369198). The experimental evidence supports an important role in PCa progression of miRNA regulation of E2F2, BUB1, NCAPG, CDK1, and MELK. As such, in future studies it would be interesting to determine if the cell cycle and mitosis transcripts identified in the current study might prove to be effective targets for miRNA-based therapeutics in PCa patients.

SRD5A2 was decreased in tumours and negatively associated with BCR. SRD5A2 encodes a steroid 5-α-reductase 2 that converts testosterone to the more potent androgen receptor agonist, dihydrotestosterone. Multivariable logistic regression analysis showed that including CCNB1 or UBE2C and SRD5A2 as variables improved the predictive power in Gleason 3 + 4 = 7. CCNB1 encodes a cyclin B1 required during mitosis, and UBE2C encodes an E2 ubiquitin-conjugating enzyme important in cyclin destruction. These findings suggest it may be beneficial to pair predictors from distinct cellular pathways (cell cycle/mitosis plus androgen synthesis) to improve prognostication in Gleason score 3 + 4 = 7 PCa. SRD5A2 was previously identified in a candidate screen of 732 PCa-related genes [22] and forms part of the commercial Oncotype DX Genomic Prostate Score assay that predicts tumour aggressiveness in low and intermediate risk PCa. SRD5A2 was also identified by Rubicz et al. [50] in an unbiased screen for transcripts that improved prediction of PCa recurrence following radical prostatectomy when combined with Gleason score. Taken together, these findings suggest SRD5A2 may be a robust prognosticator of disease outcome in PCa. However, lower SRD5A2 transcript abundance has previously been reported to be associated with poor [50] or favourable [22] outcome. Moreover, altered SRD5A2 activity resulting from genetic polymorphisms or treatment with drugs such as finasteride has inconsistent effects on PCa outcome [74,75,76]. Accordingly, additional studies are warranted to clarify the role of SRD5A2 abundance in PCa outcome.

The average patient follow-up time in the TCGA validation dataset was short (< 2 years) [35], consistent with the transcripts identified in the current study as prognostic for early, aggressive disease progression. The small number of patients with BCR in this short follow-up time (13/158 Gleason score 7 cases), together with incomplete data for pre-operative PSA, precluded our testing for concurrent, independent roles for transcript abundance and PSA in the multivariable logistic regression analysis. In the future, prospective studies that include a larger number of patients with complete clinical information and long-term follow-up (15 years+) would be helpful to determine if the improvement in risk stratification we identified here is durable and reproducible.

Conclusions

In this study we identify SRD5A2, and 11 mitosis and cell cycle transcripts that predict PCa disease progression in patients with Gleason score 7 disease. The prognostic power of 10 of these transcripts extended to patients with Gleason 3 + 4 = 7 disease indicating they are excellent candidates for stratifying risk in this group of patients with disease that is often deemed likely to have a favourable outcome. Future studies, especially those including long-term clinical follow-up, will be helpful to confirm the robustness of their predictive power, as well as the therapeutic potential for pharmacologic or genetic intervention.