1 Introduction

Prostate cancer (PC) is one of the most common malignant tumors, and its incidence has increased in recent years, especially for late-stage disease [1]. Risk factors for PC mainly include age, genetic and family history, behavior and lifestyle, and environmental factors [2]. PC arises from high-grade intra-epithelial neoplasia and develops into localized PC [3,4,5]. In clinical practice, localized PC can be effectively treated, and even cured, through radical surgery or radiotherapy. Active surveillance may be chosen for some low-risk and elderly patients. However, some patients do not benefit from radical surgery or radiotherapy and instead develop biochemical recurrence (BCR) [6, 7]. Once BCR occurs, patients are at risk of metastasis [6, 8]. Androgen deprivation therapy (ADT) is a preferred treatment for patients with metastatic PC. Unfortunately, some tumors progress to castration-resistant PC after ADT [6, 9, 10] (Fig. 1). Therefore, if the prognosis of PC patients could be accurately predicted, early intervention and targeted therapy could prevent the progression of PC and benefit more patients.

Fig. 1 The development of prostate cancer

Currently, clinicians rely mainly on clinical stage and clinicopathological features to guide the treatment of PC patients. However, because of the molecular heterogeneity of PC, some patients are resistant to uniform treatment and progress to recurrence [11,12,13]. It is therefore important to identify prognostic biomarkers that predict outcomes and guide treatment for PC patients. To improve prognosis, gene signatures are used to predict the survival outcomes of PC patients through risk stratification, classifying patients as high or low risk. For instance, a 7-gene signature has been developed to distinguish indolent PC from aggressive PC [14]. In addition, a 20-gene signature has been constructed to identify patients at risk of metastatic progression after prostatectomy [15], and a 10-gene signature has been identified to predict the risk of BCR in PC patients [16]. Gene signatures have also been applied to the personalized diagnosis and treatment of PC patients [17, 18].

Consequently, the aim of the present study was to conduct a comprehensive review and analysis of previously reported gene signatures that predict the survival outcomes of PC patients. In this review, 71 different gene signatures were summarized and 3 strategies for gene signature construction were identified.

2 Methods

2.1 Data collection and selection

To evaluate previously reported gene signatures for PC, a total of 282 articles were retrieved by searching the PubMed database with the keywords “prognostic gene signature AND prostate cancer”. The database searches were performed on September 20, 2020, and February 6, 2021. Of these, 279 articles were accessible for screening (3 articles were not available). Gene signatures that were based on mRNA expression profiling and correlated with patients' survival outcomes were included. For consistency, the exclusion criteria were as follows: (i) review or meta-analysis; (ii) not prostate cancer; (iii) gene signature not mentioned; (iv) signature not derived from mRNA expression profiling; (v) no prognostic analysis. Relevant articles were selected according to the PICO framework (P: prostate cancer patients; I: prognostic gene signature; O: survival outcomes). As a result, 71 different gene signatures were summarized from 68 studies.

2.2 Data interpretation and statistical analysis

To compare the prognostic abilities of different gene signatures, data from the training sets were collected and summarized. During data extraction, results from the TCGA database and data on overall survival (OS) were given priority. To evaluate these prognostic signature models, several indexes were summarized, including gene name, p value, AUC of the ROC curve, and hazard ratio (HR) with 95% confidence interval (CI) estimated by Cox regression analysis (Tables 1, 2, 3, 4). To visualize the risk stratification ability of the gene signatures, we extracted the HR and 95% CI from these articles and drew a forest plot (Fig. 3) using R 4.3.1 (of the 71 gene signatures included in this study, only 31 reported an HR and 95% CI). In addition, robust genes were identified according to their usage frequency across the 71 gene signatures. Furthermore, pathway enrichment analysis of the robust genes was conducted at www.metascape.org. The steps were as follows: (i) input a gene list; (ii) select species: H. sapiens; (iii) perform Express Analysis.
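As a hedged illustration of how a reported HR and 95% CI can be prepared for a forest plot, the sketch below converts them to the log scale and recovers a standard error. It assumes that each CI was computed as exp(log(HR) ± 1.96 × SE), which is the usual form of Cox regression output; the function name `log_hr_and_se` and all numeric values are hypothetical and are not taken from any study in this review.

```python
import math


def log_hr_and_se(hr, ci_low, ci_high, z=1.96):
    """Convert a hazard ratio and its 95% CI to log(HR) and its standard error.

    Assumption: the CI is symmetric on the log scale, i.e. it was computed
    as exp(log(HR) +/- z * SE).
    """
    log_hr = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_hr, se


# Hypothetical signatures: (name, HR, lower 95% CI, upper 95% CI).
signatures = [
    ("signature_A", 2.0, 1.0, 4.0),
    ("signature_B", 1.5, 1.2, 1.875),
]

rows = []
for name, hr, low, high in signatures:
    log_hr, se = log_hr_and_se(hr, low, high)
    # The interval excludes 1 when it lies entirely on one side of HR = 1.
    excludes_one = low > 1.0 or high < 1.0
    rows.append((name, log_hr, se, excludes_one))
    print(f"{name}: log(HR)={log_hr:.3f}, SE={se:.3f}, CI excludes 1: {excludes_one}")
```

Working on the log scale keeps the intervals symmetric around the point estimate, which is why forest plots of HRs are typically drawn with a logarithmic axis.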

Table 1 Gene signature information of strategy I
Table 2 Gene signature information of strategy II
Table 3 Gene signature information of strategy III
Table 4 Gene signature information of other strategies

3 Results

After screening, 71 different gene signatures were summarized from 68 studies published from 2005 to 2021 (Fig. 2). Comprehensive gene signature information related to the survival of PC patients is presented in Tables 1, 2, 3, 4 and Fig. 3. These signatures were associated with metastasis-free survival (MFS), overall survival (OS), disease-free survival (DFS), biochemical recurrence-free survival (BFS) or recurrence-free survival (RFS) of PC patients (Tables 1, 2, 3, 4). The methods of gene signature construction were divided into three main strategies according to the source of the prognostic genes. In Strategy I, gene signatures were constructed from differentially expressed genes (DEGs); Strategy II was based on cellular process-related genes; Strategy III was based on genes related to AR (androgen receptor) or AR-Vs (androgen receptor variants). In addition to classifying the construction strategies, we also identified 14 robust genes from the 71 gene signatures.

Fig. 2 Data collection and interpretation. A Workflow of article screening. B Publication year of the 68 studies from 2005 to 2021

Fig. 3 Estimates of 31 gene signatures via meta-analysis. Forest plot; HR from validation set marked with *; HR from test set marked with **

3.1 Strategy I: gene signatures based on DEGs

In strategy I, authors developed gene signatures on the basis of DEGs derived from different analysis methods. Shao N et al. (2020) obtained DEGs by comparing microarray data of PC samples grouped by Gleason score (GS > 8 or GS < 6), and then identified 6 genes significantly related to biochemical recurrence (BCR) using Lasso and Cox regression models [17]. To distinguish high-risk invasive PC from low-risk indolent PC, Xiao K et al. (2016) screened 8 DEGs between invasive and indolent PC using expression profiling of 87 prostatectomy samples [19]. Notably, ETS (E26) fusion has been identified as a molecular subtype specific to PC [20,21,22]. Therefore, Bismar TA et al. (2014) used singular value decomposition (SVD) analysis to screen 10 genes with significant differences between ERG fusion-negative and fusion-positive samples, which formed a 10-gene signature [23].

The workflow of strategy I can be summarized as follows: (1) Classification of patient groups according to the research purpose. To study genes associated with PC metastasis, for instance, researchers divided PC samples into metastatic and non-metastatic groups [24]. (2) Acquisition of DEGs. Microarray or bioinformatics techniques were used to analyze the gene expression profiles of the different groups, and DEGs were obtained according to the criteria set at the beginning of each study. (3) Establishment of a gene signature. A multivariate Cox regression model was used to screen the DEGs for genes significantly related to the survival of PC patients, and a gene signature was constructed from them. (4) Validation. Survival and ROC analyses were performed on the established gene signature in other cohorts (Fig. 4A).

Fig. 4 The flow chart of strategy I and II. A Strategy I: gene signatures based on DEGs. B Strategy II: gene signatures based on cellular process-related genes

For instance, to predict the survival outcomes of PC patients, Xu N et al. (2018) analyzed 1417 DEGs obtained by comparing the expression profiles of PC and non-PC tissues, and then used univariate and multivariate Cox regression analysis to screen out 4 DEGs (HOXB5, GPC2, PGA5 and AMBN) that were significantly correlated with the OS of PC patients [25]. The multivariate Cox regression coefficient of each of the four genes was then multiplied by the expression level of that gene, and the products were summed to obtain a risk score. To confirm the predictive ability of the gene signature in PC, patients were divided into high-risk and low-risk groups and subjected to Kaplan–Meier survival analysis. Furthermore, ROC curve analysis was applied to assess the discriminative ability of this gene signature [25]. More gene signature-related information for strategy I is presented in Table 1.
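The risk score described above is a weighted sum of expression values, which can be sketched as follows. This is a minimal illustration, not the implementation used in [25]: the coefficient and expression values are invented for the example, and splitting patients at the median risk score is an assumed cutoff, since the text does not state how the high-risk and low-risk groups were defined.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Return the sum of (Cox coefficient * expression level) over the signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


# Hypothetical coefficients for the four genes named in the text; real values
# would come from the fitted multivariate Cox regression model.
coefficients = {"HOXB5": 0.5, "GPC2": 0.25, "PGA5": -0.5, "AMBN": 1.0}

# Hypothetical expression levels for four patients.
patients = {
    "P1": {"HOXB5": 2.0, "GPC2": 4.0, "PGA5": 1.0, "AMBN": 0.0},
    "P2": {"HOXB5": 4.0, "GPC2": 0.0, "PGA5": 2.0, "AMBN": 2.0},
    "P3": {"HOXB5": 0.0, "GPC2": 4.0, "PGA5": 4.0, "AMBN": 1.0},
    "P4": {"HOXB5": 6.0, "GPC2": 4.0, "PGA5": 0.0, "AMBN": 3.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}

# Assumed cutoff: the median risk score of the cohort.
cutoff = median(scores.values())
groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

for pid in patients:
    print(pid, scores[pid], groups[pid])
```

The resulting group labels are what a Kaplan–Meier comparison would take as input, with the risk score itself serving as the continuous predictor for ROC analysis.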

Within the strategy of developing gene signatures from DEGs, some novel methods of identifying DEGs have been applied. Ong CW et al. (2018) used immunohistochemistry (IHC) to stratify intermediate-risk PC and identified 35 DEGs related to high PTEN expression to establish a signature [26]. Pang X et al. (2019) obtained DEGs by comparing the expression profiles of hormone-sensitive PC (HSPC) and metastatic castration-resistant PC (mCRPC), and biological pathway analysis then showed enrichment of extracellular matrix genes. A gene signature composed of six genes was subsequently identified by correlation analysis of the extracellular matrix genes [27]. In brief, strategy I also includes gene signatures established from DEGs in combination with other factors (such as PTEN expression or therapy sensitivity), because DEGs remain their fundamental feature.

3.2 Strategy II: gene signatures based on cellular process-related genes

In this strategy, gene signatures were constructed from genes related to cellular processes associated with the progression of PC, including metabolism, cell cycle progression (CCP), apoptosis and autophagy. Zhang Y et al. (2020) reasoned that the unrestricted proliferation of cancer cells makes the metabolic state of tumor tissues different from that of normal tissues. They therefore developed a metabolism-associated 6-gene signature to guide the diagnosis and DFS prediction of PC patients [18]. The expression of CCP-related genes fluctuates with cell cycle progression, which may represent an aspect of tumor biology [28]. Thus, Cuzick J et al. (2011) employed 31 CCP genes to form a gene signature for predicting the RFS of PC patients [29]. Based on a literature review, Zhang Q et al. (2020) proposed that apoptosis is involved in the recurrence and progression of PC, and they constructed an apoptosis-related gene signature for BCR prediction [30].

The major steps of the Strategy II workflow were as follows: (1) Confirmation that a cellular process (such as metabolism, CCP, apoptosis or autophagy) is associated with PC progression. (2) Screening of the cellular process-related genes for those associated with survival outcomes. (3) Construction of a prognostic risk stratification model. (4) Verification of the prognostic risk model (Fig. 4B).

Autophagy is known to degrade and recycle intracellular components to maintain cellular homeostasis [31]. However, excessive autophagy may increase tumor invasion and lead to PC progression [32]. Thus, Hu D et al. (2020) constructed OS- and DFS-associated prognostic models based on autophagy-related genes (ARGs) [33]. First, differentially expressed genes were identified from 234 ARGs. Then, hub ARGs were screened using Cox regression analysis to construct a prognostic model. Finally, the correlation between clinicopathological features and this prognostic model was analyzed [33]. Glycidamide (GA) is known to be associated with the malignant transformation of tumors [34, 35]; however, little is known about which genes are induced by GA. Titus et al. (2019) demonstrated that GA enhances the migration and growth of PC cells by influencing regulators of the cell cycle and epithelial-to-mesenchymal transition (EMT). Hence, they constructed a 3-gene signature (CDK4, TWIST1 and SNAI2) to predict the survival outcomes of PC patients upon GA exposure [35].

Derivation from cellular process-related genes is the defining characteristic of the gene signatures in this strategy. We therefore suspect that, when a cellular process (such as metabolism, CCP, apoptosis or autophagy) has a greater impact on PC, the predictive ability of the corresponding gene signatures may be stronger than that of gene signatures built by other strategies. More gene signature-related information for Strategy II is presented in Table 2.

3.3 Strategy III: gene signatures based on AR or AR-Vs

Because of the importance of AR (androgen receptor) and AR-Vs (androgen receptor variants) in PC, the establishment of gene signatures related to AR or AR-Vs was considered as Strategy III. The effect of AR and AR-Vs on PC cells [36,37,38,39] is shown in Fig. 5. Androgen receptors play an important role in the development of both normal and cancerous prostate tissue by regulating the expression of proliferation-related genes [40] (Fig. 5). Researchers have therefore tried to identify genes that are related to survival events and regulated by AR as PC biomarkers and therapeutic targets. The major steps of the Strategy III workflow are shown in Fig. 6. Chen X et al. (2019) identified 29 gene modules using the Weighted Gene Co-expression Network Analysis (WGCNA) method, and the biological function of the module significantly regulated by AR was “generation of precursor metabolites and energy” [41]. Eleven genes in this module are involved in this biological function, among which FECH and CROT are regulated by androgen and CROT has androgen receptor binding sites. Finally, a 2-gene signature was established to predict RFS; notably, blocking this AR-related biological process may help to prevent the malignant progression of PC [41].

Fig. 5 The effect of AR and AR-Vs on prostate cancer cells. Catalyzed by CYP17A1, steroid precursors are converted into testosterone. After entering cells, testosterone is converted into dihydrotestosterone (DHT) by 5α-reductase. AR separates from its chaperones and binds to DHT, and phosphorylation subsequently occurs. After binding, AR and DHT enter the nucleus as a dimer and bind to the corresponding site to promote cell proliferation. These processes can be blocked by abiraterone and enzalutamide, respectively. AR-Vs lack the ligand-binding domain (LBD) but retain the N-terminal domain (NTD) and DNA-binding domain (DBD). Some researchers have demonstrated that high expression of AR leads to chromatin relaxation, which is related to ADT resistance. It has also been suggested that ADT resistance may be related to the continuous activation of proto-oncogenes by AR-Vs, but the specific mechanism remains unclear [36,37,38,39]. This figure was drawn with Illustrator

Fig. 6 The flow chart of strategy III

Androgen deprivation therapy (ADT) is a preferred treatment for patients with metastatic PC [42]. However, some patients develop androgen resistance after treatment [43]. The mechanisms of androgen resistance may include AR splice variants, AR overexpression and alterations in AR coregulators [44, 45]. Magani F et al. (2018) therefore focused on AR splice variants and used the WGCNA method to discover gene modules associated with different phenotypes of PC [38]. Most genes in one module were regulated by AR-V7 (an AR splice variant), and the main biological functions of this module were “cell proliferation” and “chromosome segregation”. Further analysis of this module yielded a 7-gene signature (KIF20A, KIF23, TOP2A, CCNB1, CCNB2, BUB1, BUB1B) with predictive value [38].

The construction of AR- or AR-Vs-related gene signatures not only provides effective prognosis prediction models for PC patients but also helps to elucidate the molecular mechanisms underlying the occurrence and progression of PC, and may facilitate the development of prognostic biomarkers and molecular targeted therapy strategies for PC. More gene signature-related information for Strategy III is presented in Table 3.

3.4 Other strategies

Although we have summarized three common strategies, other approaches to establishing gene signatures have also been reported. Li X et al. (2020) performed univariate Cox regression analysis (FDR < 20%) and screened out 80 prognosis-related genes [46]. According to the C index, 74 gene pairs were identified as a gene signature. Patients with at least 37 high-risk gene pairs were assigned to the high-risk group, and those with fewer were assigned to the low-risk group [46]. Shi R et al. (2019) divided genes into co-expression modules, selected the modules significantly correlated with BFS by Cox regression analysis, and then performed LASSO regression analysis to screen genes and obtain a gene signature [47]. Cho H et al. (2021) analyzed circulating tumor cells (CTCs) and formed a gene signature from the representative genes in CTCs [48]. More information on gene signatures derived from other strategies is presented in Table 4.
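The gene-pair rule reported for [46] can be read as a voting scheme, sketched below under stated assumptions. The text does not say how a pair is judged to be high-risk, so the sketch assumes that a pair (a, b) votes high-risk when gene a is expressed more strongly than gene b. The gene names, expression values and the four-pair signature are hypothetical, and the threshold of half the pairs mirrors the 37-of-74 rule in the text.

```python
def count_high_risk_pairs(expression, pairs):
    """Count the pairs (a, b) in which gene a is expressed above gene b.

    Assumption: this within-sample ordering is what marks a pair as high-risk.
    """
    return sum(1 for a, b in pairs if expression[a] > expression[b])


def classify(expression, pairs, min_votes):
    """Assign 'high' when at least min_votes pairs vote high-risk, else 'low'."""
    votes = count_high_risk_pairs(expression, pairs)
    return "high" if votes >= min_votes else "low"


# Hypothetical signature of four gene pairs (placeholder gene names).
pairs = [
    ("GENE_A", "GENE_B"),
    ("GENE_C", "GENE_D"),
    ("GENE_E", "GENE_F"),
    ("GENE_A", "GENE_D"),
]
min_votes = len(pairs) // 2  # half of the pairs, mirroring 37 of 74

# Hypothetical expression profiles for two patients.
patient_x = {"GENE_A": 5.0, "GENE_B": 2.0, "GENE_C": 1.0,
             "GENE_D": 3.0, "GENE_E": 4.0, "GENE_F": 1.0}
patient_y = {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 1.0,
             "GENE_D": 3.0, "GENE_E": 4.0, "GENE_F": 1.0}

print("patient_x:", classify(patient_x, pairs, min_votes))
print("patient_y:", classify(patient_y, pairs, min_votes))
```

A rule of this kind depends only on the relative ordering of genes within one sample, so it does not require expression values to be normalized across cohorts.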

3.5 Identification of robust genes in 71 different gene signatures

Across the 71 gene signatures, certain genes were used with high frequency and were therefore considered robust genes with excellent predictive abilities. After screening, we found that the 71 gene signatures included 1278 genes, of which 381 were used at least twice, 41 at least three times, and 14 at least four times (regarded as robust genes) (Table 5). Furthermore, pathway and process enrichment analysis of the 14 robust genes (ASPN, BIRC5, BUB1, BUB1B, CCNB1, CDK1, CENPF, DPP4, EZH2, MYH11, POSTN, PTTG1, SMC4, ZFP36) showed that these genes were mainly enriched in mitotic prometaphase, cell cycle, and protein localization to chromosome (Fig. 7).
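The frequency-based selection of robust genes amounts to counting how many signatures contain each gene and applying a threshold. The sketch below illustrates this with a toy input; the gene names are taken from the robust-gene list, but their assignment to signatures is invented for the example, and counting each gene at most once per signature is an assumption.

```python
from collections import Counter


def gene_usage(signatures):
    """Count the number of signatures in which each gene appears."""
    counts = Counter()
    for genes in signatures.values():
        # Assumption: a gene is counted once per signature, even if listed twice.
        counts.update(set(genes))
    return counts


def genes_used_at_least(counts, k):
    """Return the sorted genes that appear in at least k signatures."""
    return sorted(gene for gene, n in counts.items() if n >= k)


# Toy signatures; the memberships are hypothetical.
signatures = {
    "sig_1": ["BUB1", "CCNB1", "EZH2"],
    "sig_2": ["BUB1", "CCNB1", "CDK1"],
    "sig_3": ["BUB1", "EZH2", "POSTN"],
    "sig_4": ["BUB1", "CCNB1", "EZH2"],
    "sig_5": ["DPP4", "CCNB1"],
}

counts = gene_usage(signatures)
for k in (2, 3, 4):
    print(f"used in >= {k} signatures:", genes_used_at_least(counts, k))
```

The list returned for k = 4 corresponds to the robust-gene threshold used in this review and is the kind of gene list that would be submitted for enrichment analysis.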

Table 5 The summary of 14 robust genes
Fig. 7 Pathway and process enrichment analysis of 14 robust genes

4 Discussion

In the present study, we conducted a systematic review and integrated analysis of previously reported prognostic gene signatures for PC. Gene signature construction strategies and robust genes significantly associated with prognosis were summarized. First, the reported gene signatures were collected and sorted; second, three strategies for gene signature establishment were summarized, and the prognostic abilities of the gene signatures were listed by strategy; third, robust genes with risk prediction abilities were identified from the reported gene signatures.

In the construction of gene signatures for PC, biochemical recurrence (BCR) and castration-resistant prostate cancer (CRPC) have been introduced as outcomes to be evaluated. The majority of patients with early-stage PC are treated with prostatectomy. However, about one-third of patients develop BCR after surgical treatment, which leads to a poor prognosis [6, 7]. Gene signatures that act as predictors of BCR are therefore needed to identify patients at high risk. In addition to BCR, CRPC is also a challenge in PC treatment. Urbanucci A et al. (2017) demonstrated that overexpressed androgen receptors lead to chromatin relaxation, which is thought to be involved in CRPC through the regulation of bromodomain-containing proteins (BRDs) [39]. Furthermore, Zhang Q et al. (2020) [30], Peng Z et al. (2016) [49], Chen X et al. (2012) [14], and Yuan P et al. (2020) [24] showed that the predictive ability of a gene signature in combination with clinical characteristics (e.g. Gleason score) was stronger than that of the gene signature alone. In addition, Zhao SG et al. (2016) [15], Mu HQ et al. (2020) [50], and Pang X et al. (2019) [27] reported that genes in their signatures are related to drug therapy, indicating that these gene signatures might help to identify potential therapeutic targets for PC. With the development of bioinformatics, a growing number of researchers have applied GO or pathway enrichment analysis and protein network analysis to the study of gene signatures to better meet their research purposes [16, 24, 51].