Introduction

As one of the most common gastrointestinal malignancies, colon cancer is a leading cause of cancer-related mortality worldwide [1, 2]. Among the 36 cancers for which global estimates were made in 2018, colon cancer ranked fourth in both new cases and related deaths, with approximately 1,100,000 estimated new cases [2]. The current standard therapeutic strategy for colon cancer is surgery combined with adjuvant chemotherapy or radiation therapy [3]. However, the prognosis of patients with colon cancer varies with multiple factors, including clinical histological subtype, age, genetic profile, and treatment response [4,5,6,7,8]. In addition, prognostic outcomes remain unsatisfactory because of the complex pathogenesis, which involves a variety of molecular and genetic factors [3, 9,10,11,12]. Therefore, the identification of prognostic biomarkers for colon cancer is still necessary.

Biomarkers identified by high-throughput genome sequencing and bioinformatics analysis have attracted a great deal of interest in the last two decades. Computational bioinformatics analysis identifies potential biomarkers by inferring their association with disease status and progression. Most importantly, some of these biomarkers have proved verifiable and reliable in clinical trials [13, 14]. For instance, using bioinformatics analysis, Dalerba et al. [15] showed that the lack of caudal-related homeobox transcription factor 2 (CDX2) expression is associated with a poor prognosis in patients with stage II/III colon cancer. In addition, Hansen et al. [13] validated the association between the loss of CDX2 expression and poor disease-free survival in two Danish cohorts of patients with colon cancer. These results show that computational bioinformatics tools are of great value for identifying potential prognostic biomarkers before clinical or preclinical experiments are carried out.

In the past decades, many data mining analyses of mRNA, microRNA, long non-coding RNA, and DNA methylation have been performed on human cancers, including colon cancer [16,17,18,19]. The biomarkers identified by these techniques have diagnostic and prognostic value in cancers, and the revolution in sequencing technologies and bioinformatics tools facilitates the identification of more potential biomarkers related to disease progression [20,21,22,23]. The more potential biomarkers are identified, the more options become available for the diagnosis and treatment of colon cancer.

The current study aimed to identify a potential prognostic biomarker or gene signature using bioinformatics analysis. An integrated bioinformatics analysis was performed using The Cancer Genome Atlas (TCGA) and microarray datasets in the Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) between colon tumor and non-tumor control tissues and the prognosis-associated genes were identified and used to construct a gene signature with prognostic predictive power. The suitability of the prognostic model as a biomarker for colon cancer was validated in independent cohorts. This study may provide a clinical reference for predicting the survival probability of patients with different clinical subtypes.

Materials and methods

Data extraction

Public colon cancer gene expression profile data were preliminarily retrieved from the National Center for Biotechnology Information (NCBI) GEO repository (https://www.ncbi.nlm.nih.gov/geo/) using the search term “colon cancer”. Datasets were selected if they met the following inclusion criteria: (1) human gene expression profile data; (2) inclusion of ≥ 100 tissue samples, with or without control samples; and (3) for datasets without control samples, availability of clinical prognosis information for the tumor samples. Four datasets were selected according to these criteria: GSE44861 (Affymetrix-GPL3921 [HT_HG-U133A] platform, 56 tumor samples and 55 normal samples), GSE44076 (Affymetrix-GPL13667 [HG-U219] platform, 98 tumor samples and 148 normal samples), GSE17538 (Affymetrix-GPL570 [HG-U133_Plus_2] platform, 238 tumor samples), and GSE38832 (Affymetrix-GPL570 [HG-U133_Plus_2] platform, 122 tumor samples). The first two datasets, which include control samples, were used for the identification of DEGs using the weighted gene co-expression network analysis (WGCNA) and MetaDE analysis. The last two datasets, which include clinical stage and survival data but no control samples, were used for the validation of the prognostic prediction model.

In addition, the RNA-seq data of colon cancer and the corresponding clinical information were downloaded from TCGA (https://gdc-portal.nci.nih.gov/). After sample selection, 473 samples, comprising 432 tumor samples with clinical information and 41 normal samples, were retained in this study. The workflow of this study is shown in Fig. 1.

Fig. 1

Workflow of this study. COAD, colon adenocarcinoma. DEG, differentially expressed genes. WGCNA, weighted gene co-expression network analysis. TCGA, The Cancer Genome Atlas. NCBI, National Center for Biotechnology Information. GEO, gene expression omnibus

Screening of colon cancer-related gene module

WGCNA has been widely applied to identify gene modules associated with diseases and to extract potential therapeutic targets [24]. The WGCNA package (version 1.61; https://cran.r-project.org/web/packages/WGCNA/index.html) [25] in R3.4.1 was used to screen the colon cancer-related stable gene modules with the following parameters: min size ≥ 150 and cutHeight = 0.99. The TCGA data were used as the training set, and the GSE44861 and GSE44076 datasets were used as the validation sets for the identification of stable gene co-expression modules. The preservation and correlation properties of the resulting WGCNA modules were analyzed, and modules with a preservation Z-score > 5.0 and a correlation p value < 0.05 were defined as colon cancer-related stable gene modules.
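
The screening rule above can be written as a simple filter. The following is a minimal Python sketch (the study itself used the WGCNA package in R). The `ModuleStats` container, the `stable_modules` helper, and all module names and values are hypothetical and serve only to illustrate the two thresholds.

```python
from dataclasses import dataclass


@dataclass
class ModuleStats:
    name: str               # module label (hypothetical)
    preservation_z: float   # preservation Z-score in the validation sets
    p_value: float          # p value of the module correlation


def stable_modules(modules, z_cutoff=5.0, p_cutoff=0.05):
    """Keep modules with preservation Z-score > z_cutoff and p value < p_cutoff."""
    return [m.name for m in modules
            if m.preservation_z > z_cutoff and m.p_value < p_cutoff]


# Hypothetical values for illustration only (not taken from Table 1)
example_modules = [
    ModuleStats("M1", 12.4, 0.001),   # passes both thresholds
    ModuleStats("M2", 3.2, 0.010),    # fails the Z-score threshold
    ModuleStats("M3", 7.9, 0.200),    # fails the p value threshold
]
print(stable_modules(example_modules))
```

Only modules that satisfy both conditions are retained, which mirrors how 5 of the 8 modules were kept in this study.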

DEG identification by meta-analysis

The common DEGs across the TCGA, GSE44861, and GSE44076 datasets were identified using the MetaDE.ES method in the R MetaDE package (https://cran.r-project.org/web/packages/MetaDE/) [26, 27]. Briefly, a heterogeneity test of the gene expression profiles from the different platforms was first conducted based on the tau2, Q value, and Q pval statistics. The common DEGs were then selected according to the following criteria: tau2 = 0, p < 0.05, Q pval > 0.05, false discovery rate (FDR) < 0.05, and a log2 fold change (FC) with the same direction of differential expression across the three datasets (all > 0 or all < 0). The genes shared by the above WGCNA modules and the common DEGs across the three datasets were retained and used for further functional enrichment analysis and the construction of the prognostic prediction model.
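
The selection criteria above can be summarized as a single predicate. The sketch below is written in Python for illustration and does not call the MetaDE package; the function name `is_common_deg` and the example inputs are hypothetical, and the statistics are assumed to have been computed beforehand.

```python
def is_common_deg(tau2, p, q_pval, fdr, log2fc_by_dataset):
    """Apply the criteria quoted in the text to one gene.

    log2fc_by_dataset holds one log2 fold change per dataset
    (here: TCGA, GSE44861, GSE44076).
    """
    all_up = all(fc > 0 for fc in log2fc_by_dataset)
    all_down = all(fc < 0 for fc in log2fc_by_dataset)
    homogeneous = tau2 == 0 and q_pval > 0.05   # no detectable heterogeneity
    significant = p < 0.05 and fdr < 0.05
    return homogeneous and significant and (all_up or all_down)


# Hypothetical genes for illustration only
print(is_common_deg(0, 0.001, 0.60, 0.01, [1.2, 0.8, 1.5]))    # consistent upregulation
print(is_common_deg(0, 0.001, 0.60, 0.01, [1.2, -0.8, 1.5]))   # direction differs
```

A gene with discordant directions across datasets is rejected even when its pooled p value is small, which is the point of requiring a consistent sign.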

Functional enrichment analysis

To investigate the biological functions associated with the above overlapping genes (DEGs), functional enrichment analyses were performed. The Gene Ontology biological processes and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways associated with these DEGs were identified using the DAVID online tool (version 6.8; https://david.ncifcrf.gov/) [28, 29]. Enrichment was considered significant at p < 0.05.
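
The idea behind an enrichment p value can be illustrated with a plain hypergeometric tail probability. This is a sketch of the underlying principle only: DAVID reports a modified Fisher's exact test p value (the EASE score), so the function below is not a reimplementation of DAVID, and its name and arguments are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Probability of seeing at least k annotated genes in a list of n genes.

    N: number of background genes
    K: background genes annotated to the term
    n: genes in the list of interest
    k: listed genes annotated to the term
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy example: 10 background genes, 3 annotated, list of 2 genes, both annotated
print(hypergeom_enrichment_p(2, 2, 3, 10))
```

A term would be called enriched under the threshold above when this probability falls below 0.05.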

Construction and evaluation of prognostic prediction model

Before the construction of the prognostic prediction model, the prognosis-associated DEGs were identified using univariate and multivariate Cox regression analyses in the R survival package (version 2.41-1, https://cran.r-project.org/web/packages/survival/index.html) [30]. DEGs in the TCGA training set (n = 432) were considered prognosis-associated at a log-rank p value < 0.05. Then, the optimal prognostic gene signature was identified using the L1-penalized least absolute shrinkage and selection operator (LASSO) Cox proportional hazards (Cox-PH) model (lambda = 1000) in the penalized package (version 0.9-50, http://bioconductor.org/packages/penalized/) [31, 32]. Subsequently, the prognostic risk score of each sample was calculated using the following gene signature model: risk score = ∑βgene × Expgene, where βgene represents the LASSO coefficient of a gene and Expgene denotes its expression level. All the samples in the TCGA training set were divided into high- and low-risk groups according to the median risk score. The Kaplan-Meier (K-M) curve analysis in the R survival package (version 2.41-1) and the receiver operating characteristic (ROC) curve were used to assess the association of the risk score with overall survival in patients with colon cancer. Similarly, the samples in the validation sets (GSE17538 and GSE38832) were separately divided into high- and low-risk groups according to the above prognostic model, and the performance of the gene signature model in predicting the prognosis of colon cancer was validated in these sets using the K-M survival test and ROC curves.
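
The K-M curves referred to above are product-limit estimates of the survival function. The following Python sketch shows the estimator itself; it is not the R survival package, and the function name and toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times:  follow-up time of each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving   # deaths and censored patients leave the risk set
    return curve


# Toy data: the patient at t = 2 is censored
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Censored patients do not lower the curve, but they reduce the number at risk for later event times. Computing one curve per risk group gives the two curves that are compared in the K-M plots.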

Identification of clinical factors associated with the prognosis of colon cancer

The clinical factors associated with the prognosis of colon cancer were identified in the TCGA training set using the univariate and multivariate Cox regression analysis of the survival package (version 2.41-1) in R3.4.1. The threshold was log-rank p value < 0.05. Also, the K-M survival test was used to validate the performance of the gene signature model in predicting the prognosis of patients with different clinical subtypes.

Nomogram survival model analysis

The final nomogram was established using the “rms” package (version 5.1-2; https://cran.r-project.org/web/packages/rms/index.html) in R3.4.0 to estimate the individualized survival probability of patients with colon cancer. The prognosis-associated clinical factors and the gene signature model were used to construct the nomogram. Each factor in the nomogram was assigned points according to its weight. The total points of each sample were calculated, and the 3- and 5-year survival probabilities of each sample were predicted accordingly.

Screening of DEGs between the high- and low-risk groups

Finally, the DEGs between the high- and low-risk groups were identified to investigate the differences in gene expression profiles between patients with different survival probabilities. The DEGs between the high- and low-risk groups in the training set were screened using the limma package (version 3.34.7, https://bioconductor.org/packages/release/bioc/html/limma.html) [33], with thresholds of FDR < 0.05 and |log2FC| > 0.5.
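
The two thresholds can be illustrated as follows. This Python sketch assumes that the FDR is obtained by Benjamini-Hochberg adjustment of the raw p values (the text does not name the adjustment method); the function names and the gene list are hypothetical, and the moderated statistics computed by limma are not reproduced here.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, fdr_cutoff=0.05, lfc_cutoff=0.5):
    """genes: list of (name, raw p value, log2 fold change) tuples."""
    fdr = benjamini_hochberg([p for _, p, _ in genes])
    selected = {}
    for (name, _, lfc), q in zip(genes, fdr):
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff:
            selected[name] = "up" if lfc > 0 else "down"
    return selected


# Hypothetical genes for illustration only
toy_genes = [("g1", 0.005, 1.2), ("g2", 0.01, -0.8),
             ("g3", 0.03, 0.2), ("g4", 0.04, -0.9)]
print(select_degs(toy_genes))
```

In this toy list, g3 is significant after adjustment but is dropped because its fold change is too small.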

Results

Extraction of WGCNA modules related to colon cancer

The correlation analysis of RNA-seq data showed significant positive correlations (expression correlation coefficient > 0.700 and p < 1e−200) and connectivities (p < 1e−06) across the TCGA, GSE44861, and GSE44076 datasets (Figure S1A). Before module identification, the soft threshold power was determined according to the scale-free topology criterion: a power of 7 was selected, at which the scale-free topology model fit R2 was maximized (R2 = 0.9; Figure S1B). Then, 8 WGCNA modules were identified in the training dataset according to the following criteria: soft threshold power = 7, min size ≥ 150, and cutHeight = 0.99 (Fig. 2a). The same module division was identified in the two validation datasets (GSE44861 and GSE44076; Fig. 2a).
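
The role of the soft threshold power can be illustrated with the adjacency used in WGCNA, in which the absolute correlation between two genes is raised to that power. The sketch below assumes an unsigned network (the text does not state the network type); the function names and expression vectors are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def soft_adjacency(r, power=7):
    """Unsigned adjacency: |r| raised to the soft threshold power."""
    return abs(r) ** power


# Hypothetical expression vectors of two genes across four samples
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.1, 3.9, 6.2, 7.8]
r = pearson(gene_a, gene_b)
print(r, soft_adjacency(r))
```

With a power of 7, a correlation of 0.5 shrinks to about 0.008, whereas a correlation of 0.9 retains about 0.48, so weak correlations are suppressed without applying a hard cutoff.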

Fig. 2

The gene modules related to colon cancer based on the weighted gene co-expression network analysis (WGCNA) algorithm. a The module partition results of WGCNA in the TCGA (left), GSE44861 (middle), and GSE44076 (right) datasets. The different colors represent the different WGCNA modules. b The correlation heatmap of gene modules with the clinical factors of colon cancer. The horizontal axis represents clinical factors, and the vertical axis represents gene modules. The color gradient from blue to red indicates correlation from negative to positive. The numbers in the boxes indicate the correlation coefficients (upper) and the numbers in parentheses indicate the p values (lower)

Subsequently, 5 robust modules (blue, brown, green, red, and yellow) with a preservation Z-score > 5.0 and a p value < 0.05 were obtained. These modules contained a total of 1160 genes, with 381, 205, 195, 184, and 195 genes in the blue, brown, green, red, and yellow modules, respectively (Table 1). The correlation of the 8 WGCNA modules with clinical factors, including patients’ age, gender, history of colon polyps, lymphatic invasion, microsatellite instability, radiation therapy, death, tumor recurrence, pathologic M, pathologic N, pathologic T, and pathologic stage, is shown in Fig. 2b. For instance, the genes in the red module were significantly correlated with the pathologic T classification (cor = 0.54, p < 0.0001).

Table 1 The weighted gene co-expression network analysis (WGCNA) gene modules related to colon cancer

Identification of common DEGs using the MetaDE analysis

Following the aforementioned criteria for the MetaDE analysis, 1153 common DEGs were identified across the three datasets (TCGA, GSE44861, and GSE44076), including 724 downregulated DEGs and 429 upregulated DEGs. These DEGs had distinctively different expression profiles in the tumor and control samples and showed the same differential expression direction across the three datasets (Fig. 3).

Fig. 3

The heatmap of the common differentially expressed genes across the three datasets. High- and low-expression levels are indicated by red and green, respectively

Enrichment analysis of common DEGs

The Venn diagram indicated that 556 genes overlapped between the five WGCNA module genes (n = 1160) and the common DEGs (n = 1153) (Fig. 4a), including 218, 73, 166, 0, and 99 genes in the blue, brown, green, red, and yellow modules, respectively. The functional enrichment analyses indicated that these overlapping DEGs were significantly associated with 24 biological processes related to the immune response and the defense response (Fig. 4b) and with 8 KEGG pathways, including cytokine-cytokine receptor interaction, chemokine signaling pathway, and focal adhesion (Fig. 4b).
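
The overlap step amounts to a set intersection per module. The following Python sketch uses hypothetical gene identifiers and a hypothetical helper `overlap_by_module`; only the per-module counts in the final check are taken from the text.

```python
def overlap_by_module(module_genes, common_degs):
    """Intersect each module's gene list with the set of common DEGs."""
    degs = set(common_degs)
    return {module: set(genes) & degs for module, genes in module_genes.items()}


# Hypothetical identifiers for illustration only
toy_modules = {"blue": ["G1", "G2", "G3"], "red": ["G4", "G5"]}
toy_degs = ["G2", "G3", "G6"]
toy_counts = {m: len(g) for m, g in overlap_by_module(toy_modules, toy_degs).items()}
print(toy_counts)

# Consistency check of the counts reported in the text
reported_counts = {"blue": 218, "brown": 73, "green": 166, "red": 0, "yellow": 99}
assert sum(reported_counts.values()) == 556
```

A module can contribute no overlapping genes at all, as is the case for the red module here.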

Fig. 4

Features of the differentially expressed genes (DEGs) in the cancer-related WGCNA gene modules. a The Venn diagram indicating the overlapping genes between genes in the five cancer-related WGCNA modules and the common DEGs across the three datasets (TCGA, GSE44861, and GSE44076) identified by the MetaDE analysis (left), and the pie chart showing the number of overlapping genes in each WGCNA module (right). b The Gene Ontology biological processes (left) and Kyoto Encyclopedia of Genes and Genomes pathways (right) associated with the overlapping genes in the above figures. The horizontal axis represents gene number. The color and size of the dots indicate the p value. The closer the color is to red, the higher the significance

Construction of the prognostic model

Based on the univariate Cox regression analysis, 84 prognosis-associated DEGs were identified in the TCGA training dataset. The multivariate Cox regression analysis showed that 14 of the 84 DEGs were independently correlated with the prognosis of patients with colon cancer (Table S1). Afterward, an optimized prognostic gene signature was identified using the LASSO Cox-PH model. The signature consisted of 12 DEGs: ADORA3, CPA3, CPM, EDN3, FCRL2, MFNG, NAT1, PCSK5, PPARGC1A, PRRX2, TNFRSF17, and WDR78 (Table 2). Most of these 12 genes were in the blue (n = 5) and green (n = 6) modules. The prognostic gene model of colon cancer was built according to the following formula: prognostic risk score = 0.44262 × ExpADORA3 + (− 0.35894) × ExpCPA3 + (− 0.26349) × ExpCPM + (− 0.12557) × ExpEDN3 + 1.38523 × ExpFCRL2 + 0.35734 × ExpMFNG + (− 0.42755) × ExpNAT1 + 0.30206 × ExpPCSK5 + (− 0.34355) × ExpPPARGC1A + 0.04376 × ExpPRRX2 + (− 0.21594) × ExpTNFRSF17 + (− 0.07166) × ExpWDR78. The 432 samples in the TCGA training set were then divided into high-risk (n = 216) and low-risk (n = 216) groups according to the median prognostic risk score. The K-M survival test indicated that patients with high risk scores had a significantly shorter survival time than patients with low risk scores (hazard ratio, HR = 3.287, 95% CI 2.082–5.189, p = 4.096e−08; Fig. 5a). The ROC curve analysis showed that the prognostic model had a high accuracy in predicting the prognosis of colon cancer in the training set (area under the ROC curve, AUC = 0.922; Fig. 5a).
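
The risk score formula and the median split can be sketched in Python as follows. The coefficients are those given in the formula above; the expression values, sample names, and helper functions are hypothetical, and assigning scores equal to the median to the low-risk group is an assumption.

```python
from statistics import median

# LASSO coefficients as reported in the risk score formula
COEFFICIENTS = {
    "ADORA3": 0.44262, "CPA3": -0.35894, "CPM": -0.26349, "EDN3": -0.12557,
    "FCRL2": 1.38523, "MFNG": 0.35734, "NAT1": -0.42755, "PCSK5": 0.30206,
    "PPARGC1A": -0.34355, "PRRX2": 0.04376, "TNFRSF17": -0.21594,
    "WDR78": -0.07166,
}


def risk_score(expression):
    """Sum of coefficient x expression level over the 12 signature genes."""
    return sum(beta * expression[gene] for gene, beta in COEFFICIENTS.items())


def split_by_median(scores):
    """Label each sample 'high' (above the median) or 'low' (otherwise)."""
    cutoff = median(scores.values())
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}


# Hypothetical samples: every gene at the same expression level
toy_samples = {
    "S1": {g: 1.0 for g in COEFFICIENTS},
    "S2": {g: 2.0 for g in COEFFICIENTS},
    "S3": {g: 0.5 for g in COEFFICIENTS},
    "S4": {g: 3.0 for g in COEFFICIENTS},
}
toy_scores = {s: risk_score(e) for s, e in toy_samples.items()}
print(toy_scores)
print(split_by_median(toy_scores))
```

Five of the twelve coefficients are positive, so higher expression of those genes raises the score, while higher expression of the remaining seven lowers it.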

Table 2 The list of the differentially expressed genes in the optimized prognostic gene signature identified by the Cox-proportional hazards (Cox-PH) model
Fig. 5

The Kaplan-Meier (K-M) survival analysis for samples with different risk scores. a–c The K-M survival analysis of samples in the low- and high-risk groups (upper) and the receiver operating characteristic (ROC) curve analysis for evaluating the prognostic model in predicting survival (lower) in the training (TCGA) and validation (GSE17538 and GSE38832) datasets. HR represents hazard ratio, and the number in parentheses indicates the 95% confidence interval (CI). AUC, the area under the ROC curve

Validation of the prognostic model

Similarly, the samples with overall survival data in the two validation datasets (GSE17538, n = 232; and GSE38832, n = 122) were separately divided into high- and low-risk groups according to the prognostic risk scores (Fig. 5b, c). The K-M survival analysis showed a significant difference in overall survival time between patients in the high- and low-risk groups in both datasets (GSE17538: HR = 1.659, 95% CI 1.042–2.642, p = 3.059e−02; GSE38832: HR = 3.247, 95% CI 1.312–9.037, p = 5.273e−03; Fig. 5b, c). In addition, the model predicted the prognosis with high accuracy in the two datasets (GSE17538: AUC = 0.841; GSE38832: AUC = 0.824). These results suggest that the model performs well in predicting the prognosis of colon cancer.
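
The AUC values above can be read as the probability that a randomly chosen patient with the event has a higher risk score than a randomly chosen patient without it. The Python sketch below computes this pairwise form of the AUC. The text does not state how survival was dichotomized for the ROC analysis, so the binary labels (1 = death, 0 = alive) are an assumption, and the function name and data are hypothetical.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for patients with the event, 0 for patients without it.
    Ties between a positive and a negative score count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores and outcomes for illustration only
print(auc_from_scores([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

An AUC of 0.5 corresponds to a score that ranks patients no better than chance, and 1.0 to a perfect ranking.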

Identification of prognosis-associated clinical factors

Before the construction of the nomogram model, the prognosis-associated clinical factors were identified using univariate and multivariate Cox regression analyses. The stepwise Cox regression analyses showed that patients’ age (HR = 1.047, 95% CI 1.021–1.073, p = 3.510e−04), pathologic T classification (HR = 3.561, 95% CI 1.781–7.121, p = 3.280e−04), recurrence (HR = 1.881, 95% CI 1.050–3.369, p = 3.363e−02), and the risk model status (high/low; HR = 2.737, 95% CI 1.447–5.178, p = 1.970e−03) were prognosis-associated factors in the TCGA cohort (Table 3). The K-M survival analysis indicated a significantly lower survival ratio in patients aged above 65 years (HR = 1.618, 95% CI 1.041–2.513, p = 2.748e−02; Fig. 6a, left), in patients with an advanced T classification (HR = 2.658, 95% CI 1.775–3.979, p = 1.116e−06; Fig. 6b, left), and in patients with tumor recurrence (HR = 2.567, 95% CI 1.636–4.029, p = 2.113e−05; Fig. 6c, left) than in the corresponding control groups. These results indicate a significant correlation of patients’ age, T classification, and recurrence status with the prognosis of colon cancer.
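
The relation between a Cox regression coefficient, its hazard ratio, and the 95% CI can be sketched as follows. The Python code assumes a Wald-type interval, HR = exp(β) with limits exp(β ± 1.96 × SE), which is the usual convention but is not stated in the text; the helper names are hypothetical. As a worked example, the standard error for age is back-calculated from the reported interval.

```python
from math import erfc, exp, log, sqrt

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def se_from_ci(lower, upper):
    """Standard error of the log hazard ratio implied by a 95% Wald interval."""
    return (log(upper) - log(lower)) / (2 * Z_975)


def hazard_ratio_summary(beta, se):
    """Return (HR, lower limit, upper limit, two-sided Wald p value)."""
    hr = exp(beta)
    lower = exp(beta - Z_975 * se)
    upper = exp(beta + Z_975 * se)
    p_wald = erfc(abs(beta / se) / sqrt(2))
    return hr, lower, upper, p_wald


# Age in the multivariate model: HR = 1.047, 95% CI 1.021-1.073 (from the text)
beta_age = log(1.047)
se_age = se_from_ci(1.021, 1.073)
print(hazard_ratio_summary(beta_age, se_age))
```

Because the inputs are rounded to three decimals, the recomputed limits and p value only approximate the reported ones. The HR for age applies per additional year, so an effect of 1.047 per year compounds over larger age differences.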

Table 3 Identification of the prognosis-associated factors in colon cancer (the TCGA samples) using Cox regression analysis
Fig. 6

The subgroup Kaplan-Meier (K-M) survival analyses of prognosis-associated clinical factors. a–c The K-M survival analysis of age, pathological T, and tumor recurrence in all samples (left), as well as in the subgroups divided by age, T classification, and recurrence status (middle and right). HR represents hazard ratio, and the number in parentheses indicates the 95% confidence interval (CI)

In addition, the subgroup K-M survival analysis showed that a high risk score was correlated with a lower survival ratio in patients aged below 65 years (HR = 6.807, 95% CI 2.358–19.65, p = 1.808e−05; Fig. 6a, middle), in patients aged above 65 years (HR = 2.623, 95% CI 1.566–4.393, p = 1.271e−04; Fig. 6a, right), in patients with advanced T classifications (T3-4, HR = 3.273, 95% CI 2.022–5.300, p = 1.831e−07; Fig. 6b, right), in patients with tumor recurrence (HR = 2.680, 95% CI 1.410–5.094; p = 1.807e−03; Fig. 6c, middle), and in patients without tumor recurrence (HR = 3.073, 95% CI 1.322–7.140; p = 6.222e−03; Fig. 6c, right). For patients with early T classifications (T1-2), there was no significant difference in the survival ratio between patients with high and low risk scores (HR = 1.660, p = 5.395e−01; Fig. 6b, middle). The subgroup analysis indicated that the prognostic gene model performed well in predicting the prognosis of patients with colon cancer, irrespective of age and tumor recurrence status.

Nomogram model construction

According to the above analyses, the nomogram model was constructed using the prognosis-associated factors, including patients’ age, T classification, tumor recurrence status, and the prognostic risk score (Fig. 7a). According to the nomogram, patients with older age, an advanced T classification, tumor recurrence, and a high risk score had low 3- and 5-year survival probabilities. For example, an 85-year-old man (~ 5 points) with T3 classification (~ 33.7 points), with tumor recurrence (0 points), and with a risk score of 1.5 (~ 9.3 points) had a total of 48 points, and his 3- and 5-year survival probabilities were approximately 40% and 28%, respectively (Fig. 7a). Moreover, the predicted 3- and 5-year survival probabilities agreed well with the observed outcomes (c-index = 0.752 and 0.721; Fig. 7b). These results suggest the clinical applicability of this prognostic model in predicting the prognosis of colon cancer.
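
The worked example and the c-index can be illustrated with a short Python sketch. The point values are the approximate ones quoted above. The concordance function follows Harrell's definition, and whether the authors used exactly this definition is an assumption; all function names and the toy survival data are hypothetical.

```python
def total_points(points_by_factor):
    """Sum the nomogram points assigned to each factor."""
    return sum(points_by_factor.values())


def harrell_c_index(times, events, risk):
    """Fraction of comparable patient pairs ranked correctly by the risk value.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event (events == 1). Ties in risk count as one half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Approximate points of the example patient quoted in the text
example_patient = {"age_85": 5.0, "T3": 33.7, "recurrence": 0.0, "risk_score_1.5": 9.3}
print(round(total_points(example_patient), 1))

# Hypothetical follow-up data: higher total points should mean earlier death
print(harrell_c_index([12, 30, 45, 60], [1, 1, 0, 1], [70, 48, 52, 20]))
```

The total is then mapped to the 3- and 5-year survival probabilities on the lower axes of the nomogram; that mapping depends on the fitted model and is not reproduced here.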

Fig. 7

The nomogram model analysis. a The predictive weight of each factor and of the prognostic risk score in predicting the prognosis of colon cancer. The red line with an arrow marks the 3- and 5-year survival probabilities of the example case. b The comparison between the nomogram-predicted survival probability and the actual survival. The nomogram-predicted survival probabilities agree well with the observed outcomes (c-index = 0.752 and 0.721)

The features of the DEGs between patients with different prognosis risk scores

Finally, we investigated the differential gene expression profiles between TCGA samples with high and low risk scores. A total of 514 DEGs were identified between the high- and low-risk groups, including 102 downregulated and 412 upregulated genes (Fig. 8a). The clustering analysis indicated that the expression profiles of these DEGs changed with the risk scores (Fig. 8b), suggesting that these DEGs are co-expressed with the 12-gene signature.

Fig. 8

Screening of differentially expressed genes (DEGs) in the TCGA samples with high and low prognostic risk scores. a The scatter plot of the 514 DEGs between the high- and low-risk groups. Blue nodes indicate upregulated (FDR < 0.05 and log2FC > 0.5) and downregulated (FDR < 0.05 and log2FC < − 0.5) DEGs. b The sample heatmap of the 514 DEGs in the TCGA cohort (n = 432). FDR, false discovery rate. FC, fold change. TCGA, The Cancer Genome Atlas

Discussion

In the present study, 5 significantly stable gene modules (including 1160 genes) related to colon cancer were identified by the WGCNA algorithm. Then, 1153 common DEGs between colon cancer and normal tissue samples were identified across the TCGA, GSE44861, and GSE44076 datasets. Furthermore, 12 prognosis-associated DEGs (ADORA3, CPA3, CPM, EDN3, FCRL2, MFNG, NAT1, PCSK5, PPARGC1A, PRRX2, TNFRSF17, and WDR78) were identified as the optimized prognostic gene signature. The corresponding prognostic model performed well in predicting the prognosis of colon cancer in both the training dataset and the validation datasets. In addition, the 3- and 5-year survival probabilities predicted by combining the model status with clinical factors (patients’ age, pathological T classification, and tumor recurrence status) agreed well with the actual 3- and 5-year overall survival proportions. These results indicate that the prognostic gene signature is of reference value for predicting the prognosis and survival probability of patients with colon cancer.

The mining of the genetic properties of various diseases has advanced owing to the rapid technological development of high-throughput sequencing and bioinformatics [34]. The GEO and TCGA databases, as publicly available cancer genomic databases, provide comprehensive data on cancers, including mRNA expression data, miRNA expression data, copy number variation, DNA methylation, and clinical information [35, 36]. The TCGA and GEO data have been effectively applied to improve the diagnosis and treatment of cancers [35,36,37]. Thus, this study was performed based on the gene expression profile data and clinical information of colon cancer retrieved from the TCGA and GEO databases. Gene expression profiles have been reported to predict the prognostic outcome of cancers [38,39,40]. Computationally, Cox regression methods are commonly used to construct prognostic models and to screen prognostic factors [41]. The usefulness of this model in survival analysis has been confirmed in recent studies [42, 43]. Similarly, in this study, the LASSO-based Cox regression model was applied to screen the optimized gene set with potential prognostic value. The 12-gene prognostic signature constructed by the LASSO Cox regression model showed a high predictive ability in both the TCGA training data and the two validation sets (GSE17538 and GSE38832; AUC > 0.800).

In addition, this study showed that age, pathological T classification, and tumor recurrence were prognosis-associated factors in patients with colon cancer. Consistent with our results, previous studies have also demonstrated that older age, advanced pathological T, and tumor recurrence are associated with poor prognosis in patients with colon cancer [44,45,46]. Notably, the nomogram analysis in the current study revealed that the combination of patients’ age, T classification, recurrence status, and prognostic risk score yielded predicted 3- and 5-year survival probabilities close to the actual clinical outcomes. These results further showed that the 12-gene prognostic model had a significant predictive ability for the prognosis of colon cancer.

In this study, the prognostic model was constructed based on a signature of 12 prognosis-associated DEGs: ADORA3, CPA3, CPM, EDN3, FCRL2, MFNG, NAT1, PCSK5, PPARGC1A, PRRX2, TNFRSF17, and WDR78. Specifically, the adenosine receptor A3 (ADORA3) protein encoded by the ADORA3 gene is a G-protein-coupled receptor that functions in inflammatory and immunological responses as well as in cancer growth by influencing the nucleotide metabolic process [47,48,49]. There is increasing evidence that ADORA3 is overexpressed in several cancers, including breast cancer [50], thyroid cancer [51], bladder cancer [52], and colon cancer [53], and functions as a tumor promoter [54]. Carboxypeptidase A3 (CPA3) is a member of the CPA family of zinc metalloproteases released by mast cells and may be involved in the inactivation of venom-associated peptides and the degradation of endogenous proteins [55]. Previous studies have shown elevated expression of CPA3 in asthma [56] and anaphylactic shock [57]. However, few studies have investigated the role of CPA3 in cancers. CPM is also an arginine/lysine CP, which plays important roles in angiogenesis, proliferation, and apoptosis by modulating chemokines or kinins in cancer cells [58]. Notably, a recent study reported that the CPM/Src-FAK pathway is involved in cell migration and invasion in colon cancer [59]. Endothelin 3 (EDN3) is reported to participate in the progression of several cancers, including malignant melanoma [60], cervical cancer [61], and colon cancer [62]. Fc Receptor Like 2 (FCRL2) is a member of the immunoglobulin receptor superfamily that is involved in the development of lymphoblastic leukemia by immunomodulating B cell function [63,64,65]. In addition, it has been reported that an inherited polymorphism in the N-acetyltransferase 1 (NAT1) gene increases the risk of colorectal adenocarcinoma [66]. Manic fringe (MFNG) is reported to exhibit antitumor effects in lung cancer [67]. The peroxisome proliferator-activated receptor-γ coactivator 1-α (PPARGC1A) gene also contributes to tumor growth and metastasis in several cancers [68, 69]. Moreover, studies have suggested that both the paired related homeobox 2 (PRRX2) gene [70, 71] and the tumor necrosis factor receptor superfamily member 17 (TNFRSF17) gene [72, 73] are associated with the development of several cancers, whereas the proprotein convertase subtilisin/kexin type 5 (PCSK5) gene and the WD repeat domain 78 (WDR78) gene have not been reported to be associated with cancer pathogenesis and progression. Thus, the functions of these genes in colon cancer should be further investigated in preclinical and clinical experiments.

Conclusions

In conclusion, the prognostic model based on the 12-gene signature (ADORA3, CPA3, CPM, EDN3, FCRL2, MFNG, NAT1, PCSK5, PPARGC1A, PRRX2, TNFRSF17, and WDR78) exhibited a relatively satisfactory and credible predictive power for the prognosis of colon cancer, making it a promising candidate biomarker. However, the prognostic significance and practicability of the 12-gene prognostic model in colon cancer should be further confirmed in clinical studies.