Bioinformatics analysis reveals immune prognostic markers for overall survival of colorectal cancer patients: a novel machine learning survival predictive system

Zhang, Zhiqiao; Huang, Liwen; Li, Jing; Wang, Peng

doi:10.1186/s12859-022-04657-3

Research
Open access
Published: 08 April 2022

Volume 23, article number 124, (2022)
BMC Bioinformatics

Abstract

Objectives

The immune microenvironment is closely related to the occurrence and progression of colorectal cancer (CRC). The objective of the current research was to develop and verify a machine learning survival predictive system for CRC based on immune gene expression data and machine learning algorithms.

Methods

The current study performed differential expression analyses between normal tissues and tumor tissues. Univariate Cox regression was used to screen prognostic markers for CRC. Prognostic immune genes and transcription factors were used to construct an immune-related regulatory network. Three machine learning algorithms were used to create a machine learning survival predictive system for CRC. Concordance indexes, calibration curves, and Brier scores were used to evaluate the performance of the prognostic model.

Results

Twenty immune genes (BCL2L12, FKBP10, XKRX, WFS1, TESC, CCR7, SPACA3, LY6G6C, L1CAM, OSM, EXTL1, LY6D, FCRL5, MYEOV, FOXD1, REG3G, HAPLN1, MAOB, TNFSF11, and AMIGO3) were recognized as independent risk factors for CRC. A prognostic nomogram was developed based on these immune genes. Concordance indexes were 0.852, 0.778, and 0.818 for 1-, 3-, and 5-year survival. The prognostic model could discriminate high-risk patients with poor prognosis from low-risk patients with favorable prognosis.

Conclusions

The current study identified twenty prognostic immune genes for CRC patients and constructed an immune-related regulatory network. Based on three machine learning algorithms, the current research provided three individual mortality predictive curves. The machine learning survival predictive system is available at: https://zhangzhiqiao8.shinyapps.io/Artificial_Intelligence_Survival_Prediction_for_CRC_B1005_1/, and is valuable for individualized treatment decisions before surgery.

Introduction

The latest research showed that colorectal cancer (CRC) was the fourth most common cancer in the world, with 1,096,601 new cases and 551,269 deaths in 2018 [1]. Although great progress has been made in the diagnosis and treatment of CRC, global data demonstrate that mortality among CRC patients remains unsatisfactory [2]. Alterations in chromosomal copy number, gene methylation, and gene expression are involved in the occurrence and progression of CRC, leading to substantial heterogeneity in prognosis among CRC patients [3, 4]. Because of the strong demand for prognostic prediction in CRC, different research teams have established prognostic models based on different prognostic markers [5,6,7]. However, the calculation formulas of these models are complex, which seriously restricts their adoption in clinical practice. Given the heterogeneity of prognosis in CRC patients, a single biomarker is not enough to provide accurate prognostic information. More importantly, most current prognostic models can only predict prognosis for a group of patients, not for an individual patient [8, 9]. From the patient's point of view, a predicted mortality risk for the individual is more valuable than one for a group. Therefore, it is necessary and valuable to construct predictive models that provide individual mortality risk prediction.

A large body of molecular biological evidence has confirmed that genes play important roles in the endogenous regulation of tumorigenesis and progression [10,11,12,13]. The immune microenvironment is closely related to tumor development, progression, and prognosis [14, 15]. Several studies have explored the potential roles of immune genes in the prognosis of CRC [16,17,18]. Two immune-related prognostic models have been developed for predicting the prognosis of CRC patients [19, 20]. Hu et al. established a prognostic model of colorectal cancer based on CEACAM8+ neutrophils, CD3+ and CD8+ T lymphocytes, and FOXP3+ regulatory T cells [19]. Zhou et al. established a prognostic immune risk score for stage I–III colon cancer patients, with an area under the receiver operating characteristic curve of 0.741 for 5-year mortality in the training dataset [20]. However, these two models did not provide individual mortality risk prediction for a specific patient.

Machine learning has been applied to medical image recognition, diagnosis, and prognosis [21, 22]. Kawakami et al. used different machine learning algorithms to predict the clinical stage and pathological type of ovarian cancer patients [23]. Enshaei et al. created a machine learning model to predict the prognosis of ovarian cancer patients [24]. These studies provided new insights into the applications of machine learning in diagnosis and prediction. However, to date, no clinical study has reported a machine learning model for predicting individualized mortality risk across various tumors.

Our research team is committed to developing precision medicine tools for predicting individualized mortality risk in different tumors [25,26,27,28,29,30,31,32]. Inspired by the machine learning studies above, we planned to build and verify a machine learning survival predictive system that predicts individual mortality risk for CRC patients based on machine learning algorithms and immune genes.

Methods

Study datasets

The TCGA dataset included 20,236 mRNAs and 521 CRC patients. The original expression values were log2 transformed. The GSE39582 dataset included 556 CRC patients and 23,494 mRNAs [33]. Probe IDs were generated on the GPL570 platform and gene symbols were determined with Gencode.v29. The flow chart of the current study is shown in Additional file 5: Fig. S1. For survival analysis, the GSE39582 dataset was used as the model dataset and the TCGA dataset was used as the validation dataset.

Differential expression analyses

Differential expression analyses were performed between 480 tumor samples and 41 normal samples. |Log₂ fold change| > 1 and P value < 0.05 were defined as cut-off values. The “edgeR” package was used to normalize the original expression values with the trimmed mean of M values method [34].
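The two cut-off values above can be illustrated with a minimal Python sketch. The gene names, fold changes, and P values are hypothetical, and the function name is introduced here for illustration only; this is not the edgeR workflow used in the study.

```python
import math

def is_differentially_expressed(fold_change, p_value, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return True when |log2(fold change)| > lfc_cutoff and P < p_cutoff."""
    return abs(math.log2(fold_change)) > lfc_cutoff and p_value < p_cutoff

# Hypothetical (gene, tumor/normal fold change, P value) rows, for illustration only.
rows = [
    ("GENE_A", 4.0, 0.001),  # up-regulated, passes both cut-offs
    ("GENE_B", 0.2, 0.010),  # down-regulated, passes both cut-offs
    ("GENE_C", 1.5, 0.001),  # fold change too small
    ("GENE_D", 8.0, 0.200),  # P value too large
]
selected = [gene for gene, fc, p in rows if is_differentially_expressed(fc, p)]
print(selected)
```

Taking the absolute value of the log₂ fold change keeps both strongly up-regulated and strongly down-regulated genes.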

Immune genes

Immune genes were identified from the Immunology Database and Analysis Portal database [35]. The Cistrome Cancer database was used to search for transcription factors [36]. To screen transcription factors highly correlated with immune genes, |correlation coefficient| > 0.5 and P value < 0.01 were defined as cut-off values. Gene biological processes were identified through the TISIDB database. Tumor immune infiltration indexes were calculated through single-sample gene set enrichment analysis [37, 38].

Introduction of regression algorithms

Predicting mortality risk at the individual level helps to optimize individualized treatment for cancer patients. To provide the mortality probability of a specific patient at all time points, three regression algorithms, namely the Cox proportional hazards regression model, the random survival forest model, and the multi-task logistic regression model, were used to generate individual mortality risk curves for cancer patients [39].

Cox proportional hazard regression algorithm

The Cox proportional hazards regression model was implemented according to the original articles [40, 41]. An advantage of Cox proportional hazards regression is that it can be applied to both continuous and categorical variables. In addition, the Cox proportional hazards model can show the impact of multiple independent variables on the survival outcome simultaneously.
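As an illustration of how a Cox model yields an individual survival curve, the sketch below uses the standard relation S(t | x) = S0(t)^exp(β·x). The baseline survival values and linear predictors are hypothetical and are not taken from the study; the function name is also an assumption made for this example.

```python
import math

def cox_survival_curve(baseline_survival, linear_predictor):
    """S(t | x) = S0(t) ** exp(beta . x) for each time point in baseline_survival."""
    relative_risk = math.exp(linear_predictor)
    return {t: s0 ** relative_risk for t, s0 in baseline_survival.items()}

# Hypothetical baseline survival S0(t) at 1, 3 and 5 years (not from the study).
baseline = {1: 0.95, 3: 0.85, 5: 0.75}

low_risk = cox_survival_curve(baseline, linear_predictor=-0.5)
high_risk = cox_survival_curve(baseline, linear_predictor=0.5)

for t in baseline:
    # Predicted mortality at time t is 1 - S(t | x).
    print(t, round(1 - low_risk[t], 3), round(1 - high_risk[t], 3))
```

A larger linear predictor lowers the whole survival curve, which is why a single prognostic score can be turned into a mortality percentage at any chosen time point.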

Random survival forest algorithm

Random survival forest is an ensemble algorithm that combines multiple decision trees and has the following advantages: it can handle non-linear effects; it evaluates the relative importance of variables and selects important variables according to a given threshold; and it allows exploration of the relationship between included variables and study outcomes [42, 43]. Bootstrap samples drawn from the original cohort were used to grow a large number of trees for training the random survival forest [44]. At each node, the split is chosen to maximize the difference between the resulting child groups. Random survival forest has been used in clinical research and has shown good performance in variable selection and outcome prediction [43,44,45,46].
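The bootstrap step described above can be sketched in a few lines of Python. This is a simplified illustration only: the number of trees is hypothetical, the helper name is introduced here, and the tree-growing and node-splitting steps of a real random survival forest (for example in randomForestSRC) are not shown.

```python
import random

def bootstrap_sample(n_patients, rng):
    """Draw n_patients indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_patients) for _ in range(n_patients)]
    out_of_bag = sorted(set(range(n_patients)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_patients = 556        # size of the GSE39582 model cohort reported in this study
n_trees = 5             # hypothetical; real forests grow many more trees

samples = [bootstrap_sample(n_patients, rng) for _ in range(n_trees)]
for in_bag, out_of_bag in samples:
    # Each tree would be trained on in_bag and evaluated on out_of_bag patients.
    print(len(in_bag), len(out_of_bag))
```

Patients left out of a tree's bootstrap sample are what allow an error rate to be estimated without a separate test set.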

Multi-task logistic regression algorithm

Multi-task logistic regression (MTLR) was proposed for clinical medicine; it combines multiple logistic regression models in a dependent way to establish a predictive function [47]. The MTLR model can be used to predict the survival probability of an individual over a given time range. The MTLR model was reported to be superior to the logistic regression model in goodness of fit and prediction performance [48]. Other details of the machine learning algorithms can be found in our previous studies [25, 27,28,29,30,31,32, 49].
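To make the idea of dependent logistic models concrete, the following sketch computes a survival curve from per-interval MTLR scores, following the usual formulation in which follow-up time is split into intervals and each legal outcome sequence gets a probability. The scores are hypothetical, the function name is an assumption, and no model fitting is shown; the study's own implementation may differ.

```python
import math

def mtlr_survival(interval_scores):
    """Survival probability at the end of each interval from MTLR scores f_j(x)."""
    m = len(interval_scores)
    # Sequence k (k = 0..m-1) encodes death in interval k; k = m encodes survival
    # past the last interval. Its unnormalized log-probability is sum_{j >= k} f_j.
    log_weights = [sum(interval_scores[k:]) for k in range(m + 1)]
    shift = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(w - shift) for w in log_weights]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Surviving past interval j means the death index is larger than j.
    return [sum(probs[j + 1:]) for j in range(m)]

# Hypothetical scores f_j(x) = theta_j . x + b_j for three time intervals.
flat_curve = mtlr_survival([0.0, 0.0, 0.0])
patient_curve = mtlr_survival([-1.0, -0.5, 0.2])
print(flat_curve)
print(patient_curve)
```

Because all intervals share one normalizing constant, the predicted survival curve is non-increasing by construction, which separate logistic models would not guarantee.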

Statistical analyses

Statistical analyses were carried out with SPSS Statistics 19.0 (SPSS Inc., USA). Machine learning and bioinformatics analyses were performed in Python and R with appropriate packages and corresponding algorithms [25, 27,28,29,30,31,32, 49]. The most important packages included pec, rms, survival, rmda, ggplot2, GOplot, timereg, randomForestSRC, and riskRegression.

Results

Study datasets

Table 1 displayed clinical features of CRC patients. Ninety-eight patients out of 428 patients died in TCGA dataset (validation) and 187 patients out of 556 patients died in GSE39582 dataset (model dataset).
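The death counts above translate into cohort mortality rates as follows. This is a small arithmetic check using only the numbers reported in this paragraph.

```python
# (deaths, patients) as reported for each cohort in this article.
cohorts = {
    "TCGA (validation)": (98, 428),
    "GSE39582 (model)": (187, 556),
}
rates = {name: 100 * deaths / total for name, (deaths, total) in cohorts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% died")
```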

Table 1 Clinical features of included patients

Differential expression analyses

Differential expression analyses identified 4087 mRNAs in the TCGA cohort. Meanwhile, 3588 immune genes were identified in the TCGA cohort. Intersecting the differentially expressed genes with the immune genes yielded 1384 differentially expressed immune genes. The volcano chart (Additional file 5: Fig. S2A) showed these 1384 differentially expressed immune genes (779 up-regulated and 605 down-regulated).

Functional enrichment analyses

Gene Ontology chord chart (Fig. 1) and Bar chart (Additional file 5: Fig. S2B) showed that biological processes of immune genes were mainly enriched in: positive regulation of MAPK cascade, regulation of apoptotic signaling pathway, regulation of DNA-binding transcription factor activity, positive regulation of establishment of protein localization, leukocyte differentiation, regulation of leukocyte activation, cell recognition, positive regulation of stress-activated MAPK cascade, positive regulation of stress-activated protein kinase signaling cascade, and regulation of intrinsic apoptotic signaling pathway.

Immune regulatory network

The original gene expression values were dichotomized into '1' (high expression) and '0' (low expression) according to the median values in both the GSE39582 dataset and the TCGA dataset. Univariate Cox regression identified 119 immune genes as prognostic biomarkers for overall survival (OS). Transcription factors highly correlated with prognostic immune genes were identified according to the thresholds described above. The associations among immune mRNAs and transcription factors were determined in the STRING database. The regulatory network among immune genes and transcription factors was drawn with Cytoscape v3.6.1 (Fig. 2).
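The median-based coding described at the start of this subsection can be sketched as below. The expression values are hypothetical, and coding values equal to the median as '0' is an assumption, since the text does not say how ties were handled.

```python
from statistics import median

def dichotomize(values):
    """Code each value as 1 (above the cohort median) or 0 (otherwise)."""
    cutoff = median(values)
    # Assumption: values exactly equal to the median are coded 0.
    return [1 if v > cutoff else 0 for v in values]

# Hypothetical log2 expression values of one gene across six patients.
expression = [5.1, 7.3, 6.0, 8.2, 4.9, 6.6]
coded = dichotomize(expression)
print(coded)
```

The median is computed within each dataset, so the same gene can have different cut-off values in the model and validation cohorts.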

Variable selection process

The current study first explored the relative importance of the independent variables with the random survival forest package. The top 30 prognostic immune genes by importance are displayed in Fig. 3. We entered the genes with potential prognostic value identified by the random survival forest into a multivariate Cox proportional hazards regression model to further investigate independent prognostic risk factors. Through stepwise multivariate Cox proportional hazards regression, we identified the optimal prognostic model, defined as the gene combination with the highest C index. The final machine learning survival predictive system was established from the prognostic genes in this optimal model using the different machine learning algorithms.

Construction of prognostic model

Multivariate Cox regression identified twenty independent prognostic mRNAs for OS (Table 2; Fig. 4). The formula of the prognostic model was as follows: Prognostic score = (−0.542 * BCL2L12) + (0.479 * FKBP10) + (−0.347 * XKRX) + (0.597 * WFS1) + (−0.768 * TESC) + (−0.739 * CCR7) + (−0.624 * SPACA3) + (0.628 * LY6G6C) + (0.530 * L1CAM) + (0.709 * OSM) + (−0.460 * EXTL1) + (0.602 * LY6D) + (0.583 * FCRL5) + (−0.527 * MYEOV) + (0.618 * FOXD1) + (−0.389 * REG3G) + (0.433 * HAPLN1) + (−0.472 * MAOB) + (−0.439 * TNFSF11) + (−0.425 * AMIGO3). A prognostic nomogram is shown in Fig. 5. The RFS model, MTLR model, and Cox model were all based on these 20 independent prognostic genes.

Table 2 Information of prognostic immune genes

Additional file 5: Fig. S3 showed significant differences between the survival curves of the two subgroups for each of the twenty immune mRNAs. Additional file 5: Fig. S4 and Fig. S5 show the prognostic score distribution chart and the survival status scatter chart, drawn with the ggplot2 package, indicating that CRC patients with high prognostic scores tended to have shorter survival times.
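The prognostic score formula given in this section can be evaluated with a short Python sketch. The coefficients are copied from the formula; the patient is hypothetical, and treating each gene as coded 1 (high expression) or 0 (low expression) is an assumption based on the median dichotomization described earlier.

```python
# Coefficients copied from the prognostic score formula in this article.
COEFFICIENTS = {
    "BCL2L12": -0.542, "FKBP10": 0.479, "XKRX": -0.347, "WFS1": 0.597,
    "TESC": -0.768, "CCR7": -0.739, "SPACA3": -0.624, "LY6G6C": 0.628,
    "L1CAM": 0.530, "OSM": 0.709, "EXTL1": -0.460, "LY6D": 0.602,
    "FCRL5": 0.583, "MYEOV": -0.527, "FOXD1": 0.618, "REG3G": -0.389,
    "HAPLN1": 0.433, "MAOB": -0.472, "TNFSF11": -0.439, "AMIGO3": -0.425,
}

def prognostic_score(expression):
    """Sum coefficient * expression over the 20 genes (expression coded 0 or 1)."""
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise ValueError(f"missing genes: {sorted(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())

# Hypothetical patient: every gene with a positive coefficient is highly expressed.
patient = {gene: int(coef > 0) for gene, coef in COEFFICIENTS.items()}
print(round(prognostic_score(patient), 3))
```

Genes with positive coefficients raise the score when highly expressed and genes with negative coefficients lower it, so this hypothetical patient has the highest score the formula can produce under 0/1 coding.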

Performance of Cox model in model cohort

Survival curve chart (Fig. 6A) indicated that there were significant differences between two groups for prognostic model. Concordance indexes were 0.852, 0.778, and 0.818 for 1-year, 3-year, and 5-year survival (Fig. 6B). Calibration curves (Additional file 5: Fig. S6) showed good agreements between predicted mortality and actual mortality.
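For readers unfamiliar with the concordance index, the sketch below computes Harrell's C on a toy cohort. The follow-up times, event flags, and risk scores are hypothetical, and this simple pairwise estimator is an assumption for illustration; the time-specific values reported above may have been obtained with a different estimator.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs whose risk ordering matches outcome."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical follow-up times (months), event flags (1 = died) and risk scores.
times = [10, 24, 36, 48, 60]
events = [1, 1, 0, 1, 0]
scores = [2.1, 0.4, 1.0, 0.9, -0.3]
c_index = harrell_c_index(times, events, scores)
print(round(c_index, 3))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.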

Performance of Cox model in validation cohort

Survival curves (Fig. 7A) demonstrated that mortality in the high-risk group was significantly higher than that in the low-risk group. Concordance indexes were 0.894, 0.866, and 0.769 for 1-year, 3-year, and 5-year survival (Fig. 7B). Additional file 5: Fig. S7 showed calibration curves of validation cohort.

Correlation analyses

Correlation analyses (Fig. 8) showed prognostic score was positively correlated with pathological stage, the American Joint Committee on Cancer (AJCC) PM, AJCC PT, and AJCC PT. Additional file 5: Fig. S8 presented correlation significance between clinical variables and immune genes.

Independence assessment

Prognostic model, AJCC PM, and age were independent risk factors for OS in model cohort (Table 3). In validation cohort, prognostic model, AJCC PM, AJCC PT, and age were ascertained to be independent risk factors for OS.

Table 3 Results of Cox regression analyses

Subgroup analyses

Subgroup analyses were performed to explore the discriminative ability of the prognostic model in different pathological stages. The results showed that the prognostic model had reliable discriminative ability in all pathological stages for both the model group and the validation group (Fig. 9).

Random survival forest model

A random survival forest (RFS) model was built for predicting OS based on the immune genes identified above. The random survival forest error rate chart (Additional file 5: Fig. S9) showed how the model error rate changed with the number of trees. The predictive performance of the RFS model is summarized in Additional file 5: Fig. S10.

Survival curves (Additional file 5: Fig. S11A) demonstrated the mortality of high risk group was significantly higher than that of low-risk group. Concordance indexes were 0.890, 0.869, and 0.899 for 1-year, 3-year, and 5-year survival (Additional file 5: Fig. S11B). Additional file 5: Fig. S12 showed calibration curves of RFS model.

Multi-task logistic regression model

We further constructed Multi-task logistic regression (MTLR) model to predict OS for CRC patients. Survival curves (Additional file 5: Fig. S13A) demonstrated the mortality of high risk group was significantly higher than that of low-risk group. Concordance indexes were 0.841, 0.780, and 0.826 for 1-year, 3-year, and 5-year survival (Additional file 5: Fig. S13B). Additional file 5: Fig. S15 showed calibration curves of MTLR model.

Comparisons of three prognostic models

Figure 10 shows the dynamic changes of the areas under the receiver operating characteristic curves (AUROC) for the three prognostic models, suggesting that the RFS model was superior to the MTLR model and the Cox model (in Fig. 10, the solid line represents the AUROC value and the dashed line represents its 95% confidence interval). Time-dependent ROC curve analyses suggested that the concordance index of the RFS model was superior to that of the MTLR model and the Cox model for 1-year, 3-year, and 5-year survival (Fig. 11). Further comparisons demonstrated that the concordance index of the RFS model was superior to that of the Cox model except at 12 months, whereas it was superior to that of the MTLR model at all time points (Table 4). The Brier scores of the RFS model, MTLR model, and Cox model were 0.144, 0.208, and 0.150, respectively. Because a lower Brier score indicates better accuracy, the predictive accuracy of the RFS model was superior to that of the MTLR model and the Cox model.
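The Brier score comparison can be illustrated with a minimal sketch at a single time horizon. The observed statuses and predictions are hypothetical, and the sketch ignores censoring, which packages such as pec handle with inverse probability weighting; it is meant only to show why a lower score is better.

```python
def brier_score(survived_past_t, predicted_survival):
    """Mean squared gap between observed status (1/0) and predicted S(t)."""
    pairs = list(zip(survived_past_t, predicted_survival))
    return sum((obs - pred) ** 2 for obs, pred in pairs) / len(pairs)

# Hypothetical status at a fixed horizon (1 = alive, 0 = dead) and predictions.
observed = [1, 1, 0, 1, 0]
model_a = [0.9, 0.8, 0.3, 0.7, 0.2]  # confident and mostly correct predictions
model_b = [0.6, 0.5, 0.5, 0.6, 0.4]  # vaguer predictions

score_a = brier_score(observed, model_a)
score_b = brier_score(observed, model_b)
print(round(score_a, 3), round(score_b, 3))
```

A perfect model scores 0, and a model that always predicts 0.5 scores 0.25, so values well below 0.25 indicate useful predictions.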

Table 4 Comparison of areas under receiver operating characteristic curves

Machine learning survival predictive system

Machine learning survival predictive system was constructed for individual mortality risk prediction for CRC patients (Fig. 12), which was available at: https://zhangzhiqiao8.shinyapps.io/Artificial_Intelligence_Survival_Prediction_for_CRC_B1005_1/.

Machine learning survival predictive system provided individualized mortality risk predictive curve based on three machine learning algorithms: RFS model (Fig. 12A), MTLR model (Fig. 12B), and Cox model (Fig. 12C). Additionally, MTLR algorithm further provided median survival time in Fig. 12B. Cox survival regression algorithm provided predicted mortality percentage and 95% confidence interval for selected time points in Fig. 12D.

Gene survival analysis screen system

Gene Survival Analysis Screen System was constructed for exploratory research of immune genes (Additional file 5: Fig. S15), which was available at: https://zhangzhiqiao8.shinyapps.io/Gene_Survival_Subgroup_Analysis_18_CRC_B1005/.

Shapley additive explanations

Shapley additive explanations (SHAP) is a method for interpreting the output of machine learning models. To show the importance of the included prognostic genes in the prognostic model and their effects on prognosis, we plotted the SHAP values of the 20 included prognostic genes for each patient. The SHAP value distribution chart shows the direction and degree of the influence of each prognostic gene on the model output (Fig. 13). Each point in Fig. 13 represents an individual patient. Red represents a higher SHAP value, and blue represents a lower SHAP value.
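For an additive (linear) predictor with independent features, SHAP values have a closed form: the contribution of gene j is β_j × (x_j − mean of x_j in the cohort). The sketch below applies this to two coefficients from the prognostic score formula; the cohort is hypothetical, and using the linear form is an assumption, since the text does not state which of the three models was explained.

```python
def linear_shap_values(coefficients, patient, cohort):
    """phi_j = beta_j * (x_j - mean_j) for an additive model with independent features."""
    values = {}
    for gene, beta in coefficients.items():
        mean_j = sum(row[gene] for row in cohort) / len(cohort)
        values[gene] = beta * (patient[gene] - mean_j)
    return values

# Two coefficients taken from the prognostic score formula in this article.
coefficients = {"TESC": -0.768, "OSM": 0.709}
# Hypothetical cohort with expression coded 1 (high) or 0 (low).
cohort = [
    {"TESC": 1, "OSM": 0},
    {"TESC": 0, "OSM": 1},
    {"TESC": 1, "OSM": 1},
    {"TESC": 0, "OSM": 0},
]
phi = linear_shap_values(coefficients, cohort[1], cohort)
print(phi)
```

The SHAP values of one patient sum to the difference between that patient's score and the cohort's mean score, which is what makes the per-gene contributions additive.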

Discussion

The current study identified twenty immune genes as prognostic markers for overall survival of colorectal cancer. Through protein–protein interaction regulatory network, the current research described potential regulatory relationships among immune genes and transcription factors. Through three machine learning algorithms, the current research established an individual mortality risk predictive system for CRC patients. Based on individual mortality risk curves predicted by three machine learning algorithms, our machine learning survival predictive system could accurately predict the individual mortality risk of CRC patients.

Previous prognostic models provided predicted mortality percentages for different subgroups, but not an individual mortality risk curve for a specific patient [23, 24]. Based on different machine learning algorithms, the current study provided three individual mortality risk predictive curves. The three curves were broadly similar, supporting a reliable individual mortality risk predictive method for CRC patients. Meanwhile, the current study further provided median survival time, predicted mortality percentage, and 95% confidence interval, which is an advantage over previous prognostic models.

As a non-parametric algorithm for time-to-event data, random survival forest has been regarded as a better method for prognostic prediction and variable selection [50, 51]. Random survival forest can handle multicollinearity and is suitable for high-dimensional survival data [52]. Because of its high flexibility and non-parametric nature, random survival forest has been used for biomedical high-dimensional survival data [53, 54]. The predictive accuracy of the RFS model was superior to that of the Cox model in patients with cardiac arrhythmias [52]. Consistent with that study [52], the concordance indexes and Brier scores in the current study suggested that the predictive accuracy of the RFS model was superior to that of the Cox model. To date, there have been few studies of the MTLR model in prognostic research.

Biological processes of immune genes were determined through TISIDB database. Major biological processes of tumor necrosis factor (ligand) superfamily, member 11 (TNFSF11) were leukocyte differentiation, acute inflammatory response, and regulation of leukocyte activation. Major biological processes of regenerating islet-derived 3 gamma (REG3G) were activation of innate immune response, toll-like receptor signaling pathway, and acute inflammatory response. Major biological processes of lymphocyte antigen 6 complex, locus D (LY6D) were leukocyte differentiation, lymphocyte differentiation, and response to stilbenoid. Major biological processes of sperm acrosome associated 3 (SPACA3) were response to virus, phagocytosis, and regulation of leukocyte activation. Major biological processes of chemokine (C–C motif) receptor 7 (CCR7) were dendritic cell chemotaxis, dendritic cell antigen processing and presentation, and establishment of T cell polarity. Major biological processes of BCL2-like 12 (BCL2L12) were aging, negative regulation of peptidase activity, and negative regulation of proteolysis. Major biological processes of FK506 binding protein 10 (FKBP10) were protein peptidyl-prolyl isomerization, protein folding, and peptidyl-proline modification. Major biological processes of tescalcin (TESC) were negative regulation of protein kinase activity, leukocyte differentiation, and protein targeting to membrane. Major biological processes of L1 cell adhesion molecule (L1CAM) were axonogenesis, positive regulation of cell growth, and regulation of cell size. Major biological processes of oncostatin M (OSM) were acute inflammatory response, positive regulation of defense response, and positive regulation of response to external stimulus.

The prognosis of BCL2L12-negative colon cancer patients was significantly poorer than that of BCL2L12-positive colon cancer patients [55]. High CCR7-positive cell density was significantly related to prognosis in colorectal cancer [56]. Colorectal cancer patients with high expression of L1CAM have a higher risk of early metastasis [57]. FKBP10 might play an important role in the development of gastric cancer through cell adhesion molecules and extracellular matrix receptors [58]. High expression of HAPLN1 could increase the tumorigenicity of mesothelioma [59]. OSM was negatively correlated with poor survival in breast cancer patients [60]. LY6D immunoreactivity was related to the invasiveness of ER-positive breast cancer [61]. MYEOV stimulated the migration of colorectal cancer cells and promoted the proliferation and invasion of colorectal cancer [62]. FOXD1 promoted the progression of colorectal cancer through the ERK 1/2 pathway [63].

Previous studies suggested that the immune microenvironment is closely related to tumorigenesis [14, 64]. Fusobacterium nucleatum might inhibit the anti-tumor immune response by reducing the density of CD4+ T cells in colorectal cancer [65]. PD-L1 promoted the development of colon cancer by reducing the antitumor immunity of CD8+ T cells [66]. FOXM1 inhibited the maturation of dendritic cells in colorectal cancer [67]. There was a correlation between the activity of natural killer cells and tumor development [68]. There was a negative correlation between eosinophil counts and the risk of colorectal cancer [69]. Macrophage migration inhibitory factor could regulate the development of colorectal cancer [70]. High mast cell density indicated a good prognosis in colon cancer [71]. A high monocyte level was related to poor prognosis in CRC patients [72]. The neutrophil to lymphocyte ratio was related to the prognosis of colorectal cancer patients [73].

The current research established an individual mortality risk predictive system for CRC patients with the following advantages: First, based on three machine learning algorithms, the current research provided three individual mortality risk predictive curves, which was valuable for individualized treatment decision before surgery. These three prognostic models provided strong support for each other's reliability. Second, the current Machine learning survival predictive system provided median survival time, predicted mortality percentage, and 95% confidence interval, which were important for improving individualized treatment decision.

Shortcomings: First, the mortality rates in the model group and validation group were 33.6% and 22.9%, respectively (187 of 556 and 98 of 428 patients). The high censoring rates of the study datasets might weaken the accuracy evaluation of the prognostic models to a certain extent. Second, for a prognostic model, the sample size of the current research was relatively small, which is not enough to support a convincing conclusion for clinical application. Third, a large sample size and high-quality follow-up management are very important for long-term tumor prognostic studies. However, independent external validation cohorts often require a large sample size, long-term follow-up management, and a large amount of research funding, so it is very difficult for small research teams to set up a private independent external validation cohort. Therefore, we selected an external verification cohort (from GEO database) as the external validation cohort. Fourth, several important variables, including information on radiotherapy, chemotherapy, and biotherapy, were not included in the current analysis. Fifth, the GSE39582 dataset lacks some important basic information such as lymphovascular invasion, vascular invasion, residual tumor, and perineural invasion, which affects the general judgment of the model to a certain extent. Prospective, multicenter, large-sample clinical studies would help to verify the clinical application value of the current prognostic model. Sixth, the tumor samples (n = 480) and normal samples (n = 41) are highly imbalanced in the TCGA cohort for differential expression analyses. This imbalance may affect the results of the differential expression analysis, and thus the differentially expressed genes identified. The differentially expressed genes in the current study therefore need to be confirmed in larger and more balanced datasets.

Conclusion

In conclusion, the current study identified twenty prognostic immune genes for CRC patients and constructed an immune-related regulatory network. Based on three machine learning algorithms, the current research provided three individual mortality predictive curves. The machine learning survival predictive system is available at: https://zhangzhiqiao8.shinyapps.io/Artificial_Intelligence_Survival_Prediction_for_CRC_B1005_1/, and is valuable for individualized treatment decisions before surgery.

Availability of data and materials

The study data is available at: https://zhangzhiqiao8.shinyapps.io/Gene_Survival_Subgroup_Analysis_18_CRC_B1005/.

Abbreviations

CRC: Colorectal cancer
TCGA: The Cancer Genome Atlas
GEO: The Gene Expression Omnibus
ROC: Receiver operating characteristic
DFS: Disease free survival
HR: Hazard ratio
CI: Confidence interval
AJCC: The American Joint Committee on Cancer
SD: Standard deviation
MTLR: Multi-task logistic regression
RFS: Random survival forest

References

Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.
Arnold M, Sierra MS, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global patterns and trends in colorectal cancer incidence and mortality. Gut. 2017;66(4):683–91.
Li K, Zeng L, Wei H, Hu J, Jiao L, Zhang J, Xiong Y. Identification of gene-specific DNA methylation signature for colorectal cancer. Cancer Genet. 2018;228–229:5–11.
Berg KCG, Sveen A, Holand M, Alagaratnam S, Berg M, Danielsen SA, Nesbakken A, Soreide K, Lothe RA. Gene expression profiles of CMS2-epithelial/canonical colorectal cancers are largely driven by DNA copy number gains. Oncogene. 2019;38(33):6109–22.
Miao Y, Zhang H, Su B, Wang J, Quan W, Li Q, Mi D. Construction and validation of an RNA-binding protein-associated prognostic model for colorectal cancer. PeerJ. 2021;9:e11219.
Qian Y, Wei J, Lu W, Sun F, Hwang M, Jiang K, Fu D, Zhou X, Kong X, Zhu Y, et al. Prognostic risk model of immune-related genes in colorectal cancer. Front Genet. 2021;12:619611.
Björkman K, Jalkanen S, Salmi M, Mustonen H, Kaprio T, Kekki H, Pettersson K, Böckelman C, Haglund C. A prognostic model for colorectal cancer based on CEA and a 48-multiplex serum biomarker panel. Sci Rep. 2021;11(1):4287.
Zuo S, Dai G, Ren X. Identification of a 6-gene signature predicting prognosis for colorectal cancer. Cancer Cell Int. 2019;19:6.
Zhang L, Chen S, Wang B, Su Y, Li S, Liu G, Zhang X. An eight-long noncoding RNA expression signature for colorectal cancer patients’ prognosis. J Cell Biochem. 2019;120(4):5636–43.
Zeng J, Cai X, Hao X, Huang F, He Z, Sun H, Lu Y, Lei J, Zeng W, Liu Y, et al. LncRNA FUNDC2P4 down-regulation promotes epithelial-mesenchymal transition by reducing E-cadherin expression in residual hepatocellular carcinoma after insufficient radiofrequency ablation. Int J Hyperthermia. 2018;34(6):802–11.
Zhong X, Long Z, Wu S, Xiao M, Hu W. LncRNA-SNHG7 regulates proliferation, apoptosis and invasion of bladder cancer cells assurance guidel ines. J Buon. 2018;23(3):776–81.
PubMed Google Scholar
Shi X, Zhao Y, He R, Zhou M, Pan S, Yu S, Xie Y, Li X, Wang M, Guo X, et al. Three-lncRNA signature is a potential prognostic biomarker for pancreatic adenocarcinoma. Oncotarget. 2018;9(36):24248–59.
Article PubMed PubMed Central Google Scholar
Huang Y, Xiang B, Liu Y, Wang Y, Kan H. LncRNA CDKN2B-AS1 promotes tumor growth and metastasis of human hepatocellular carcinoma by targeting let-7c-5p/NAP1L1 axis. Cancer Lett. 2018;437:56–66.
Article CAS PubMed Google Scholar
Pags F, Galon J, Dieu-Nosjean MC, Tartour E. Immune infiltration in human tumors: a prognostic factor that should not be ignored. Oncogene. 2010;29(8):1093–102.
Article CAS Google Scholar
Domingues P, Gonzlez-Tablas M, Otero PD, Miranda D, Ruiz L, Sousa P, Ciudad J, Gonalves JM, Lopes MC, et al. Tumor infiltrating immune cells in gliomas and meningiomas. Brain Behav Immun. 2016;53:1–15.
Article CAS PubMed Google Scholar
Narayanan S, Kawaguchi T, Peng X, Qi Q, Liu S, Yan L, Takabe K. Tumor infiltrating lymphocytes and macrophages improve survival in microsatellite unstable colorectal cancer. Sci Rep. 2019;9(1):13455.
Article PubMed PubMed Central CAS Google Scholar
Zhang L, Zhao Y, Dai Y, Cheng JN, Gong Z, Feng Y, Sun C, Jia Q, Zhu B. Immune landscape of colorectal cancer tumor microenvironment from different primary tumor location. Front Immunol. 2018;9:1578.
Article PubMed PubMed Central CAS Google Scholar
Mao Y, Feng Q, Zheng P, Yang L, Zhu D, Chang W, Ji M, He G, Xu J. Low tumor infiltrating mast cell density confers prognostic benefit and reflects immunoactivation in colorectal cancer. Int J Cancer. 2018;143(9):2271–80.
Article CAS PubMed Google Scholar
Hu X, Li YQ, Ma XJ, Zhang L, Cai SJ, Peng JJ. A risk signature with inflammatory and t immune cells infiltration in colorectal cancer predicting distant metastases and efficiency of chemotherapy. Front Oncol. 2019;9:704.
Article PubMed PubMed Central Google Scholar
Zhou R, Zhang J, Zeng D, Sun H, Rong X, Shi M, Bin J, Liao Y, Liao W. Immune cell infiltration as a biomarker for the diagnosis and prognosis of stage I-III colon cancer. Cancer Immunol Immunother CII. 2019;68(3):433–42.
Article CAS PubMed Google Scholar
Tran WT, Jerzak K, Lu FI, Klein J, Tabbarah S, Lagree A, Wu T, Rosado-Mendez I, Law E, Saednia K, et al. Personalized breast cancer treatments using artificial intelligence in radiomics and pathomics. J Med Imaging Radiat Sci. 2019;50:S32.
Article PubMed Google Scholar
Nir G, Karimi D, Goldenberg SL, Fazli L, Skinnider BF, Tavassoli P, Turbin D, Villamil CF, Wang G, Thompson DJS, et al. Comparison of artificial intelligence techniques to evaluate performance of a classifier for automatic grading of prostate cancer from digitized histopathologic images. JAMA Netw Open. 2019;2(3):e190442.
Article PubMed PubMed Central Google Scholar
Kawakami E, Tabata J, Yanaihara N, Ishikawa T, Koseki K, Iida Y, Saito M, Komazaki H, Shapiro JS, Goto C, et al. Application of artificial intelligence for preoperative diagnostic and prognostic prediction in epithelial ovarian cancer based on blood biomarkers. Clin Cancer Res Off J Am Assoc Cancer Res. 2019;25(10):3006–15.
Article CAS Google Scholar
Enshaei A, Robson CN, Edmondson RJ. Artificial intelligence systems as prognostic and predictive tools in ovarian cancer. Ann Surg Oncol. 2015;22(12):3970–5.
Article CAS PubMed Google Scholar
Zhang Z, Li J, He T, Ouyang Y, Huang Y, Liu Q, Wang P, Ding J. The competitive endogenous RNA regulatory network reveals potential prognostic biomarkers for overall survival in hepatocellular carcinoma. Cancer Sci. 2019;110(9):2905–23.
Article CAS PubMed PubMed Central Google Scholar
Zhang Z, Ouyang Y, Huang Y, Wang P, Li J, He T, Liu Q. Comprehensive bioinformatics analysis reveals potential lncRNA biomarkers for overall survival in pat ients with hepatocellular carcinoma: an on-line individual risk calculator based on TCGA cohort. Cancer Cell Int. 2019;19:174.
Article PubMed PubMed Central CAS Google Scholar
Cheng C, Wang Q, Zhu M, Liu K, Zhang Z. Integrated analysis reveals potential long non-coding RNA biomarkers and their potential biological functions for disease free survival in gastric cancer patients. Cancer Cell Int. 2019;19:123.
Article PubMed PubMed Central Google Scholar
Zhang Z, He T, Huang L, Ouyang Y, Li J, Huang Y, Wang P, Ding J. Two precision medicine predictive tools for six malignant solid tumors: from gene-based research to clinical application. J Transl Med. 2019;17(1):405.
Article CAS PubMed PubMed Central Google Scholar
Zhang Z, Li J, He T, Ding J. Bioinformatics identified 17 immune genes as prognostic biomarkers for breast cancer: application study based on artificial intelligence algorithms. Front Oncol. 2020;10:330.
Article PubMed PubMed Central Google Scholar
Zhang Z, Li J, He T, Ouyang Y, Huang Y, Liu Q, Wang P, Ding J. Two predictive precision medicine tools for hepatocellular carcinoma. Cancer Cell Int. 2019;19:290.
Article PubMed PubMed Central CAS Google Scholar
Zhang Z, Liu Q, Wang P, Li J, He T, Ouyang Y, Huang Y, Wang W. Development and internal validation of a nine-lncRNA prognostic signature for prediction of overall survival in colorectal cancer patients. PeerJ. 2018;6:e6061.
Article PubMed PubMed Central CAS Google Scholar
Zhu M, Wang Q, Luo Z, Liu K, Zhang Z. Development and validation of a prognostic signature for preoperative prediction of overall survival in gastric cancer patients. Onco Targets Ther. 2018;11:8711–22.
Article CAS PubMed PubMed Central Google Scholar
Marisa L, de Reyniès A, Duval A, Selves J, Gaub MP, Vescovo L, Etienne-Grimaldi MC, Schiappa R, Guenot D, Ayadi M, et al. Gene expression classification of colon cancer into molecular subtypes: characterization, validation, and prognostic value. PLoS Med. 2013;10(5):e1001453.
Article CAS PubMed PubMed Central Google Scholar
Robinson MD, McCarthy DJ, Smyth GK. edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40.
Article CAS PubMed Google Scholar
Bhattacharya S, Andorf S, Gomes L, Dunn P, Schaefer H, Pontius J, Berger P, Desborough V, Smith T, Campbell J, et al. ImmPort: disseminating data to the public for the future of immunology. Immunol Res. 2014;58(2–3):234–9.
Article CAS PubMed Google Scholar
Mei S, Meyer CA, Zheng R, Qin Q, Wu Q, Jiang P, Li B, Shi X, Wang B, Fan J, et al. Cistrome cancer: a web resource for integrative gene regulation modeling in cancer. Cancer Res. 2017;77(21):e19–22.
Article CAS PubMed PubMed Central Google Scholar
Jia Q, Wu W, Wang Y, Alexander PB, Sun C, Gong Z, Cheng JN, Sun H, Guan Y, Xia X, et al. Local mutational diversity drives intratumoral immune heterogeneity in non-small cell lung cancer. Nat Commun. 2018;9(1):5361.
Article CAS PubMed PubMed Central Google Scholar
Charoentong P, Finotello F, Angelova M, Mayer C, Efremova M, Rieder D, Hackl H, Trajanoski Z. Pan-cancer immunogenomic analyses reveal genotype-immunophenotype relationships and predictors of res ponse to checkpoint blockade. Cell Rep. 2017;18(1):248–62.
Article CAS PubMed Google Scholar
Haider H, Hoehn B, Davis S, Greiner R. Effective ways to build and evaluate individual survival distributions. J Mach Learn Res. 2020;21:1–63.
Google Scholar
Ld F, Dy L. Time-dependent covariates in the Cox proportional-hazards regression model. Annu Rev Public Health. 1999;20:145–57.
Article Google Scholar
Jl K, et al. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol. 2018;18(1):24.
Article Google Scholar
Xu H, Gu X, Tadesse MG, Balasubramanian R. A modified random survival forests algorithm for high dimensional predictors and self-reported outcomes. J Comput Gr Stat Joint Publ Am Stat Assoc Inst Math Stat Interface Found N Am. 2018;27(4):763–72.
Google Scholar
Nasejje JB, Mwambi H. Application of random survival forests in understanding the determinants of under-five child mortality in Uganda in the presence of covariates that satisfy the proportional and non-proportional hazards assumption. BMC Res Notes. 2017;10(1):459.
Article PubMed PubMed Central Google Scholar
Hsich E, Gorodeski EZ, Blackstone EH, Ishwaran H, Lauer MS. Identifying important risk factors for survival in patient with systolic heart failure using random survival forests. Circ Cardiovasc Qual Outcomes. 2011;4(1):39–45.
Article PubMed Google Scholar
Ruyssinck J, van der Herten J, Houthooft R, Ongenae F, Couckuyt I, Gadeyne B, Colpaert K, Decruyenaere J, De Turck F, Dhaene T. Random survival forests for predicting the bed occupancy in the intensive care unit. Comput Math Methods Med. 2016;2016:7087053.
Article PubMed PubMed Central Google Scholar
Hamidi O, Poorolajal J, Farhadian M, Tapak L. Identifying important risk factors for survival in kidney graft failure patients using random survival forests. Iran J Public Health. 2016;45(1):27–33.
PubMed PubMed Central Google Scholar
Alaeddini A, Hong SH. A multi-way multi-task learning approach for multinomial logistic regression: an application in joint prediction of appointment miss-opportunities across multiple clinics. Methods Inf Med. 2017;56(4):294–307.
Article PubMed PubMed Central Google Scholar
Bisaso KR, Karungi SA, Kiragga A, Mukonzo JK, Castelnuovo B. A comparative study of logistic regression based machine learning techniques for prediction of early virological suppression in antiretroviral initiating HIV patients. BMC Med Inform Decis Mak. 2018;18(1):77.
Article PubMed PubMed Central Google Scholar
Zhang Z, Ouyang Y, Huang Y, Wang P, Li J, He T, Liu Q. Comprehensive bioinformatics analysis reveals potential lncRNA biomarkers for overall survival in patients with hepatocellular carcinoma: an on-line individual risk calculator based on TCGA cohort. Cancer Cell Int. 2019;19:174.
Article PubMed PubMed Central CAS Google Scholar
Shi M, Xu G. Development and validation of GMI signature based random survival forest prognosis model to predict clinical outcome in acute myeloid leukemia. BMC Med Genomics. 2019;12(1):90.
Article PubMed PubMed Central CAS Google Scholar
Wang H, Liu D, Yang J. Prognostic risk model construction and molecular marker identification in glioblastoma multiforme based on mRNA/microRNA/long non-coding RNA analysis using random survival forest method. Neoplasma. 2019;66(3):459–69.
Article CAS PubMed Google Scholar
Adham D, Abbasgholizadeh N, Abazari M. Prognostic factors for survival in patients with gastric cancer using a random survival forest. Asian Pac J Cancer Prev APJCP. 2017;18(1):129–34.
PubMed Google Scholar
Wang H, Li G. A selective review on random survival forests for high dimensional data. Quant Biosci. 2017;36(2):85–96.
CAS PubMed PubMed Central Google Scholar
Wang H, Shen L, Geng J, Wu Y, Xiao H, Zhang F, Si H. Prognostic value of cancer antigen -125 for lung adenocarcinoma patients with brain metastasis: a random survival forest prognostic model. Sci Rep. 2018;8(1):5670.
Article PubMed PubMed Central CAS Google Scholar
Kontos CK, Papadopoulos IN, Scorilas A. Quantitative expression analysis and prognostic significance of the novel apoptosis-related gene BCL2L12 in colon cancer. Biol Chem. 2008;389(12):1467–75.
Article CAS PubMed Google Scholar
Malietzis G, Lee GH, Bernardo D, Blakemore AI, Knight SC, Moorghen M, Al-Hassi HO, Jenkins JT. The prognostic significance and relationship with body composition of CCR7-positive cells in colorectal cancer. J Surg Oncol. 2015;112(1):86–92.
Article CAS PubMed Google Scholar
Tampakis A, Tampaki EC, Nonni A, Tsourouflis G, Posabella A, Patsouris E, Kontzoglou K, von Flue M, Nikiteas N, Kouraklis G. L1CAM expression in colorectal cancer identifies a high-risk group of patients with dismal prognosis already in early-stage disease. Acta Oncol (Stockholm, Sweden). 2019;59:1–5.
Google Scholar
Liang L, Zhao K, Zhu JH, Chen G, Qin XG, Chen JQ. Comprehensive evaluation of FKBP10 expression and its prognostic potential in gastric cancer. Oncol Rep. 2019;42(2):615–28.
CAS PubMed PubMed Central Google Scholar
Ivanova AV, Goparaju CM, Ivanov SV, Nonaka D, Cruz C, Beck A, Lonardo F, Wali A, Pass HI. Protumorigenic role of HAPLN1 and its IgV domain in malignant pleural mesothelioma. Clin Cancer Res Off J Am Assoc Cancer Res. 2009;15(8):2602–11.
Article CAS Google Scholar
Tawara K, Scott H, Emathinger J, Wolf C, LaJoie D, Hedeen D, Bond L, Montgomery P, Jorcyk C. HIGH expression of OSM and IL-6 are associated with decreased breast cancer survival: synergistic induction of IL-6 secretion by OSM and IL-1beta. Oncotarget. 2019;10(21):2068–85.
Article PubMed PubMed Central Google Scholar
Mayama A, Takagi K, Suzuki H, Sato A, Onodera Y, Miki Y, Sakurai M, Watanabe T, Sakamoto K, Yoshida R, et al. OLFM4, LY6D and S100A7 as potent markers for distant metastasis in estrogen receptor-positive breast carcinoma. Cancer Sci. 2018;109(10):3350–9.
Article CAS PubMed PubMed Central Google Scholar
Lawlor G, Doran PP, MacMathuna P, Murray DW. MYEOV (myeloma overexpressed gene) drives colon cancer cell migration and is regulated by PGE2. J Exp Clin Cancer Res CR. 2010;29:81.
Article PubMed CAS Google Scholar
Pan F, Li M, Chen W. FOXD1 predicts prognosis of colorectal cancer patients and promotes colorectal cancer progression via the ERK 1/2 pathway. Am J Transl Res. 2018;10(5):1522–30.
CAS PubMed PubMed Central Google Scholar
Gough MJ, Crittenden MR. Immune system plays an important role in the success and failure of conventional cancer therapy. Immunotherapy. 2012;4(2):125–8.
Article PubMed Google Scholar
Chen T, Li Q, Zhang X, Long R, Wu Y, Wu J, Fu X. TOX expression decreases with progression of colorectal cancers and is associated with CD4 T-cell density and Fusobacterium nucleatum infection. Hum Pathol. 2018;79:93–101.
Article CAS PubMed Google Scholar
O’Malley G, Treacy O, Lynch K, Naicker SD, Leonard NA, Lohan P, Dunne PD, Ritter T, Egan LJ, Ryan AE. Stromal cell PD-L1 inhibits CD8(+) T-cell antitumor immune responses and promotes colon cancer. Cancer Immunol Res. 2018;6(11):1426–41.
Article CAS PubMed Google Scholar
Zhou Z, Chen H, Xie R, Wang H, Li S, Xu Q, Xu N, Cheng Q, Qian Y, Huang R, et al. Epigenetically modulated FOXM1 suppresses dendritic cell maturation in pancreatic cancer and colon cancer. Mol Oncol. 2019;13(4):873–93.
Article CAS PubMed PubMed Central Google Scholar
Jung YS, Kwon MJ, Park DI, Sohn CI, Park JH. Association between natural killer cell activity and the risk of colorectal neoplasia. J Gastroenterol Hepatol. 2018;33(4):831–6.
Article CAS PubMed Google Scholar
Prizment AE, Vierkant RA, Smyrk TC, Tillmans LS, Lee JJ, Sriramarao P, Nelson HH, Lynch CF, Thibodeau SN, Church TR, et al. Tumor eosinophil infiltration and improved survival of colorectal cancer patients: Iowa Women’s Health Study. Mod Pathol Off J US Can Acad Pathol. 2016;29(5):516–27.
CAS Google Scholar
Pacheco-Fernandez T, Juarez-Avelar I, Illescas O, Terrazas LI, Hernandez-Pando R, Perez-Plasencia C, Gutierrez-Cirlos EB, Avila-Moreno F, Chirino YI, Reyes JL, et al. Macrophage migration inhibitory factor promotes the interaction between the tumor, macrophages, and T cells to regulate the progression of chemically induced colitis-associated colorectal cancer. Mediators Inflamm. 2019;2019:2056085.
Article PubMed PubMed Central CAS Google Scholar
Mehdawi L, Osman J, Topi G, Sjolander A. High tumor mast cell density is associated with longer survival of colon cancer patients. Acta Oncol (Stockholm, Sweden). 2016;55(12):1434–42.
Article CAS Google Scholar
Wen S, Chen N, Peng J, Ling W, Fang Q, Yin SF, He X, Qiu M, Hu Y. Peripheral monocyte counts predict the clinical outcome for patients with colorectal cancer: a systematic review and meta-analysis. Eur J Gastroenterol Hepatol. 2019;31(11):1313–21.
Article CAS PubMed Google Scholar
Li H, Zhao Y, Zheng F. Prognostic significance of elevated preoperative neutrophil-to-lymphocyte ratio for patients with colorectal cancer undergoing curative surgery: a meta-analysis. Medicine. 2019;98(3):e14126.
Article PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We thank Dr. Gary S. Collins (University of Oxford), Dr. Manali Rupji (Emory University), and Mrs. Qingmei Liu for their help and support in developing the machine learning survival predictive system.

Funding

This study was funded by the Foshan Science and Technology Bureau (2020001004584).

Author information

Zhiqiao Zhang, Liwen Huang and Jing Li are co-first authors.

Authors and Affiliations

Department of Infectious Diseases, Shunde Hospital, Southern Medical University, Shunde, Guangdong, China
Zhiqiao Zhang, Liwen Huang, Jing Li & Peng Wang

Authors

Zhiqiao Zhang
Liwen Huang
Jing Li
Peng Wang

Contributions

Conceptualization, methodology and resources: ZZ, PW, LJ, and LH; Investigation, data curation, formal analysis, validation, software, project administration, and supervision: ZZ, PW, LJ, and LH; Writing and visualization: ZZ and PW; Funding acquisition: ZZ. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Peng Wang.

Ethics declarations

Ethics approval and consent to participate

All studies in the TCGA and GEO databases received ethical approval from the ethics committees of their respective research institutes, and informed consent was obtained from patients before admission. All patient details in these public datasets have been anonymized, so the current research does not involve private patient information. The current study was a secondary analysis of public datasets from the TCGA and GEO databases, and it was performed in accordance with the policies of these public databases and the Declaration of Helsinki. For these reasons, ethical approval and informed consent were not applicable.

Consent for publication

All authors reviewed the manuscript and consented to its publication. The manuscript does not contain information or images that could lead to the identification of a study participant; therefore, specific consent to publish such information or images in an online open-access publication is not applicable.

Competing interests

The authors declare no potential conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Program application manual.

Additional file 2.

Gene enrichment analysis dataset.

Additional file 3.

SHAP application example in Python.

Additional file 4.

Statistical analysis example in R.

Additional file 5.

Supplementary Figures 1–15 (fifteen figures in total).

Additional file 6.

Original dataset for analysis.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Zhang, Z., Huang, L., Li, J. et al. Bioinformatics analysis reveals immune prognostic markers for overall survival of colorectal cancer patients: a novel machine learning survival predictive system. BMC Bioinformatics 23, 124 (2022). https://doi.org/10.1186/s12859-022-04657-3

Received: 23 February 2021
Accepted: 11 March 2022
Published: 08 April 2022
DOI: https://doi.org/10.1186/s12859-022-04657-3
