1 Introduction

At present, colorectal cancer (CRC) is the third most prevalent malignancy and a significant cause of cancer-related mortality in the United States, with an estimated five-year survival rate of 65% [1, 2]. Its initial presentation is often insidious, and although a range of screening techniques is available, including endoscopy, stool examination, blood tests, and imaging, CRC is often characterized by rapid progression [3, 4]. As a result, a substantial proportion of patients are diagnosed at advanced stages, when the optimal window for therapeutic intervention has already passed. Furthermore, the post-treatment prognosis of CRC patients is marred by the persistent challenges of tumor metastasis and recurrence, which contribute to suboptimal long-term survival rates [5].

Currently, clinical decision-making for CRC patients predominantly relies on a spectrum of clinical and pathological tumor characteristics, including but not limited to gender, age, TNM (Tumor Node Metastasis) staging, and tumor marker profiles. However, given the pronounced clinical heterogeneity inherent to CRC, these conventional prognosticators fall short in accurately forecasting individual patient outcomes [6]. Concurrently, the advent of gene expression-based predictive models has contributed to significant progress in the assessment of CRC prognosis, exemplified by models that account for fibroblast activity [7], immune-related risk models [8], and pyroptosis-related models [9].

Tumor microenvironment (TME) oxidative stress mainly originates from tumor epithelial cells and is mainly caused by the accumulation of reactive oxygen species (ROS) in tumor cells [10,11,12]. A low ROS concentration can promote tumor glucose metabolism and sustain the high energy demands of tumors, whereas a high ROS concentration can induce apoptosis of tumor cells and inhibit the occurrence and development of tumors [13]. Furthermore, evidence suggests that oxidative stress plays a role in triggering metabolic and genomic alterations that could accelerate tumor metabolism by modulating the activity of key metabolic enzymes and signaling pathways, thus promoting tumor development [14]. When oxidative stress occurs, free radicals cause oxidative damage to many cellular biomolecules, such as deoxyribonucleic acid, lipids, and proteins, which can lead to dysfunction of cells, tissues, and organs [15]. The response of the HIF and NRF2 signaling pathways to hypoxia and oxidative stress has become an important target for cancer treatment [16]. Targeting specific genes or pathways that promote oxidative stress-induced cellular senescence during cancer progression can inhibit that progression [17]. Thus, oxidative stress plays an important role in tumors; however, the role of oxidative stress-related genes in the prognosis of colon cancer has not been studied.

In this study, we explored the expression of oxidative stress-related genes (OSRGs) in colon cancer (CC) patients using data from a public database, The Cancer Genome Atlas (TCGA). We then established a model of OSRGs closely related to CC prognosis, which served as an independent prognostic indicator of CC. Functional enrichment analysis was subsequently performed to reveal the pathways associated with cancer development.

2 Materials and methods

2.1 Human oxidative stress-related genes (OSRGs) dataset

The GeneCards database (https://www.genecards.org) was searched for genes involved in oxidative stress with relevance scores greater than 7. A total of 1094 genes were identified, as shown in the supplement.

2.2 Samples and data collected for this study

Clinical data and ribonucleic acid sequencing (RNA-seq) expression data of CC patients were downloaded from the UCSC XENA website (https://xenabrowser.net/datapages). Patients with incomplete clinical data or a follow-up period of less than 30 days were excluded, and the clinical and RNA-seq expression data of the remaining patients were retained for analysis. Because this was a retrospective analysis based on a public database, ethical approval was not required.

2.3 Differential expression analysis

Differentially expressed genes (DEGs) were identified by supplying the count data for the OSRGs to the R package “DESeq2” [18]. Specifically, DEGs were defined as those with an absolute log2 (fold-change) value greater than 2 and a significant p value (p < 0.05).
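The thresholding step can be illustrated with a minimal Python sketch. It is not the DESeq2 pipeline used in this study: it assumes that a table of log2 fold changes and p values has already been produced, and the `DEResult` record, the `classify_deg` helper, and the gene names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DEResult:
    """One row of a differential-expression table (hypothetical layout)."""
    gene: str
    log2_fold_change: float
    p_value: float


def classify_deg(results, lfc_cutoff=2.0, p_cutoff=0.05):
    """Return (upregulated, downregulated) gene names.

    A gene is kept only if |log2 fold change| > lfc_cutoff and p < p_cutoff.
    """
    up, down = [], []
    for r in results:
        if r.p_value >= p_cutoff or abs(r.log2_fold_change) <= lfc_cutoff:
            continue  # fails at least one of the two criteria
        if r.log2_fold_change > 0:
            up.append(r.gene)
        else:
            down.append(r.gene)
    return up, down


# Illustrative values only; they are not taken from the study.
example = [
    DEResult("GENE_A", 3.1, 0.001),   # passes both criteria, upregulated
    DEResult("GENE_B", -2.4, 0.010),  # passes both criteria, downregulated
    DEResult("GENE_C", 2.5, 0.200),   # fails the p-value criterion
    DEResult("GENE_D", 1.2, 0.001),   # fails the fold-change criterion
]
up_genes, down_genes = classify_deg(example)
print("up:", up_genes, "down:", down_genes)
```

The sign of the log2 fold change then separates upregulated from downregulated genes, which is how the two counts in the Results would be obtained under these assumptions.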

2.4 Univariate COX regression analysis

The “survival” and “survminer” R packages were employed to identify OSRGs with prognostic significance. From the initial GeneCards list of 1094 genes, a refined subset of 115 prognostic OSRGs was identified through univariate Cox regression analysis of their gene expression profiles.

2.5 Construction of OSRGs-related prognostic model

Seventeen prognostic genes were identified using Lasso COX regression analysis [19] based on the 115 prognostic OSRGs. A risk score was then constructed for each patient by weighting the expression value of each prognostic gene by its regression coefficient estimated in the Lasso COX regression analysis (Table 1). The risk score was calculated as follows: risk score = \(\sum_{i=1}^{n} \mathrm{expr}_{i}\,\beta_{i}\), where n is the number of prognostic genes, \(\mathrm{expr}_{i}\) represents the expression value of prognostic gene i, and \(\beta_{i}\) represents its Lasso coefficient.

Table 1 The full names and coefficients of the 17 prognostic oxidative stress-related genes

The formula was used to score the risk of each patient in the TCGA cohort, and the patients were divided into a low-risk group and a high-risk group with the median risk score as the cut-off point. The Kaplan–Meier method was used to evaluate the difference in survival between the two groups; a between-groups comparison was performed using the log-rank statistical method. The predictive efficacy of the risk score in forecasting patient prognosis was further scrutinized through a Cox regression analysis, providing a robust statistical framework to evaluate the prognostic value of the model.
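As an illustration of the grouping and survival estimation described above, the following Python sketch splits patients at the median risk score and computes a Kaplan–Meier curve. It is a simplified stand-in for the R “survival” workflow, not the code used in this study. The function names are hypothetical, the data are invented, and assigning scores equal to the median to the low-risk group is an assumption, since the handling of ties is not specified here.

```python
from statistics import median


def split_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability).

    events[i] is 1 if patient i died at times[i] and 0 if censored.
    One point is returned per distinct time with at least one death.
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Invented follow-up times (months) and event indicators.
groups = split_by_median([0.2, 1.5, 0.9, 2.1])
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
print(groups)
print(curve)
```

Censored patients leave the risk set without lowering the survival estimate, which is why the curve only steps down at observed deaths.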

2.6 Functional enrichment analysis and genomic interaction analysis

To unravel the biological underpinnings of the prognostic genes, a comprehensive suite of functional enrichment analyses was performed. This included Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Set Enrichment Analysis (GSEA), aimed at identifying pivotal biological pathways and processes.

2.7 Immune infiltration analysis

Using the CIBERSORT deconvolution algorithm, the cellular composition was estimated from the gene expression profiles, thereby quantifying the relative abundance of 22 distinct immune cell types within each sample [20, 21].

2.8 Statistical analysis

The Cox proportional hazards model was used for univariate and multivariate survival analyses. All statistical analyses were performed in R. All statistical tests were two-sided, and p < 0.05 was considered statistically significant.

3 Results

3.1 Identification of differentially expressed OSRGs

RNA-seq expression data from 453 tumor samples and 41 normal samples in the TCGA-COAD dataset were used to identify DEGs. The expression values of 807 OSRGs were extracted. Compared with normal samples, 55 downregulated genes and 32 upregulated genes were identified based on the following criteria: p < 0.05 and |log2 (fold change)| > 2. Among these, the top five genes that were highly expressed in tumor tissue compared to normal tissue were “GDF15”, “MMP3”, “TG”, “TIMP1” and “MMP1”, while the top five genes that were underexpressed in tumor samples compared to normal samples were “SGK1”, “PTGS1”, “NR3C2”, “ACADS” and “MYLK”. Scatter plots were constructed to show the expression levels of these differentially expressed OSRGs in tumor and normal tissues, as shown in Fig. 1a. A volcano map of the differentially expressed OSRGs is shown in Fig. 1b.

Fig. 1

Differentially expressed oxidative stress-related genes between tumor and normal tissues in The Cancer Genome Atlas (TCGA) cohort. a Scatter plots. b Volcano map

3.2 Construction of a model to predict the prognosis of colon cancer patients in the TCGA cohort using the 17 prognostic OSRGs

A total of 17 OSRGs were significantly correlated with prognosis after Lasso COX regression analysis (Fig. 2a, b). The full names of these 17 genes and their coefficients in the formula are shown in Table 1.

Fig. 2

Oxidative stress-related genes model in The Cancer Genome Atlas (TCGA) cohort. a, b Lasso COX analysis of prognostic genes. c The number of patients in the different risk groups. d Survival status of patients in the different groups. e Overall survival (OS) was shorter in the high-risk group compared to the low-risk group. f ROC analysis of OS for the model and various clinicopathologic factors

The coefficients of the 17 risk genes were obtained. The prognostic model based on these 17 genes was established using the following formula: Risk Score = (0.209 × expression level of RYR2) + (-0.229 × expression level of GSR) + (0.049 × expression level of GSTM1) + (0.096 × expression level of HSPA1A) + (0.683 × expression level of ACADL) + (0.080 × expression level of STK25) + (0.423 × expression level of ALOX12) + (0.048 × expression level of MAPK12) + (-0.006 × expression level of SERPINA1) + (0.098 × expression level of CYP19A1) + (0.098 × expression level of NOL3) + (0.033 × expression level of NGF) + (-0.006 × expression level of GDF15) + (0.007 × expression level of NTRK2) + (0.108 × expression level of DAPK1) + (0.032 × expression level of UCN) + (0.024 × expression level of HBA2).
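For illustration, the formula can be written as a short Python function. The coefficients are those listed in the formula above; the function name, the dictionary-based input, and the example expression values are hypothetical, and the scale of the expression values (for example, normalized or log-transformed counts) is an assumption that is not specified here.

```python
# Lasso Cox coefficients of the 17 genes, as listed in the risk score formula.
COEFFICIENTS = {
    "RYR2": 0.209, "GSR": -0.229, "GSTM1": 0.049, "HSPA1A": 0.096,
    "ACADL": 0.683, "STK25": 0.080, "ALOX12": 0.423, "MAPK12": 0.048,
    "SERPINA1": -0.006, "CYP19A1": 0.098, "NOL3": 0.098, "NGF": 0.033,
    "GDF15": -0.006, "NTRK2": 0.007, "DAPK1": 0.108, "UCN": 0.032,
    "HBA2": 0.024,
}


def risk_score(expression):
    """Weighted sum of expression values over the 17 model genes.

    `expression` maps gene symbol -> expression value for one patient.
    Genes outside the model are ignored; a missing model gene is an error.
    """
    missing = sorted(COEFFICIENTS.keys() - expression.keys())
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(beta * expression[gene] for gene, beta in COEFFICIENTS.items())


# Hypothetical patient: every model gene has an expression value of 1.0,
# so the score reduces to the sum of the 17 coefficients.
patient = {gene: 1.0 for gene in COEFFICIENTS}
print(round(risk_score(patient), 3))
```

Genes with positive coefficients raise the score as their expression increases, while GSR, SERPINA1, and GDF15 lower it.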

Next, the risk score of each CC patient in the TCGA cohort was calculated and the patients were divided into a high-risk group and a low-risk group using the median risk value as the cut-off point (Fig. 2c, d). Overall survival (OS) was found to be shorter in the high-risk group compared to the low-risk group (Fig. 2e).

A receiver operating characteristic (ROC) curve was used to examine the predictive performance of the model. The area under the curve (AUC) of this model was 0.759, which was significantly higher than the AUC values based on age, gender, TNM stage, and T, N and M stages (Fig. 2f). These results suggest that the risk score is a better predictor of survival in patients with CC than the other clinical and pathological factors.
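To make the AUC concrete, the sketch below computes it as the probability that a randomly chosen patient who died has a higher risk score than a randomly chosen survivor. This treats the outcome as a simple binary label, which is an assumption: whether the ROC analysis in this study was time-dependent is not stated. The `auc` function and the example data are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels[i] is 1 for an event (death) and 0 otherwise.
    Each (event, non-event) pair counts 1 if the event case has the
    higher score, 0.5 for a tie, and 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented scores: three of the four (event, non-event) pairs are ordered correctly.
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(example_auc)
```

An AUC of 0.5 corresponds to a score with no discriminative ability, and 1.0 to perfect separation, so a value of 0.759 indicates moderate discrimination.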

The associations between the clinicopathological parameters and the risk model were examined (Fig. 3a–d). The results showed that the risk score differed significantly across various clinicopathological parameters, such as TNM stage (p < 0.001), T stage (p < 0.001), N stage (p < 0.001) and M stage (p < 0.001).

Fig. 3

a Oxidative stress-related genes (OSRGs) model in the cohorts stratified by the depth of tumor invasion. b OSRGs model in the cohorts stratified by lymph node metastasis. c OSRGs model in the cohorts stratified by distant metastasis. d OSRGs model in the cohorts stratified by TNM stage

Kaplan–Meier curves based on the risk score and the other clinical and pathological factors are shown (Fig. 4a–f). A significant difference in survival was observed between the high-risk and low-risk groups (p < 0.001).
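The comparison between risk groups rests on the log-rank test. A minimal two-group version is sketched below in Python, using only the standard library. It is intended as an illustration of the statistic rather than a reproduction of the R analysis, and the function name and example data are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value), 1 df.

    events are 1 for death and 0 for censoring.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group B
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("the test is undefined for these data")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented data: every patient in group A dies before any patient in group B.
chi2, p_value = logrank_test([1, 2, 3, 4, 5], [1] * 5,
                             [10, 11, 12, 13, 14], [1] * 5)
print(chi2, p_value)
```

At each event time the test compares the deaths observed in one group with the number expected if both groups shared the same hazard, so a small p value indicates that the survival curves differ.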

Fig. 4

Kaplan–Meier curve based on the age (a), gender (b), depth of tumor invasion (c), lymph node metastasis (d), distant metastasis (e), and risk score (f) in The Cancer Genome Atlas (TCGA) cohort

Univariate (Fig. 5a) and multivariate (Fig. 5b) Cox regression analyses were performed to verify the independent predictive ability of the 17-gene model for survival. Univariate Cox analysis found that risk score (p < 0.001), TNM stage (p < 0.001), T stage (p = 0.003), N stage (p < 0.001) and M stage (p < 0.001) were associated with survival in CC patients. These factors were then included in a multivariate Cox analysis, which confirmed the independent prognostic effect of the risk score (p < 0.001). A nomogram of the CC predictive model based on the 17-gene risk score is presented in Fig. 5c.

Fig. 5

Univariate (a) and multivariate (b) Cox regression analysis of the survival of colon cancer patients in The Cancer Genome Atlas (TCGA) cohort. c Nomogram of the predictive model based on the 17-gene risk score (Age: 1-below 60, 2-over 60; Gender: 1-male, 2-female; Stage: 1-I, 2-II, 3-III, 4-IV; T.Stage: 1-T1, 2-T2, 3-T3, 4-T4; N.Stage: 1-N0, 2-N1, 3-N2; M.Stage: 1-M0, 2-M1; Risk: 1-Low risk, 2-High risk)

3.3 Kaplan–Meier curves based on the expression of the 17 genes

First, the differential expression of the 17 OSRGs between colon adenocarcinoma tissues and normal tissues was examined. As shown in Fig. 6, STK25, CYP19A1, NOL3, NGF, GDF15, and UCN were upregulated in tumor tissues, whereas GSTM1, HSPA1A, ACADL, DAPK1, MAPK12, and NTRK2 were downregulated. In addition, Kaplan–Meier survival curves were plotted separately for each of these genes. The genes with significant survival differences were RYR2, GSR, GSTM1, HSPA1A, ALOX12, SERPINA1, CYP19A1, NOL3, DAPK1, UCN, and HBA2 (Fig. 7). Taken together, the two analyses were consistent for CYP19A1, NOL3, and UCN: these genes were highly expressed in tumor tissues and had significant clinical predictive value.

Fig. 6

Differential expression of 17 genes between normal tissues and colon adenocarcinoma tissues in The Cancer Genome Atlas (TCGA) cohort

Fig. 7

Kaplan–Meier curves based on the expression of the 17 genes between normal tissues and colon adenocarcinoma tissues in The Cancer Genome Atlas (TCGA) cohort

3.4 Functional enrichment based on the 17 genes

Enrichment analysis was conducted based on the risk scores of the 17-OSRG model to provide a biological understanding of these genes. The GO functional enrichment and the KEGG pathway enrichment analysis of these genes are summarized in Fig. 8a, b. The top enriched GO terms in biological processes were “axon development”, “regulation of membrane potential”, and “synapse organization”; those in cellular components were “neuronal cell body”, “synaptic membrane”, and “collagen-containing extracellular matrix”; and those in molecular function were “channel activity” and “passive transmembrane transporter activity”. In the KEGG pathway enrichment analysis, “neuroactive ligand-receptor interaction”, “adrenergic signaling in cardiomyocytes”, “cell adhesion molecules”, and “calcium signaling pathway” were enriched in the high-risk group. In the GSEA enrichment results (Fig. 8c), “adaptive thermogenesis”, “lipid catabolic process”, “regulation of small molecule metabolic process”, and “temperature homeostasis” were enriched in the high-risk group.

Fig. 8

Enrichment analysis of the differential genes between the high and low-risk groups. a Bubble graph of GO enrichment. b Bubble graph of KEGG pathways. c GSEA enrichment analysis

3.5 Correlations with immune cells

The association between the model and immune cell infiltration in the TME of CC patients was analyzed (Fig. 9a). In the CIBERSORT analysis, the proportions of resting dendritic cells, activated dendritic cells, regulatory T cells (Tregs), activated CD4+ T cells, and CD4+ memory T cells in the high-risk group were significantly higher than those in the low-risk group. The proportion of CD8+ T cells did not differ significantly between the two groups (Fig. 9b).

Fig. 9

a Immune cell infiltration. b CIBERSORT analysis based on the 17-gene model in the tumor microenvironment of CC patients in The Cancer Genome Atlas (TCGA) cohort

4 Discussion

Oxidative stress plays a pivotal role in carcinogenesis and tumor progression [22,23,24,25,26]. The cellular burden of ROS, stemming from both endogenous and exogenous sources, can escalate, leading to the modification and damage of cellular macromolecules, in particular genomic DNA, thereby seeding the potential for mutations [27, 28]. Despite this, the specific contribution of oxidative stress to the pathogenesis and progression of CC remains an area ripe for deeper investigation. This study is the first to explore the role of oxidative stress in a cohort of colon cancer patients and has prospective significance.

Over the past two decades, the prevalence and burden of colorectal cancer (CRC) have seen a significant rise, disproportionately affecting men and individuals in the middle to older age categories [29]. It is critical that we enhance preventive strategies targeting the principal risk factors at play. Concurrently, the thorough analysis of prognostic indicators has emerged as a vital aspect of our ongoing battle against CRC, essential for improving patient outcomes. Currently, tumor prognosis is predominantly gauged by patients’ clinical and pathological characteristics [30]. The advent of advanced gene sequencing technologies has ushered in a new era where mRNA levels serve as prognostic indicators for an array of cancers. Encouragingly, expansive databases like the TCGA have equipped researchers with robust tools for gene model exploration, facilitating a more profound comprehension of the nexus between OSRGs and CC. In the current study, by leveraging the extant gene expression data from CC patients, we initially pinpointed 115 prognostic OSRGs within the TCGA-COAD dataset, marking a significant step towards unraveling the intricate interplay of these genes in CC.

From an oxidative stress-centric perspective, our analysis of CC-related prognostic genes identified 17 significant OSRGs via Lasso Cox regression analysis. These genes were instrumental in the development of a prognostic model capable of stratifying CC patients into distinct risk categories. The application of this 17-OSRG model to patient data revealed a stark divergence in survival curves between the low- and high-risk groups, with the latter exhibiting markedly reduced survival times. Moreover, patients assigned to the high-risk group consistently presented with more advanced TNM, T, N, and M staging compared to their low-risk counterparts. The independent prognostic significance of our oxidative stress-related model was substantiated by Cox regression analysis, underscoring its efficacy as a metric for gauging pathogenesis and disease progression, and its critical role in the prognostic assessment of CC patients. We also confirmed that this risk signature was positively correlated with the degree of immune cell infiltration in the CRC tumor microenvironment. In summary, we could predict the survival and immune cell infiltration of CC patients based on this risk signature.

The trio of pivotal genes within our model, CYP19A1, UCN, and NOL3, have each garnered considerable attention in the existing literature. CYP19A1 has been shown to be a promising target for enhancing the efficacy of immunotherapy in colon cancer [31]. UCNs, integral members of the corticotropin-releasing factor (CRF) family, are implicated in a spectrum of biological processes, including inflammation and the orchestration of cancer development [32]. NOL3, encoding an anti-apoptotic protein, is intricately involved in pathways governing apoptosis, autophagy, and signal transduction, with its caspase recruitment domain (CARD) demonstrated to modulate p53 activity through CARD–CARD interactions [33].

GO and KEGG analyses elucidated the involvement of the differentially expressed OSRGs in a risk model predictive of the OS of CC patients. These genes were found to be implicated in multiple biological mechanisms, including the modulation of viral infection and the MAPK signaling pathway, which may contribute to the prognostic heterogeneity observed. GSEA further delineated the divergent molecular landscapes between high-risk and low-risk groups as defined by our oxidative stress-related model. Notably, pathways such as the "reactive oxygen species pathway" and the "p53 signaling pathway" were significantly enriched in the high-risk group, underscoring their potential role in CC prognosis [34, 35].

Immune cell infiltration is associated with CC occurrence and progression. Immunotherapy has considerably extended the survival of patients with advanced tumors [36]. In our study, the risk score was significantly and positively correlated with the abundance of specific types of infiltrating immune cells in the tumor microenvironment. We found that CD4+ memory T cells and dendritic cells were positively correlated with the OSRGs risk score. Recent breakthroughs in single-cell and spatial transcriptomics have marked a pivotal phase in biological research [37,38,39], and these approaches help to clarify the composition and function of immune cells in the tumor microenvironment and the interactions between various immune cell types.

Despite its strengths, this study has some limitations that should be noted. First, the data analysis and validation of results were conducted using data from public databases, and the performance of the model was not tested in prospective experiments. Second, this is only a retrospective study, and prospective clinical trials are needed to verify the efficacy of its predictions. In addition, the mechanisms underlying the roles of the 17 OSRGs in the occurrence and development of CC still require further study.

5 Conclusion

In summary, we established a new prediction model based on oxidative stress-related genes, which may help clinicians make a preliminary judgment on patient prognosis. We found that CYP19A1, NOL3, and UCN are highly expressed in tumor tissues and have significant clinical predictive value. Our study provides new insights into the role of oxidative stress in colon cancer.