Introduction

Clear cell renal cell carcinoma (ccRCC) is a major histological subtype of renal cell carcinoma (RCC), and accounts for approximately 60%-85% of RCC cases. ccRCC is characterized by epithelial cells of renal proximal convoluted tubules [1, 2]. The early stages of ccRCC typically present as asymptomatic, with approximately 25%-30% patients exhibiting metastasis at the time of diagnosis [3]. The relapse or distant metastasis rate for ccRCC patients after radical nephrectomy exceeds 20%. Furthermore, the resistance of ccRCC to radiotherapy and chemotherapy results in a poor prognosis [4, 5]. Thus, improved prognosis prediction of advanced ccRCC patients will greatly assist clinicians in decision-making. Moreover, the identification of key genetic drivers for progression can aid in the development of new treatments.

Relevant prognostic factors have been observed in addition to the American Joint Committee on Cancer (AJCC) Tumor Node Metastasis (TNM) stage [6]. It is worth noting that gene expression profiling has the potential to classify different tumor types because of the significant involvement of genes in tumor development and metastasis [7]. The rapid development of gene sequencing technology has made Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases increasingly important in bio-informatics analysis [8, 9]. These databases offer sequencing data for discovering new functional genes and analyzing their impact on prognosis. Thus, the analysis of the objective need for clinical variable gene combinations and nomograms can serve as an effective tool in the development of individualized patient treatment strategies.

Nomograms are established based on Cox regression analysis results and are widely used for cancer prognosis, primarily because of their ability to reduce statistical predictive models to a single numerical estimate of the probability of an event, such as death or recurrence [10, 11]. In ccRCC patients, a nomogram combining differentiation-related gene (DRG)-based risk score and prognostic clinicopathological variables was previously constructed to provide a visual method for determining prognosis [12]. However, another nomogram based on risk gene signature and clinical features may provide a practical method for recurrence prediction and facilitate personalized management of ccRCC patients after surgery [13]. Thus, research on the prediction model based on tumor stage requires further investigation.

This study found the key prognostic genes affecting different stages of ccRCC. Phosphoserine aminotransferase 1 (PSAT1) is a protease of class V pyridoxal phosphate-dependent aminotransferase family. A gene mutation in PSAT1 leads to metabolic and genetic disorders such as phosphoserine aminotransferase deficiency, serine deficiency, and Neu-Laxova syndrome, wherein patients require postnatal serine and glycine supplementation for symptom alleviation. In recent years, an increasing number of investigations have shown that PSAT1 is highly associated with the occurrence, development, treatment, and prognosis of various cancers [14,15,16,17,18]. Therefore, the objective of this study was to conduct a comprehensive bioinformatics analysis to identify the prognostic genes in patients at different stages of ccRCC and to develop a new nomogram model for predicting overall survival (OS) in patients with late-stage ccRCC based on the data from the GEO and TCGA databases.

Materials and methods

Acquisition of microarray data

The discovery phase involved the identification of datasets comparing mRNA expression in tissues of patients with late-stage (stage III+ stage IV) ccRCC with that in the tissues of patients with early-stage (stage I+ stage II) ccRCC. Gene expression profiles of GSE73731 (with 256 samples), GSE89563 (with 16 samples), and GSE150404 (with 60 samples) were obtained from the National Center for Biotechnology Information (NCBI) GEO database (https://www.ncbi.nlm.nih.gov/geo/). The GSE73731 dataset was based on the GPL570 platform, whereas GSE89563 and GSE150404 were based on the GPL17692 platform.

Screening for integrated differentially expressed genes (DEGs) at different stages

The GEO2R tool, which relies on the R package “Limma” provided by the GEO database, was used for identifying DEGs in each dataset. The cut-off criteria for screening over-expressed DEGs were adjusted p-values < 0.05 and log2FC > 1. The significantly up-regulated genes were separately extracted.

Genes over-expressed in all datasets were identified by constructing a Venn diagram using an online tool (http://bioinformatics.psb.ugent.be/webtools/Venn/), which depicted three lists of up-regulated genes. The expression levels of all genes and survival analysis of selected genes at different stages were verified using the Assistant for Clinical Bioinformatics (ACBI) tool (https://www.aclbi.com). The levels of potential hub genes were determined using R software to create a heatmap.

Collection of clinical and bioinformatics data

The TCGA database was accessed on June 9, 2023, and the clinical data including tumors RNA expression data of 532 ccRCC patients were collected (https://tcga-data.nci.nih.gov/). Clinical parameters included sex, age, race, pathologic T stage, pathologic N stage, pathologic M stage, grade, neoadjuvant therapy, vital status, and follow-up duration (days). Depending on the stage of ccRCC patients, we divided the patients into late-stage and early-stage groups with the help of the ACBI module of TCGA (https://www.aclbi.com/static/index.html#/tcga). The RNA sequencing expression profiles and corresponding clinical information by stage-group downloaded from the TCGA dataset (https://portal.gdc.com) were matched with the TCGA-ccRCC dataset (https://tcga-data.nci.nih.gov/). Considering the influence of surgical factors, we excluded the data of patients whose follow-up time was less than 30 days. The median RNA expression value in the two groups was regarded as the cut-off to the RNA expression levels as high or low in each group.

Development of risk prediction model

According to the TCGA data, we developed a nomogram combining gene expression with clinical information (new model) for the prediction of 3-year and 5-year OS in individuals with different stages of ccRCC. Another nomogram only using clinical variables was developed for a head-to-head comparison with the first comprehensive model in ccRCC patients at different stages.

Patients and tissue specimens

A total of 20 pairs of ccRCC specimens were obtained from patients who underwent radical nephrectomy or partial nephrectomy at the Affiliated Hospital of Jiangnan University. None of the patients in our study received neoadjuvant chemotherapy. In all, 20 matched fresh ccRCC specimens (10 pairs of late-stage cases and 10 pairs of early-stage cases) and adjacent noncancerous renal tissues were selectively used for qRT-PCR, Western blotting, and immunohistochemical analysis. The diagnosis for each patient was confirmed by histopathological analysis. Informed consent was obtained from the patients before inclusion in the study, and the study protocol was approved by the Ethics Committee of the Affiliated Hospital of Jiangnan University.

RNA extraction and qRT-PCR assays

Total RNA was extracted by RNA-easy (RC101, Vazyme, Nanjing, CN) according to the reagent instructions. In all, 1 µg of total RNA was used for cDNA synthesis using a cDNA reverse transcription kit (R323, Vazyme, Nanjing, CN). Real-time PCR was performed in triplicates on a Bio-Rad CFX96 PCR system to detect PSAT1expression. The results were normalized to the expression of GAPDH. The primer sequences are listed below: PSAT1-F: ACAGGAGCTTGGTCAGCTAAG, PSAT1-R: CATGCACCGTCTCATTTGCG; GAPDH-F: GGAGCGAGATCCCTCCAAAAT, GAPDH-R: GGCTGTTGTCATACTTCTCATGG.

Immunohistochemistry analysis

The ccRCC tissue samples were obtained from the Affiliated Hospital of Jiangnan University according to institutional guidelines. Tissue paraffin blocks were sectioned, and stained with antibodies specific to PSAT1 (10501-1-AP: 1:400, Proteintech, Wuhan, CN), followed by scanning with a Pannoramic Scanner (3DHISTECH, Budapest, Hungary).

Western blotting

The kidney tissues were treated destructed and then lysed by boiling for 10 min in sample buffer (2% SDS, 10% glycerol, 10% β-mercaptoethanol, bromophenol blue, and Tris-HCl, pH = 6.8). The lysates were fractionated by SDS-PAGE and the isolates were transferred to PVDF membranes (Millipore, IPVH00010, NH, US). The blots were probed with specific primary antibodies followed by a secondary antibody and the membranes were then detected by ECL (Sigma, WBULS0500, MO, US). PSAT1 (10501-1-AP: 1:10000) and GAPDH (66009-1-Ig; 1:10000) antibodies were purchased from Proteintech Group (IL, US). Secondary antibodies were conjugated with HRP (Proteintech Group; SA00001-1, SA00001-2; 1:10000). Uncropped WB are shown in Fig. S3.

Statistical analyses

Statistical analyses were conducted using SPSS version 27.0 (SPSS Inc., IBM Corp., Armonk, NY, USA) and R software for Windows, version 4.2.3. Data are presented as mean ± SD or median and range. Student’s t test was performed for normally distributed continuous variables, while Mann-Whitney U test was performed for non-normally distributed data. Chi square or Fisher’s exact test was applied to compare categorical variables.

The Cox proportional hazard regression model was used to estimate the hazard ratio and its 95% confidence interval (CI) for each potential risk factor, and data were visualized through Forest plots. The stepwise multivariate Cox regression analysis included inclusion and exclusion criteria of type I error = 0.1.

Discrimination reflects the ability of a model to distinguish events and non-events correctly, and these were validated using C-statistics. The Concordance index (C-index) is analogous to the area under the receiver operating characteristic (ROC) curve. The predictive capacity of models was summarized using ROC curves [19]. Calibration refers to the closeness between the predicted probabilities and the actual outcomes, and this was validated using calibration plots [20].

A two-sided p-value of < 0.05 was considered statistically significant.

Results

Identification of DEGs in ccRCC patients at different stages

We downloaded three ccRCC gene expression profiles (GSE73731, GSE89563, and GSE150404) from the GEO database and screened 22, 89, and 120 over-expressed DEGs, respectively, using the GEO2R tool to differentiate between late-stage and early-stage patients. A Venn diagram was constructed (Fig. 1A) and three genes (PSAT1, PRAME, and KDELR3) that were over-expressed in the three profiles were identified. The three selected genes were verified in TCGA (Fig. 1B-D), and the gene PSAT1 was the most significantly differentially expressed (Fig. 1E).

Fig. 1
figure 1

Screening and identification of differentially co-expressed genes. A Different colors represent the number of up-regulated genes in different datasets, and the middle intersection represents the number of co-expressed genes in the three GSE (GSE73731, GSE89563, GSE150404) datasets. B PSAT1 expression. C PRAME expression. D KDELR3 expression. Red dots represent the number of cases in the late-stage ccRCC group, blue dots represent the number of cases in the early-stage ccRCC group. Horizontal line in the middle of the box: represents the median. Top and bottom lines of the box: represent the upper and lower quartiles. The bin size represents the quartile spacing. G1: late-stage group; G2: early-stage group. ****p < 0.0001. E The heatmap illustrates the expression levels of potential 3 genes. Each row represents the expression level of each gene in different samples, and each column represents the expression level of all genes in each sample

Determination of the effect of PSAT1 on the prognosis of late-stage patients

The clinicopathological data were integrated with the expression levels of PSAT1, PRAME, and KDELR3, which matched the corresponding data in the two databases based on row-name, and finally, we included 529 cases. The median of hub-gene expression level was used to divide patients into the up-regulated and down-regulated subgroups within different stage groups. Patients with a follow-up duration of < 30 days or those who had no clinical OS information were excluded. Finally, data of 499 ccRCC patients for OS in different disease stages was obtained, wherein the early-stage group included 303 patients and late-stage group included 196 patients.

Detailed characteristics were compared between the two groups. The mean ages of patients in the early-stage and late-stage groups were 59.58 ± 12.69 and 61.59 ± 11.40 years, respectively, which showed no statistically significant difference (p = 0.073). Besides, no significant difference was also observed between the two groups with regard to gender, ethnicity, and neoadjuvant therapy (all p > 0.05). However, a significant difference in the expressions of PSAT1, PRAME, KDELR3 was observed between groups; these genes were highly expressed in patients with late-stage ccRCC than in those with early-stage ccRCC (all p < 0.05) (Table 1).

Table 1 Comparison of relative factors between the two groups

Based on the ACBI-TCGA database, we found significance differences in the OS curves between the two groups (Fig. 2A, log-rank p < 0.001). Kaplan–Meier curves of OS according to gene expression level for hub-genes in the different stage groups showed that only PSAT1 expression exhibited statistical significance in terms of OS curves (Fig. 2B, log-rank p = 0.001). High levels of PSAT1 expression were associated with a poor prognosis in late-stage patients, whereas the other two genes did not have a prognostic value with regard to OS in patients with different stages of ccRCC (Fig. S1).

Fig. 2
figure 2

Kaplan-Meier plots for ccRCC patients. A Overall survival in different stage groups. The mortality rates of the two groups show a downward trend, which decreased rapidly at the beginning and then leveled off. The red curve represents the late-stage group, blue curve represents the early-stage group. B PSAT1 expression level about late-stage patients for overall survival. The mortality rates of the different expression level showed a downward trend, which decreased rapidly at the beginning and then leveled off. The blue curve represents the high-expression, green curve represents the low-expression. Follow-up time for horizontal axis and vertical axis for survival rate. Survival curves were obtained by linking the corresponding survival rates at each time point. G1: late-stage group; G2: early-stage group. p < 0.05 was considered statistically significant

Confirm risk/protect factors of prognosis

To search for potentially relevant risk factors, univariate Cox-regression analysis revealed that six variables, age (p < 0.001), grade (p = 0.002), neoadjuvant therapy (p = 0.045), PSAT1 expression (p < 0.001), PRAME expression (p = 0.031), and KDELR3 expression (p = 0.025), showed a significant correlation with OS in the two stage groups of patients. The remaining parameters did not indicate significant statistical associations (all p > 0.05) (Fig. 3A and Table 2).

Fig. 3
figure 3

Influence factors for overall survival by Cox-regression analysis. A Univariate analysis. B Multivariate analysis. Short line parallel to the X-axis, the length of the line segment corresponds to the 95% CI. Abscissa values correspond on both ends of the line around 95% CI of two Numbers, the square line corresponding to the HR. Perpendicular to the X axis of the straight line, known as invalid. Crossing this line indicates that the results are not statistically significant. HR > 1 is on the right side of the line, indicating a risk factor. HR < 1 is on the left side of the line, indicating a protective factor. HR: Hazard Ratio; CI: Confidence internal. Statistically significant p < 0.05

Table 2 COX regression analysis factors affecting overall survival

The significant risk factors determined in the univariate analysis were further evaluated via multivariate Cox-analysis. Finally, age [HR: 1.032, 95% CI: 1.018 to 1.046, p < 0.001], grade (HR: 1.491, 95% CI: 1.022 to 2.175, p = 0.038), and PSAT1 expression (HR: 1.855, 95% CI: 1.322 to 2.604, p < 0.001) were identified as independent risk factors of OS in the two groups, whereas neoadjuvant therapy (HR: 0.432, 95% CI: 0.221 to 0.827, p = 0.01) was identified as a protective factor (Fig. 3B and Table 2).

Development of a nomogram combining gene and clinical variables

Based on the abovementioned results, we developed a prediction model and generated a graphical nomogram predicting the probability of 3-year and 5-year OS in the late-stage compared with early-stage groups (Fig. 4A). PSAT1 was included in the nomogram of OS and the predictive accuracy of the nomogram calculated by AUC was 0.720 for 3-year OS and 0.719 for 5-year OS (Fig. 4B-C), which revealed moderate discriminatory ability.

Fig. 4
figure 4

Predicting and verification of 3-year and 5-year overall survival. A The risk factors were represented by points on the axis, with each factor corresponding to a line drawn upward. The total points located on the axis indicated the probability of 3-year and 5-year overall survival, represented by a line drawn downward to the survival axis. B 3-year overall survival in new mode. C 5-year overall survival in new model. D 3-year overall survival in only clinical variables model. E 5-year overall survival in only clinical variables model. AUC represents the area under the ROC curve. AUC value is between 0.5 and 1, the closer to 1, the better the performance of model diagnosis, the higher the accuracy. (1) 0.7 < AUC < 0.9, indicating that the accuracy of the model is good and has certain clinical application value. (2) 0.5 < AUC < 0.7, indicating that the accuracy of this index/model is low and has little clinical application value. AUC: area under curve

To verify the importance of PSAT1 expression in late-stage patients, we also developed prediction models based on other statistically significant clinical parameters (Fig. S2), except PSAT1. The AUCs were 0.696 for 3-year OS and 0.684 for 5-year OS (Fig. 4D-E). The results demonstrated that the predictive accuracy of the new model was significantly higher than that of model considering only clinical parameters. Further, the calibration plots of the new model suggested good agreement between the observed outcome and predicted probability unlike clinical parameter model (Fig. 5).

Fig. 5
figure 5

The calibration curve’s accuracy represents different models’ predictions. A 3-year overall survival in new model. B 5-year overall survival in new model. C 3-year overall survival based solely on clinical variables model. D 5-year overall survival based solely on clinical variables model. Diagonal dotted lines in the figure is the reference line, the predicted probability is equal to the probability of the actual situation, the farther off diagonal suggests that the bigger the error of the prediction. (1) If the predicted value is equal to the actual value, the plotted fit line coincides with the reference line. (2) If the predicted value is greater than the actual value, that is, the risk is overestimated, the fitted line is below the reference lin. (3) If the predicted value is less than the actual value, that is, the risk is underestimated, the fitted line is above the reference line

Validation the expression of PSAT1 at different molecular levels

To validate the role of PSAT1 in ccRCC development and progression, we first examined PSAT1 expression status in ccRCC tissues. PSAT1 mRNA levels were detected in 20 ccRCC tissues and matched noncancerous adjacent tissues. The results demonstrated that the mRNA expression of PSAT1 was significantly increased in late-stage ccRCC tumor tissues (Fig. 6A).Quantitative values for each specific pair of tissues are described in Fig. S5A.

Fig. 6
figure 6

Verification of PSAT1 expression levels. A PSAT1 mRNA levels are elevated in 20 pairs of ccRCC tissues compared with matched tissues. B PSAT1 protein levels in 20 pairs of ccRCC tissues and matched noncancerous tissues were detected by WB. 20 pairs ccRCC samples: (10 pairs: late-stage; 10 pairs: early-stage). The abscissa represents the sample type, and the ordinate represents the relative expression values of mRNA or protein levels in tumor tissues compared with adjacent normal tissues. The level of the columns represents the level of expression. qRT-PCR and WB data are presented as the mean ± SD. (N: noncancerous, T: tumor; ****P < 0.0001, ***P < 0.001; ns: no significance). C The representative images of HE staining and PSAT1 immunohistochemistry in late-stage and early-stage ccRCC pair tissues. The scale bars indicate 100 µm

We further detected the protein expression levels of PSAT1 in 20 pairs of tumor and adjacent noncancerous tissues, including tissues from 10 pairs from early-stage and 10 pairs from late-stage cases. We observed that the PSAT1 expression was up-regulated in all tumor tissues compared with matched noncancerous tissues, and that the expression was higher in late-stage tumor tissues (Fig. 6B). Hematoxylin-Eosin (HE) staining showed that the tumor tissue had clear cell outline, transparent cytoplasm, centered nucleus, cells arranged in sheets, and less interstitium compared with the normal adjacent tissue, and the results were more obvious in patients with late-stage ccRCC (Fig. 6C). Immunohistochemical staining in serial sections also confirmed a positive correlation between the expression levels of PSAT1 in human ccRCC tumor tissues and the disease stage (Fig. 6C). The expression level was also higher in late-stage cases. In Fig. S5B, immunohistochemical scores and quantitative values of the proportion of positive cells in pathologically stained sections are described.

Discussion

The prognosis of ccRCC patients is closely associated with tumor stage, with the later stages often indicating a poor prognosis [21]. Treatment options for advanced stage are limited. Currently, there is a scarcity of effective therapeutic strategies for recurrent and metastatic ccRCC [22]. Thus, the development of a new prognostic tool is crucial for identifying high-risk patients requiring additional treatment and attention. Moreover, finding a promising therapeutic target is crucial for developing anti-tumor drugs and improving the survival rate of patients with advanced ccRCC.

With the advancement of bioinformatics, an increasing number of genes have been identified as closely associated with ccRCC occurrence and development [23, 24]. Accordingly, we focused on gene expression in different stages of ccRCC using the GEO dataset and verified our findings with ACBI-TCGA. In our study, we found three DEGs (PSAT1, PRAME, and KDELR3) between late-stage patients and early-stage patients across three mRNA arrays. The post-match data included complete variables for comparison in both stage groups. Except for the stage parameter, another principal clinical factor for ccRCC prognosis is the grade parameter [25, 26]. We concluded that late-stage group patients also tended to have a higher disease grade. This further confirms that tumor staging and histological grading are the main parameters associated with ccRCC prognosis [27].

In survival analysis, we found that patients in the late stage of the disease showed a poor prognosis, and PSAT1 was the only gene associated with OS in this group. However, the relative expression of the target genes may be involved in the prognosis [23]. The ccRCC clinicopathological information downloaded from TCGA was processed via univariate and multivariate Cox regression analysis. Age, PSAT1 expression, grade, and neoadjuvant therapy were found to be significant independent prognostic factors associated with OS. Neoadjuvant therapy has been proved to be a protective factor. However, in previous studies, the benefit of neoadjuvant therapy for locally advanced ccRCC with currently available therapeutic agents has been controversial [28]. Christopher et al. [29] suggested that neoadjuvant treatment with pazopanib is effective in treating patients with localized ccRCC, which is similar to the findings of our study.

Various nomograms have been identified to determine the prognosis of ccRCC patients. For instance, Xia et al. [12] constructed a prognostic nomogram based on the prognostic risk signature and clinicopathological characteristics, which exhibited high accuracy and a robust predictive performance. The study by Zhu et al. [30] reported that combining methylation risk scores with conventional clinical covariates improved the prediction of clinical prognosis in ccRCC patients. In our study, we developed a new graphical nomogram that combines PSAT1 expression with clinicopathological data for predicting OS in ccRCC patients at different stages. By assigning values to clinical variables and PSAT1 expression for each patient, we calculated a total score that predicts the OS of late-stage patients at 3 and 5 years. The abovementioned estimates can be used for patient counseling and informed decision-making. The C-index, AUC, and calibration curve indicators from the entire TCGA set confirmed the discriminative accuracy of our nomogram, possibly making it a preferred predictive model. Moreover, the predictive power of our new model was higher than that of single clinical variable model.

Of the three hub-genes identified in our study, PSAT1 was finally included in the model to predict the survival of late-stage ccRCC patients. Previous research reported that the dysregulation of PSAT1 activity may alter glucose and glutamine utilization in serine biosynthesis, promoting tumorigenesis and chemoresistance in colorectal cancers given that PSAT1 is a metabolism-related gene [31, 32]. Increased transcription of PSAT1, caused by promoter hypomethylation, was also linked to a poor response to tamoxifen therapy and cancer recurrence in early-stage breast cancer [33, 34]. Furthermore, studies have shown that the up-regulation of PSAT1 promotes cell proliferation and is associated with a poor outcome in patients with non-small cell lung cancer [35, 36]. These studies indicate a strong correlation between PSAT1 levels and tumor progression as well as prognosis.

The confirmation of the association of PSAT1 expression levels with ccRCC indicates its involvement in metabolism, development, and progression. Zhang et al. [37] screened ccRCC-related glycolytic genes in public databases and constructed a prediction model of 13 genes including PSAT1, which could be valuable for diagnosing and predicting ccRCC. The study by Cheng et al. [38] introduced a new gene signature, including PSAT1, to determine the ccRCC prognosis in TCGA cohorts based on amino acid metabolism-related genes. In light of the GEPIA2 analysis, some other tumors (BLCA, CESC, COAD, DLBC, GBM, LGG, LUAD, LUSC, OV, PRAD, READ, STAD, THYM, UCEC, and UCS) are high expressed of PSAT1 (Fig. S4), which bodes well for the possibility of studying PSAT1 as a biomarker in other tumors. However, for advanced stage ccRCC patients, the role of PSAT1 remains elusive. In our research, PSAT1 was found to be highly expressed in patients with late-stage ccRCC and affected the OS of patients. Our model further demonstrated the application value of PSAT1 in accurately determining the prognosis in advanced ccRCC patients.

The results of mRNA and protein level validation may provide guidance for clinical decision making. For example, in clinical practice, when the clinical renal cell carcinoma patients after surgical treatment, the histopathology suggests that the renal cell carcinoma is clear cell carcinoma, the expression of PSAT1 can be further detected by immunohistochemistry. If PSAT1 expression is positive, the prognosis of the patients is poor, which provides a reference for clinicians for the next treatment of the patients.

To the best of our knowledge, this is the first nomogram to predict OS in patients with different stages of ccRCC by combining genetic information and clinical data. Furthermore, the significance of PSAT1 in late-stage ccRCC was confirmed in this study, providing more specific and precise insights on its role.

Conclusion

The prognostic gene expression profiles of ccRCC patients at different stages were determined in this study. The high expression of PSAT1 in tumor tissues, especially late-stage tumor tissues, combined with clinical data, in this nomogram enhances its prognostic value in different-stage ccRCC patients, particularly for the prediction of OS.