Introduction

Colorectal cancer (CRC) remains a major public health problem: it is the third most common cancer and the third leading cause of cancer-related death among men and women1. Of the 1,900,000 new cases of colorectal cancer annually, approximately 70% of CRC patients are diagnosed with stage II/III disease. To reduce the risk of cancer recurrence and improve survival, fluorouracil-based adjuvant chemotherapy (ACT) is recommended after surgery as the standard treatment for stage III CRC and some high-risk stage II CRC (e.g., T4, high grade, fewer than 12 lymph nodes examined)2. However, ACT may only provide additional survival benefit in certain subsets of patients. Currently, the selection of patients is suboptimal, which leads to either over- or undertreatment3. A previous study reported that 50% of stage III patients are cured by surgery alone, and a further 20% survive with the addition of ACT. Altogether, only 20% of stage III CRC patients really benefit from ACT, exposing the remaining 80% to unnecessary toxicity4. In stage II patients, the role of ACT remains an area of great controversy because only a subset of patients will derive considerable benefit. Even though the QUASAR clinical trial revealed that ACT could improve survival of patients with stage II CRC, the absolute improvements were small (approximately 3.6%)5. Furthermore, up to 30% of stage II CRC patients will experience relapse, which is generally fatal6. Therefore, the current staging system is not sufficient for the management of patients with stage II/III CRC, and it is crucial to identify biomarkers for detecting patients who could benefit from ACT.

Mass spectrometry (MS)-based proteomics is a promising technique for the discovery of diagnostic and prognostic methods and the identification of prognostic protein signatures7,8,9; proteins are usually the final executors of biological activities. Proteomics (and proteomics-derived signatures) has therefore been successfully applied to improve diagnostic accuracy10,11, predict response to therapy12,13, and predict prognosis14. Moreover, proteomics might, in theory, objectively reflect a tumor's biological nature and thereby relate to a patient's prognosis. Specifically, proteomics has been shown to be an effective tool for predicting prognosis15 and response to treatment16 in CRC. However, few studies have focused on the prediction of postoperative survival and ACT benefit.

In the present study, we performed comprehensive proteomic profiling to explore the clinical significance of proteomic features in stage II/III CRC, and then developed and validated a proteomic signature to predict disease-free survival (DFS) and overall survival (OS) in multicenter cohorts. Using this proteomic signature together with pathologic stage, we further identified the subset of patients that could benefit from ACT.

Results

Proteomic profiling of discovery cohort

The workflow of the present study and the proteomic landscape are shown in Fig. 1 and Supplementary Fig. 1. The baseline clinical parameters of the 60 CRC patients were well balanced between those with and without relapse in terms of sex, age, T stage, N stage, and treatment, which minimizes the potential impact of these factors on relapse (Supplementary Table 2). Digested 293T cells were routinely assayed as quality control (QC) samples to ensure good reproducibility and sensitivity (Supplementary Fig. 2). We then used the MS-based high-throughput assay to compare protein expression patterns between two pre-defined CRC groups in the discovery cohort, relapse and relapse-free (Fig. 1b), and found distinct profiles. GSEA identified extracellular matrix (ECM) organization, ECM proteoglycans, and the complement cascade as upregulated in the relapse group, whereas antigen processing and presentation and the cell cycle were downregulated compared with the relapse-free group (Fig. 1c).

Fig. 1: Outline of workflow and proteomic landscape of patients from discovery cohort.

a Study design and flow chart. b Heatmap of proteins significantly associated with recurrence in CRC. c GSEA (H: Reactome) analysis of stage II/III CRC patients revealed the pathways associated with relapse (n = 30) or non-relapse (n = 30). LC-MS/MS Liquid chromatography tandem mass spectrometry, CRC Colorectal cancer, SY6H Sun Yat-sen University, the Sixth affiliated Hospital.

Clinicopathological characteristics of the training and validation cohorts

A total of 740 pretreatment, stage II/III CRC specimens obtained from patients at 3 academic institutions were included in our analysis. The baseline demographic and clinicopathological features of patients in the training cohort (n = 203), internal validation cohort (n = 204), and external validation cohort (n = 333) are shown in Table 1. The median follow-up time was 104.2 months (IQR 66.6−116.2) in the training cohort, 108.4 months (IQR 78.2−118.8) in the internal validation cohort, and 76.7 months (IQR 34.7−88.3) in the external validation cohort.

Table 1 Clinical characteristics of patients in the training, internal, and external validation cohorts.

Construction of proteomic signature

Twelve relevant proteins were identified from the discovery cohort by a coarse-to-fine feature selection strategy. A least absolute shrinkage and selection operator (LASSO)/SVM logistic model was applied for further selection, and multiplex immunohistochemistry was used to build the proteomic signature, which comprises FHL3, GGA1, and TGFBI (Supplementary Figs. 2, 3; Supplementary Tables 3, 4). The risk score of each patient was calculated from the expression levels of these 3 markers, weighted by their regression coefficients (Supplementary Table 4): risk score = 0.003 × Hscore of FHL3 in tumor − 0.006 × Hscore of GGA1 in tumor + 0.004 × Hscore of TGFBI in stroma. For each of the training cohort and the two validation cohorts, X-tile plots were used to generate an optimum cutoff value (Supplementary Fig. 7) to stratify patients into high- and low-proteomic signature groups for further analyses.
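As a minimal sketch of how the risk score formula above could be evaluated, the following Python snippet applies the three reported coefficients to a set of Hscores. The dictionary keys, the example Hscore values, the cutoff of 0.3, and the use of a strict "greater than" rule are illustrative assumptions; the cohort-specific cutoffs derived with X-tile are not given in the text.

```python
# Regression coefficients as reported in the text.
# The key names are illustrative labels, not identifiers from the study.
COEFFICIENTS = {
    "FHL3_tumor": 0.003,
    "GGA1_tumor": -0.006,
    "TGFBI_stroma": 0.004,
}


def risk_score(hscores):
    """Return the weighted sum of Hscores using the reported coefficients."""
    return sum(COEFFICIENTS[name] * hscores[name] for name in COEFFICIENTS)


def signature_group(score, cutoff):
    """Assign the high or low group for a given cutoff.

    Assumption: scores strictly above the cutoff are 'high'. The actual
    cutoffs were derived per cohort with X-tile and are not stated here.
    """
    return "high" if score > cutoff else "low"


# Hypothetical Hscores for one patient (made-up values for illustration).
example_patient = {"FHL3_tumor": 120.0, "GGA1_tumor": 80.0, "TGFBI_stroma": 150.0}
example_score = risk_score(example_patient)
example_group = signature_group(example_score, cutoff=0.3)  # hypothetical cutoff
```

Because the GGA1 coefficient is negative, higher GGA1 expression in tumor lowers the score, whereas higher FHL3 and stromal TGFBI expression raise it.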

Association between proteomic signature and prognosis

In all three cohorts, Kaplan–Meier survival curves revealed a significant difference in DFS between the high- and low-proteomic signature groups (p < 0.005), with relatively high hazard ratios (HRs > 2.9) (Fig. 2a–c, upper). A significant difference in OS was also confirmed between the high- and low-proteomic signature groups (p < 0.05; HRs > 2.1) (Fig. 2a–c, lower). The number of patients with an event in each risk group of each cohort, together with the DFS and OS outcomes, is listed in Supplementary Tables 5, 6. Subgroup analyses further revealed that the proteomic signature was a predictor of DFS when stratified by clinical stage in each cohort (Fig. 3).

Fig. 2: Kaplan-Meier curves for DFS and OS according to the PS.

a Training cohort (upper: DFS, lower: OS, n = 203), (b) internal validation cohort (upper: DFS, lower: OS, n = 204), and (c) external validation cohort (upper: DFS, lower: OS, n = 333). We calculated the p values using the unadjusted log-rank test and hazard ratios using a univariate Cox regression analysis. DFS Disease-free survival, OS Overall survival, PS Proteomic signature, HR Hazard ratio, CI Confidence interval.

Fig. 3: Kaplan-Meier curves for DFS according to the PS among stage II/III CRC patient subgroups.

a The training cohort (upper: stage II, n = 97; lower: stage III, n = 106), (b) internal validation cohort (upper: stage II, n = 101; lower: stage III, n = 103), (c) external validation cohort (upper: stage II, n = 172; lower: stage III, n = 161). We calculated the p values using the unadjusted log-rank test and hazard ratios using a univariate Cox regression analysis. DFS Disease-free survival, PS Proteomic signature, CRC Colorectal cancer, HR Hazard ratio, CI Confidence interval.

The results of the univariate analysis of DFS by clinicopathological and proteomic signature subgroups in the three cohorts are shown in Fig. 4. After adjusting for the clinicopathological variables and the CEA level, multivariate analysis showed that proteomic signature was associated with DFS in the training cohort (HR 2.62, 95% CI 1.38−4.96, p = 0.003, Table 2), as well as in the internal validation cohort (HR 2.81, 95% CI 1.33−5.96, p = 0.007) and the external validation cohort (HR 2.84, 95% CI 1.61−5.02, p < 0.001). Moreover, proteomic signature was associated with OS in the training cohort (HR 2.53, 95% CI 1.26−5.10, p = 0.009, Supplementary Table 7) and the external validation cohort (HR 2.93, 95% CI 1.58−5.42, p < 0.001). These survival results demonstrated the high prognostic accuracy of the proteomic signature.

Fig. 4: Univariate analysis of the PS and clinicopathological characteristics with DFS.

PS Proteomic signature, DFS Disease-free survival.

Table 2 Multivariable DFS analyses in each cohort.

Prognostic accuracy of proteomic signature integrated with clinicopathologic features

In addition, multivariable analysis was performed in the training cohort to generate a nomogram predicting 8-year DFS from age, tumor location, N stage, and the proteomic signature (Fig. 5a, Supplementary Table 8). Among these predictors, the proteomic signature had the highest C-index. The nomogram predicted 8-year DFS well, with good calibration, in the training cohort (C-index 0.78, 95% CI 0.71–0.85), the internal validation cohort (0.78, 0.72–0.84), and the external validation cohort (0.75, 0.68–0.82; Fig. 5b–d, Supplementary Fig. 8). The ability of the proteomic signature to predict DFS was superior to that of existing risk factors such as N stage, primary tumor location, and age (Supplementary Fig. 8).
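For readers unfamiliar with the C-index reported above, the following is a simplified pure-Python sketch of Harrell's concordance index for right-censored data. It is not the implementation used in the study (which relied on R); the function name is hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplifying assumption.

```python
def concordance_index(times, events, scores):
    """Simplified Harrell's C-index.

    times  : follow-up time per patient
    events : 1 if the event (e.g., relapse or death) was observed, 0 if censored
    scores : predicted risk, where a higher score means a higher expected risk

    A pair (i, j) is comparable when patient i had an observed event strictly
    before patient j's follow-up time. Tied times are skipped (assumption).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0   # higher risk failed earlier
                elif scores[i] == scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Made-up toy data: risk scores decrease as follow-up time increases.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.6, 0.4, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_scores)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which gives context for the reported values of 0.75 to 0.78.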

Fig. 5: Nomogram to predict the risk of recurrence in stage II/III CRC patients.

a Nomogram to predict DFS. Calibration curves to predict 8-year disease-free survival in (b) the training cohort, (c) the internal validation cohort, and (d) the external validation cohort. The nomogram-predicted probability is plotted on the x-axis and the actual survival is plotted on the y-axis. PS Proteomic signature, CRC Colorectal cancer, ROC Receiver operator characteristic, DFS Disease-free survival.

Association between proteomic signature and benefit from ACT

To verify the clinical significance of the proteomic signature for detecting patients that could benefit from ACT, subgroup analyses stratified by proteomic signature, pathological stage, and ACT were performed. Subgroup analyses indicated that both pT stage and pN stage were correlated with DFS (Fig. 6a, b) and OS (Supplementary Fig. 9) among all patients. In the high-proteomic signature group, both pT and pN stage were significantly associated with DFS (HR: 1.90, 95% CI: 1.36–2.64, p < 0.001 in pT stage, and p < 0.001 in pN stage, Fig. 6) and OS (HR: 2.07, 95% CI: 1.45–2.94, p < 0.001 in pT stage, and p < 0.001 in pN stage, Supplementary Fig. 9). In the low-proteomic signature group, only pN stage was significantly associated with DFS (p < 0.001) and OS (p < 0.001). Subgroup analysis by pN stage within the high-proteomic signature group revealed that stage II (pN0) patients who received ACT had better DFS (HR: 1.97, 95% CI: 1.11–3.48, p = 0.017) (Fig. 7a) and OS (HR: 3.03, 95% CI: 1.49–6.17, p = 0.001) (Supplementary Fig. 10) than those who did not, and that pN2 patients derived a survival benefit from ACT for DFS (HR: 2.08, 95% CI: 1.03–4.21, p = 0.037) (Fig. 7c) and OS (HR: 2.65, 95% CI: 1.28–5.47, p = 0.006) (Supplementary Fig. 10). These subgroup analyses indicate that not all stage II/III CRC patients will benefit from ACT, and that not only pathological stage but also the proteomic signature could serve as a powerful tool to optimize decision making regarding ACT.

Fig. 6: Kaplan-Meier curves for DFS according to the PS.

The results are shown for all patients (n = 740, left), patients with a high PS (n = 496, middle), and patients with a low PS (n = 244, right). The results are also stratified according to pT stage (a) and pN stage (b). p values were calculated using a two-sided log-rank test. PS Proteomic signature, DFS Disease-free survival, HR Hazard ratio, CI Confidence interval.

Fig. 7: ACT benefits based on DFS according to pN stage and PS.

a–c Kaplan-Meier DFS curves are shown for patients according to their use of ACT. In addition, patients with a high PS (left) were stratified according to pN0 (n = 206, upper), pN1 (n = 154, middle), and pN2 (n = 76, bottom). Patients with a low PS (right) were also stratified according to pN0 (n = 118, upper), pN1 (n = 75, middle), and pN2 (n = 23, bottom). p values were calculated using a two-sided log-rank test. PS Proteomic signature, DFS Disease-free survival, HR Hazard ratio, CI Confidence interval.

Discussion

This study not only developed and validated a robust proteomic signature from comprehensive proteomic profiling associated with tumor relapse and survival of stage II/III CRC patients, but also investigated the association between the proteomic signature and ACT efficacy. We revealed heterogeneity of CRC with and without relapse in proteomic features. More importantly, the present study could identify patients who can benefit from ACT through the stratification of the proteomic signature and pathological stage.

CRC patients with the same stage who receive similar treatment might have different clinical outcomes, which makes accurate prognostication essential for treatment planning. Previous studies have indicated the prognostic value or drug sensitivity of proteomic features in CRC7,17,18, but few proteomic biomarkers have been applied in clinical practice owing to small sample sizes and a lack of large-scale validation cohorts. Our recent study applied proteomic analysis to define a proteomic signature for the progression of gastric lesions and validated its value via IHC11. Similarly, the present multicenter study revealed a proteomic signature predicting survival and ACT efficacy in stage II/III colorectal cancer.

The most important finding of this study was that this classifier could serve as a powerful tool for optimizing decision-making on ACT for stage II/III CRC. When patients were stratified by proteomic signature and pathological stage, receiving ACT was associated with a better prognosis than not receiving ACT among patients with stage II disease in the high-proteomic signature group, and patients with pN2 disease in the high-proteomic signature group experienced a substantial benefit from ACT. Although current guidelines recommend ACT for most stage II/III CRC patients19,20, some studies have demonstrated that not all patients will benefit from ACT3,4. Our findings are consistent with previous reports that patients with stage II disease and a high-risk feature should receive ACT and that more aggressive systemic therapy should be considered for patients with pN2 disease and a poor prognosis. Thus, the proteomic signature might provide a new stratification method for identifying patients who should and should not receive ACT.

We validated a proteomic signature comprising three proteins (GGA1, FHL3, and TGFBI) associated with disease progression and the efficacy of chemotherapy. Of these, transforming growth factor-beta-induced protein (TGFBI), an extracellular matrix (ECM) protein, has been shown to play a critical role in tumor progression, angiogenesis21, and sensitivity to 5-fluorouracil-based chemotherapy in CRC22,23. TGFBI is frequently methylated and is associated with chemotherapy resistance24. Considerable evidence has demonstrated that TGFBI is secreted by macrophages and has a role in immunosuppression in cancers25,26. Turtoi et al. employed proteomic analysis and identified a proteomic signature including TGFBI that was associated with CRC liver metastasis27. These studies suggest that TGFBI might affect tumor prognosis and chemotherapy efficacy, and that it may be an effector of the tumor-promoting actions of TGFβ and a potential therapeutic target. FHL3 has also been implicated previously. Four and a half LIM domains 3 (FHL3), a member of the FHL protein family, has been linked to a novel TGF-beta-like signaling pathway and may be a useful molecular target for cancer therapy28. Several studies have implied that FHL3 contributes to tumor metastasis29, EMT, and chemotherapy resistance30. The expression patterns of the proteins in the signature may provide new insights into the molecular mechanisms that underlie tumor relapse and chemotherapy resistance, and thus could provide potential novel targets and treatment strategies for CRC patients.

Using proteomic analysis and IHC as reported in our recent study11, we identified and validated a proteomic signature to predict prognosis and ACT efficacy. Tissue from operative samples or biopsies is very commonly used for biomarker detection. For example, the detection of MMR proteins by IHC is currently recommended by the guideline for deciding on the use of immunotherapy in metastatic CRC19. Applying our proteomic signature to surgical tissue may provide information about postoperative prognosis and ACT efficacy for stage II/III CRC patients, and thus help guide decisions on further management.

The present study has several limitations that merit consideration. The first is the retrospective data collection and limited sample size. Although this study was performed following the REMARK guidelines31 and consecutive patients were enrolled from multicenter cohorts, the signature has not yet been validated in prospective studies; we are currently performing a prospective study to validate our findings (NCT03025854). Second, the biological functions of these molecules in carcinogenesis and development need to be further explored, even though previous studies indicated their importance in cancer development and chemotherapy efficacy. Third, the performance of the proteomic signature was only examined in Chinese patients, and future studies are warranted to validate its performance in different ethnic populations.

In conclusion, we developed a proteomic signature that effectively predicted prognosis in stage II/III CRC patients. The prognostic value of the classifier was validated in independent populations. Combining the proteomic signature with pathological stage might aid in selecting patients who could benefit from ACT. Larger-scale, prospective studies are warranted before key protein signatures can be approved for routine clinical application.

Methods

Patient cohorts and tumor specimens

This study complied with the REMARK guidelines for tumor marker prognostic studies (Supplementary Table 1). In the discovery phase, proteomic profiling analysis was conducted on tumor samples and adjacent normal colorectal mucosa from 60 patients with stage II/III CRC (Supplementary Table 2). In the training and validation phase, we analyzed a cohort of patients with stage II-III colorectal cancer who received treatment at 3 academic centers in China. The patients in the training and internal validation cohorts originated from the institutional database program of colorectal disease (IDPCD) at the Sixth Affiliated Hospital, Sun Yat-sen University32, which has prospectively enrolled CRC patients and integrated the patients from our National Key Research and Development Project of CRC Screen, Surveillance, and Intervention33. The patients in the external validation cohort originated from the tumor registries of 2 academic cancer centers. We excluded samples if the patient met the exclusion criteria (clinical quality control, e.g., metastatic cancer, previous treatment with any anticancer therapy, stage I disease, or missing mortality or recurrence data). All patients received curative-intent surgery, and no patients received preoperative antitumor treatment. After radical surgery, a proportion of patients received available standard systemic treatment, including fluorouracil (FU) or capecitabine with or without oxaliplatin. We included 740 samples that passed quality control in the final analysis. The workflow for the development and validation of the PS classifier is detailed in Fig. 1a. This multicenter study was conducted in accordance with the Declaration of Helsinki. This study was approved by the Institutional Review Board of The Sixth Affiliated Hospital, Sun Yat-sen University (2020ZSLYEC-229), and written informed consent was obtained from all patients before treatment.

Proteomic analysis

Tissue samples were prepared as previously described in ref. 34. In brief, tissues were lysed using 8 M urea lysis buffer followed by sonication. The protein was then reduced and alkylated using the FASP method. The digested peptides were separated into three fractions using a reverse-phase C18 column and a stepwise gradient of increasing acetonitrile concentration at pH 10. The experimental workflow of proteomic analysis in the discovery phase is shown in Supplementary Fig. 1a. Protein profiles were acquired on an Orbitrap Fusion and Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher Scientific, Rockford, IL, USA) or a Q Exactive HF mass spectrometer (Thermo Fisher Scientific, Rockford, IL, USA)34. Data-dependent acquisition was performed by measuring MS1 in the Orbitrap at a resolution of 120,000, followed by up to 20 data-dependent MS/MS scans with higher-energy collision dissociation (normalized collision energy of 35%). Digested 293T cells used as quality control samples were assayed daily to ensure sensitivity and reproducibility (Supplementary Fig. 2). Raw files generated by MS experiments were submitted to Firmiana, a one-stop proteomic data processing platform35. Peptides with a false discovery rate (FDR) less than 1% were selected, and only proteins with high-quality and unique peptides were considered qualified, to minimize the FDR at the protein level. We used label-free intensity-based absolute quantification (iBAQ) to quantify proteins36. The iBAQ values were then converted to the intensity-based fraction of total (iFOT) for further data analysis37.
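The iBAQ-to-iFOT conversion described above can be sketched in a few lines of Python, assuming that iFOT is each protein's iBAQ value divided by the sum of all iBAQ values in the same sample. The function name, the optional `scale` multiplier, and the example values are illustrative assumptions and are not taken from the study or from Firmiana.

```python
def ibaq_to_ifot(ibaq, scale=1.0):
    """Convert per-protein iBAQ values of one sample to fractions of total.

    ibaq  : dict mapping protein name -> iBAQ value for a single sample
    scale : optional multiplier applied to every fraction (assumption: some
            pipelines rescale fractions for readability; 1.0 leaves them as-is)
    """
    total = sum(ibaq.values())
    if total <= 0:
        raise ValueError("total iBAQ must be positive")
    return {protein: scale * value / total for protein, value in ibaq.items()}


# Made-up iBAQ values for a single sample.
sample_ibaq = {"TGFBI": 2.0e6, "FHL3": 6.0e6, "GGA1": 2.0e6}
sample_ifot = ibaq_to_ifot(sample_ibaq)
```

Expressing each protein as a fraction of the sample total makes values comparable across samples that differ in overall loading or signal intensity.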

Quantitative RT-PCR (qRT-PCR)

A FastPure Cell/Tissue Total RNA Isolation Kit V2 (Vazyme, Nanjing, China) was used to extract total RNA from cells and frozen specimens. Complementary DNA (cDNA) was synthesized by using the HiScript III RT SuperMix for qPCR (+gDNA wiper) (Vazyme, Nanjing, China). The following primer sequences were used for qRT-PCR: GGA1, forward TCACGGAGATGGTGATGAGCCA and reverse TCCTCTGTGTCACTCGCCAGTC; TGFBI, forward GGACATGCTCACTATCAACGGG and reverse CTGTGGACACATCAGACTCTGC; FHL3, forward ACAAGGGTGCTCACTACTGCGT and reverse TTCTCGATGCCACGGCTGATCA; NDUFS7, forward AGGCACGAGGTGTCCATCAGAG and reverse CAGTTGACGAGGTCATCCAGCT; glyceraldehyde 3-phosphate dehydrogenase (GAPDH), forward CCAAAATCAGATGGGGCAATGCTGG and reverse TGATGGCATGGACTGTGGTCATTCA.

Immunofluorescence (IF)

Cells were seeded onto glass coverslips overnight, fixed with 4% paraformaldehyde for 15 min, and then permeabilized with 0.5% Triton X-100 for 30 min at room temperature. After washing three times with PBS, the cells were incubated with primary antibodies (1:100) against target proteins in blocking buffer at 4 °C overnight and with the corresponding secondary antibodies for 1 h at room temperature. Then, ProLong™ Glass Antifade Mountant with NucBlue™ (Invitrogen, USA) was applied to mount the fixed cells for 5 min at room temperature, and the fixed cells were kept in the dark at 4 °C. Images were acquired and analyzed with a Zeiss Axioskop-2 microscope.

Vector construction and transfection

The cDNAs of GGA1, NDUFS7, TGFBI, and FHL3 were amplified from the HCT116 cell line and individually cloned into the pCDH-CMV-MCS-EF1-Puro vector. Plasmids and siRNAs were transiently transfected into cells using Lipofectamine™ 3000 Reagent (Invitrogen, USA) according to the manufacturer's protocol. The siRNA sequences used for transfection were as follows: TGFBI: CCACTACATTGATGAGCTA; FHL3: TCGAGAATGTCTGGTCTGT; NDUFS7: GGCACACTCACCAACAAGA; GGA1: GGTCGTGTCTCCCAAGTAT.

Detection of mismatch repair (MMR)

Immunohistochemistry (IHC) staining was performed to detect the MMR status in primary tumor specimens by using antibodies targeting MLH1 (clone ES05; Zhong Shan Jin Qiao, Beijing, China, 1:40), MSH2 (clone RED2; Zhong Shan Jin Qiao, Beijing, China, 1:200), MSH6 (clone UMAB258; Zhong Shan Jin Qiao, Beijing, China, 1:200) and PMS2 (clone EP51; Zhong Shan Jin Qiao, Beijing, China, 1:40). Tumors showing the loss of at least one MMR protein by IHC in any tumor nuclei were designated as MMR deficient (dMMR), whereas those tumors with intact expression in all tumor nuclei were designated as MMR proficient (pMMR). The positive nuclear staining of lymphocytes, stromal cells and normal epithelial cells served as internal controls.
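The dMMR/pMMR designation described above amounts to a simple rule over the four stained proteins, sketched here in Python. The function name and the boolean encoding (True for intact nuclear expression, False for loss) are illustrative assumptions, not part of the study's workflow.

```python
MMR_PROTEINS = ("MLH1", "MSH2", "MSH6", "PMS2")


def mmr_status(staining):
    """Classify a tumor as dMMR or pMMR from IHC results.

    staining : dict mapping each MMR protein to True (intact nuclear
               expression) or False (loss of expression). This encoding
               is an assumption made for illustration.
    """
    missing = [p for p in MMR_PROTEINS if p not in staining]
    if missing:
        raise ValueError("missing IHC result for: " + ", ".join(missing))
    # Loss of at least one MMR protein designates the tumor as dMMR.
    if any(not staining[p] for p in MMR_PROTEINS):
        return "dMMR"
    return "pMMR"


# Hypothetical tumors for illustration.
intact_tumor = {"MLH1": True, "MSH2": True, "MSH6": True, "PMS2": True}
mlh1_loss_tumor = {"MLH1": False, "MSH2": True, "MSH6": True, "PMS2": False}
```

Requiring a result for all four proteins avoids silently calling a tumor pMMR when one stain was not performed.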

Multiplex immunohistochemistry (mIHC)

Artificial intelligence (AI)-assisted multiplex IHC (Supplementary Fig. 1b) was then performed to develop and validate the prognostic value of the relapse-specific markers. After the specificities of the antibodies employed were validated by siRNA knockdown or recombinant expression via IF (Supplementary Figs. 3, 4), a multiplex IHC platform was constructed, and the stability of the platform was verified with a variety of antibodies. Validated primary antibodies including GGA1 (H00026088-M01, NOVUS, USA, 1:200), FHL3 (11028-2-AP, Proteintech, China, 1:300), NDUFS7 (15728-1-AP, Proteintech, China, 1:100) and TGFBI (ab170874, Abcam, USA, 1:400) were sequentially applied, followed by horseradish peroxidase (HRP)-conjugated secondary antibody incubation and tyramide signal amplification (Supplementary Fig. 4, Supplementary Table 3). In the AI-assisted analyses, the tumor region and intratumoral stromal region were identified with inForm software following a check-train-confirm procedure. inForm was also used to determine the Hscore of each marker in the tumor area or intratumoral stromal area. The data were normalized for further analysis. We found that 3 of the 4 proteins were mainly expressed in tumor cells, while TGFBI was more highly expressed in stromal cells than in tumor cells (Supplementary Figs. 3, 4).

PS classifier construction

In the discovery stage, differentially expressed proteins (DEPs) were identified as previously described11. The Wilcoxon test was used to perform the DEP analysis between the tissue groups to identify relapse-specific DEPs (relapse vs non-relapse tumors) and cancer-specific DEPs (tumor vs normal tissues). A least absolute shrinkage and selection operator (LASSO)/SVM logistic-based machine learning approach38,39 was applied for further selection and used to build the PS from the specific proteins (Supplementary Figs. 5, 6; Supplementary Table 4). The risk score of each patient was calculated from the expression levels of these markers, weighted by their regression coefficients: Risk score = β₁x₁ + β₂x₂ + β₃x₃ + … + βₙxₙ, where xᵢ is the expression level of marker i and βᵢ is its regression coefficient. The regression coefficients were calculated with the Cox model. For each of the training cohort and the two validation cohorts, X-tile plots were used to generate an optimum cutoff value (Supplementary Fig. 7) to stratify patients into high- and low-PS groups for further analysis.

Bioinformatics and statistical analysis

The primary endpoint was DFS, defined as the duration from surgery to the first observation of disease relapse (local or distant disease) or death from any cause. An additional endpoint was overall survival (OS) defined as the duration from surgery to death due to any cause. Kaplan-Meier methods were used to assess the association between the variables and survival, and the log-rank test was used to compare survival curves. Hazard ratios (HRs) were calculated by Cox regression analysis. In order to detect the subset of patients that could benefit from ACT, stratified analyses were performed according to the pathologic stage and PS associated with chemotherapy efficacy. The area under the curve (AUC) was calculated to evaluate the sensitivity and specificity of the model for predicting recurrence.
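To illustrate the Kaplan-Meier method named above, the following pure-Python sketch computes the product-limit survival estimate from follow-up times and event indicators. It is a didactic sketch rather than the R-based analysis used in the study; the function name and the toy data are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time per patient (e.g., months from surgery)
    events : 1 if the endpoint was observed (for DFS, relapse or death from
             any cause), 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == current_time:
            n_events += 1 if data[i][1] else 0
            n_removed += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the fraction surviving this time.
            survival *= 1.0 - n_events / at_risk
            curve.append((current_time, survival))
        # Both events and censored patients leave the risk set afterwards.
        at_risk -= n_removed
    return curve


# Made-up toy data: five patients, two of them censored.
toy_followup = [6, 12, 12, 24, 30]
toy_status = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_followup, toy_status)
```

Censored patients contribute to the number at risk up to their last follow-up but never trigger a drop in the curve, which is why censoring must be recorded separately from events.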

All statistical analyses were performed with R software (R Foundation for Statistical Computing, Vienna, Austria). Least Absolute Shrinkage and Selection Operator (LASSO) and support vector machine-recursive feature elimination (SVM-RFE) analyses were done using the “glmnet” and “e1071” packages. Nomograms and calibration plots were generated using the “rms” package. GSEA was performed using the “clusterProfiler” package40. P values less than 0.05 were considered statistically significant, and all statistical tests were two sided.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.