Construction and validation of a prognostic nomogram for predicting cancer-specific survival in patients with intermediate and advanced colon cancer after receiving surgery and chemotherapy

Background Existing predictive models often focus solely on overall survival (OS), neglecting the bias that other causes of death might introduce into survival predictions. To date, no rigorous predictive model has been established for cancer-specific survival (CSS) in patients with intermediate and advanced colon cancer after surgery and chemotherapy.
Methods We extracted data from the Surveillance, Epidemiology, and End Results (SEER) database on patients with stage-III and -IV colon cancer treated with surgery and chemotherapy between 2010 and 2015. CSS was assessed using a competing risk model, and the associated risk factors were identified via univariate and multivariate analyses. A nomogram predicting 1-, 3-, and 5-year CSS was constructed. The C-index, area under the curve (AUC), and calibration curve were used to assess the predictive performance of the model. Additionally, the model was externally validated.
Results A total of 18 risk factors were identified by univariate and multivariate analyses for constructing the nomogram. The AUC values of the nomogram for 1-, 3-, and 5-year CSS prediction were 0.831, 0.842, and 0.848 in the training set; 0.842, 0.853, and 0.849 in the internal validation set; and 0.815, 0.823, and 0.839 in the external validation set. The C-index values were 0.826 (SE: 0.001), 0.836 (SE: 0.002), and 0.763 (SE: 0.013), respectively. Moreover, the calibration curves showed good calibration.
Conclusion The model we constructed shows good accuracy and reliability and can help physicians develop treatment and follow-up strategies that benefit patient survival.
Supplementary Information The online version contains supplementary material available at 10.1007/s00432-023-05154-7.


Introduction
Colon cancer is one of the most prevalent gastrointestinal malignancies worldwide. Over 1.9 million patients were diagnosed with colorectal cancer in 2020, and more than 935,000 patients died of it (Sung et al. 2020). Colon cancer is the fifth leading cause of cancer-related death in China, and the number of newly diagnosed patients has surpassed that of gastric cancer (Cao et al. 2020, 2021). With the recent spread of early screening for colon cancer and significant progress in diagnosis and clinical management, the prognosis of colon cancer patients in China has improved significantly. However, these patients still have a poorer 5-year overall survival than patients in developed countries and regions such as Japan, the United States, Canada, and Northern Europe (Allemani et al. 2018).
Yiheng Shi, Xiaoting Wu and Wanxi Qu have contributed equally to this work and share first authorship.
Surgery combined with chemotherapy is the most important treatment for patients with intermediate and advanced colon cancer and has been demonstrated to effectively improve prognosis and extend survival (Labianca et al. 2010; Benson et al. 2018). During follow-up, some late-stage patients treated with surgery and chemotherapy were found to have survival times close to or even exceeding ten years. However, improvement in prognosis is typically accompanied by increased competing risk. In addition to colon cancer-specific death, these patients could die of other causes such as cardiovascular diseases, chronic lower respiratory diseases, and accidents (Siegel et al. 2020). Compared with early-stage colon cancer, tumors at stage III or IV progress more rapidly and are of higher-grade malignancy. These patients have a greater risk of recurrence and metastasis even after receiving surgery and chemotherapy, leading to a poorer prognosis (Snaebjornsson et al. 2017). In the clinical assessment of prognosis in these patients, often only all-cause mortality is considered, while the impact of competing causes of death is overlooked. This can introduce a significant predictive bias and reduce the predictive value for their true mortality rate. Therefore, from a clinical perspective, cancer-specific survival (CSS) can better reflect the basic characteristics of the tumor and the impact of treatment plans on survival. Additionally, identifying the CSS of patients can help develop treatment plans and follow-up strategies that benefit patient survival. Moreover, studies have found that traditional survival analyses such as the Kaplan-Meier method and Cox regression tend to overestimate the risk of cancer-specific death when competing risks exist, because they treat competing events as independent censoring, which leads to competing risk bias (Cai et al. 2020; de Glas et al. 2016; Wang et al. 2022a). For patients with intermediate and advanced colon cancer who have achieved good survival benefits after systemic treatment, a more comprehensive and accurate predictive model is still lacking. To better guide clinical decision-making and provide patients with more accurate individualized risk assessment and prognostic management, competing risk events should not be ignored. Both cancer and non-cancer mortality risks should be seriously considered by clinicians (Chesney et al. 2021). A competing risk model can correct the aforementioned bias (Walraven and Hawken 2016) and handle survival data with multiple outcome events, so that its focus is no longer limited to a single outcome indicator, and it has been widely applied clinically (Wang et al. 2022a, b; Lv et al. 2021a).
This study, based on the Surveillance, Epidemiology, and End Results (SEER) database, constructed a comprehensive competing risk nomogram model to predict the survival benefits of patients with intermediate and advanced colon cancer after receiving surgery and chemotherapy. The model was thoroughly validated using patient data from the Affiliated Hospital of Xuzhou Medical University.

Patients from SEER database
The data for modeling were collected from the SEER database. SEER is an open-access clinical database that contains cancer data on patients accounting for 30% of the U.S. population, covering 18 states (Doll et al. 2018). Data on patients who were diagnosed with stage-III and -IV colon cancer between 2010 and 2015 and had undergone surgery and chemotherapy were collected. The detailed inclusion criteria were: 1. Patients were pathologically diagnosed with colon cancer, excluding the rectum (site recode, international classification of diseases for oncology   1. For ease of subsequent research and analysis, we renamed and assigned values to the included indicators, as shown in Supplementary Tables 2 and 3.

Construction and validation of the competing risk model
In this study, colon cancer-specific death was chosen as the outcome of interest, and deaths from other causes were defined as competing risk events. Patients still alive were treated as right-censored. A competing risk model was used to predict the cumulative incidence function (CIF) of events. This model can handle survival data with multiple outcome events simultaneously and can obtain more accurate predictions by calculating the CIF of each outcome event (Häggström et al. 2016). A total of 16,621 patients were randomly assigned in a 7:3 ratio to a training set (n = 11,634) and an internal validation set (n = 4,987). The data of patients from our hospital were used as an external validation set (n = 264). Risk factors associated with the CSS of colon cancer patients were identified through univariate and multivariate analyses. We constructed a competing risk model with these risk factors to predict the 1-year, 3-year, and 5-year CSS for patients with intermediate and advanced colon cancer. The C-index, receiver operating characteristic (ROC) curve, area under the curve (AUC), and calibration curve were used to assess the predictive performance of the model. In addition, we compared the mortality predicted by conventional survival analysis with that predicted by the competing risk model.
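To make the difference between the two approaches concrete, the following is a minimal Python sketch, not the authors' code (the analyses were run in R and SPSS). It computes a nonparametric cumulative incidence function (the Aalen-Johansen estimator) next to the naive complement of the Kaplan-Meier estimate, which treats deaths from other causes as censored. The function names, the event coding (0 = censored, 1 = cancer-specific death, 2 = death from other causes), and the toy data are assumptions made for illustration only.

```python
from collections import Counter


def cumulative_incidence(times, events, cause=1):
    """Aalen-Johansen CIF for one cause; returns (time, CIF) at each event time."""
    n_at_risk = len(times)
    exits = Counter(times)  # everyone leaving the risk set at time t
    any_event = Counter(t for t, e in zip(times, events) if e != 0)
    cause_event = Counter(t for t, e in zip(times, events) if e == cause)
    surv = 1.0  # probability of being free of any event just before t
    cif = 0.0
    curve = []
    for t in sorted(exits):
        if any_event[t]:
            cif += surv * cause_event[t] / n_at_risk
            surv *= 1.0 - any_event[t] / n_at_risk
            curve.append((t, cif))
        n_at_risk -= exits[t]
    return curve


def naive_km_failure(times, events, cause=1):
    """1 - Kaplan-Meier for one cause, treating all other events as censored."""
    n_at_risk = len(times)
    exits = Counter(times)
    deaths = Counter(t for t, e in zip(times, events) if e == cause)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        if deaths[t]:
            surv *= 1.0 - deaths[t] / n_at_risk
            curve.append((t, 1.0 - surv))
        n_at_risk -= exits[t]
    return curve


def value_at(curve, t):
    """Evaluate a right-continuous step curve at time t (0 before the first step)."""
    value = 0.0
    for time, level in curve:
        if time <= t:
            value = level
    return value


# Toy data (hypothetical): follow-up in years and event codes.
toy_times = [1, 2, 3, 4, 5, 6]
toy_events = [1, 2, 0, 1, 2, 1]
cif_cancer = cumulative_incidence(toy_times, toy_events, cause=1)
cif_other = cumulative_incidence(toy_times, toy_events, cause=2)
km_cancer = naive_km_failure(toy_times, toy_events, cause=1)
```

On the toy data, the naive estimate of cancer-specific mortality is never below the cumulative incidence, which is the direction of bias described in the text; real data would of course be handled with an established survival package rather than this sketch.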

Statistical analyses
RStudio and SPSS 26 were used for statistical analyses. Categorical data were expressed as percentages, and the Chi-square test was applied to compare differences between groups. Normally distributed continuous data were described as mean ± standard deviation (SD). We calculated the cumulative incidence function (CIF) for each variable to avoid the prediction bias of conventional analysis and plotted the CIF curves. Univariate analysis was performed using the Fine-Gray test, and variables with statistical significance were selected for multivariate analysis to identify independent risk factors associated with CSS. A competing risk model-based nomogram was constructed using the Fine-Gray proportional subdistribution hazards model. In addition, we compared the competing risk model with conventional survival analysis in predictive performance for 1-year, 3-year, and 5-year CSS. A p value less than 0.05 was considered statistically significant.
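For reference, the quantities behind the Fine-Gray approach can be written compactly. The notation below is standard textbook notation rather than the paper's own: T is the time to the first event, D its cause (1 = cancer-specific death, 2 = death from other causes), X the covariate vector, and the subscript 0 marks baseline quantities.

```latex
% Assumed notation, not taken from the paper.
\begin{align}
F_1(t) &= P(T \le t,\ D = 1)
  && \text{(cumulative incidence of cancer-specific death)} \\
\lambda_1(t \mid X) &= \lambda_{10}(t)\,\exp\!\left(\beta^{\top} X\right)
  && \text{(proportional subdistribution hazard)} \\
F_1(t \mid X) &= 1 - \bigl(1 - F_{10}(t)\bigr)^{\exp\left(\beta^{\top} X\right)}
  && \text{(predicted cumulative incidence)}
\end{align}
```

The third line follows from the second because the subdistribution hazard is linked to the CIF by F_1(t | X) = 1 - exp(-Λ_1(t | X)), where Λ_1 is the cumulative subdistribution hazard.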

Risk stratification
We calculated a score for each patient according to the prognosis-associated risk factors. In the training set, taking the median score as the cutoff, the patients were classified into a low-risk group and a high-risk group, and the same risk stratification was performed for the internal and external validation sets. Patients in the high-risk group had an evidently higher cumulative cancer-specific mortality than those in the low-risk group (Fig. 6A-C). The 1-year, 3-year, and 5-year cancer-specific mortality of the patients in the training set was 15.5%, 46.5%, and 59.4%, respectively, in the high-risk group, and 1.6%, 8.0%, and 13.8%, respectively, in the low-risk group. The cancer-specific mortality of the patients in the validation sets was similar to that in the training set.
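The median-split stratification can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function name is hypothetical, the scores are made-up values, and assigning a score exactly equal to the cutoff to the low-risk group is an arbitrary choice, since the text does not say how ties were handled. The key point is that the cutoff is derived from the training set only and then reused unchanged for the validation sets.

```python
from statistics import median


def stratify_by_training_median(train_scores, other_scores):
    """Split patients into risk groups using the training-set median as cutoff."""
    cutoff = median(train_scores)

    def label(score):
        # Assumption: a score equal to the cutoff counts as low-risk.
        return "high-risk" if score > cutoff else "low-risk"

    train_groups = [label(s) for s in train_scores]
    other_groups = [label(s) for s in other_scores]
    return cutoff, train_groups, other_groups


# Hypothetical nomogram scores, for illustration only.
train = [40.0, 150.0, 280.0, 400.0, 470.0]
validation = [100.0, 300.0]
cutoff, train_groups, validation_groups = stratify_by_training_median(train, validation)
```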

Comparison of different models for survival risk prediction
In addition, we compared the competing risk model with conventional survival analysis in predicting the mortality risk at different time points, as shown in Table 3. For data from SEER, the 1-year, 3-year, and 5-year risks of death were 9.61%, 30.69%, and 41.18%, respectively, by conventional survival analysis, and 8.61%, 27.46%, and 36.72%, respectively, by the competing risk model. For data from our hospital, the 1-year, 3-year, and 5-year risks of death were 12.96%, 36.84%, and 44.56%, respectively, by conventional survival analysis, and 12.12%, 34.47%, and 41.69%, respectively, by the competing risk model.
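The size of the gap between the two methods can be read off directly from the reported figures. The short Python sketch below only does that arithmetic; the percentages are the ones quoted in this section, while the variable and function names are illustrative.

```python
# Reported risks of death in percent: (conventional analysis, competing risk model)
REPORTED = {
    "SEER": {1: (9.61, 8.61), 3: (30.69, 27.46), 5: (41.18, 36.72)},
    "hospital": {1: (12.96, 12.12), 3: (36.84, 34.47), 5: (44.56, 41.69)},
}


def excess_risk(reported):
    """Difference in percentage points: conventional minus competing risk."""
    return {
        cohort: {year: round(conv - comp, 2) for year, (conv, comp) in by_year.items()}
        for cohort, by_year in reported.items()
    }


gaps = excess_risk(REPORTED)
```

In both cohorts the conventional estimate is the higher one at every time point, and the difference widens with longer follow-up (for SEER, from about 1.0 percentage point at 1 year to about 4.5 at 5 years).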

Discussion
In this study, we used a competing risk model based on SEER and introduced a nomogram that could predict the 1-year, 3-year, and 5-year CSS of intermediate and advanced colon cancer patients who had undergone surgery and chemotherapy, and assessed its performance through internal and external validation. The C-index, ROC, and calibration curve indicated that the nomogram constructed in this study had good reliability and predictive value, and its predictive performance was significantly better than that of the AJCC-TNM staging system. Furthermore, we performed risk stratification in those patients. The cancer-specific mortality of high-risk patients was higher than that of low-risk patients at different time points, which could be helpful for clinical risk stratification and decision-making. We found that the 1-year, 3-year, and 5-year CSS predicted by conventional survival analysis was lower than that predicted by the competing risk model, which suggests that conventional survival analysis is biased in the presence of competing risks (Wolbers et al. 2014), so that the mortality of colon cancer patients could be overestimated. It is estimated that about 46% of medical studies using the Kaplan-Meier method for survival analysis are affected by competing risks, which may lead to an overestimation of the real risk of the event by 10% (Walraven and McAlister 2016). A study by Zhou et al. (2019) reached a consistent conclusion. If the effects of competing risk events on prognosis are not taken into account, the survival of colon cancer patients at stage-I and -III would be underestimated. In addition, other studies have proposed that a competing risk model is more appropriate for assessing disease-specific mortality, as it is more reliable and has more accurate predictive value (Wolbers et al. 2014; Verduijn et al. 2011; Xu et al. 2021a).
The TNM staging system remains the most commonly used tool in clinical settings for the prognostic assessment of cancer patients, which assesses patients' prognosis and is applicable to the whole population (Xu et al. 2021a). However, the TNM staging system has its own limitations in precisely distinguishing individual prognostic variances. Besides TNM staging, demographic and clinical information such as age, marital status, treatments, and tumor biomarkers could also affect patients' prognosis (Liu et al. 2020; Xu et al. 2021b). In recent years, the nomogram has shown great potential in the field of oncology: by adding other key prognosis-associated factors to AJCC-TNM staging, it forms a comprehensive clinical predictive model (Balachandran et al. 2015) and presents the variables graphically. By converting each variable into a specific score, it calculates the cumulative score of all the variables and matches it against the result scale to produce the final predicted probability (Wang et al. 2020). Nomograms can help clinicians and patients make more accurate decisions because they are intuitive, individualized, and rapid in prognostic prediction (Balachandran et al. 2015; Li et al. 2021).
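The scoring mechanism described above (each variable converted to points, points summed, total mapped to a probability) can be illustrated with a small Python sketch. Everything numeric here is hypothetical: the three variables, their coefficients, and the baseline 5-year cumulative incidence are invented for illustration and are not the 18 factors or the fitted values of the nomogram in this study. The mapping from points to probability assumes a Fine-Gray type model with positive coefficients.

```python
import math

# Hypothetical log subdistribution hazard ratios; NOT the paper's fitted values.
COEFS = {"age_ge_70": 0.30, "cea_positive": 0.45, "stage_iv": 1.10}
# Hypothetical baseline 5-year cumulative incidence of cancer-specific death.
BASELINE_CIF_5Y = 0.20
# Points per unit of linear predictor: the largest coefficient maps to 100 points.
POINTS_PER_UNIT = 100.0 / max(COEFS.values())


def nomogram_points(patient):
    """Convert each binary variable (0 or 1) into its nomogram points."""
    return {name: COEFS[name] * patient[name] * POINTS_PER_UNIT for name in COEFS}


def predicted_cif(total_points):
    """Map total points back to a predicted 5-year cumulative incidence."""
    linear_predictor = total_points / POINTS_PER_UNIT
    return 1.0 - (1.0 - BASELINE_CIF_5Y) ** math.exp(linear_predictor)


# Usage with a hypothetical patient: older than 70, CEA negative, stage IV.
patient = {"age_ge_70": 1, "cea_positive": 0, "stage_iv": 1}
points = nomogram_points(patient)
total = sum(points.values())
risk_5y = predicted_cif(total)
css_5y = 1.0 - risk_5y
```

A printed nomogram performs the same two steps graphically: the per-variable axes give the points, and the bottom scale converts the total into the predicted probability.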
In this study, 18 independent risk factors were identified as being associated with the prognosis of intermediate and advanced colon cancer patients receiving surgery and chemotherapy, and a comprehensive prediction model was constructed with these factors in the form of a nomogram. We found that age was evidently associated with the survival of colon cancer patients. The older the patient was, the worse the prognosis would be, which might be related to progressive organ failure and increasing susceptibility to concomitant diseases (Yamano et al. 1990; Sorbye et al. 2013). CEA level is also a crucial indicator of the prognosis of colon cancer patients (Locker et al. 2006). CEA testing is a necessary preoperative examination. A study by Zhou et al. (2019) found that stage-I, -II, and -III colon cancer patients with higher CEA levels had significantly poorer OS and CSS. In this study, we also observed that the prognosis of stage-III and -IV colon cancer patients with higher CEA levels was poorer. We also found that a larger tumor size was associated with a poorer prognosis, which could be because large tumors tend to be more invasive. Another study (Saha et al. 2015) based on the U.S. National Cancer Database also found that, among colon cancer patients, a larger tumor size indicated a poorer prognosis. Moreover, the number of lymph node metastases is an important risk factor for assessing the prognosis of colon cancer patients. More lymph node metastases indicate poorer survival outcomes (Kataoka et al. 2019). We did not focus solely on the number of positive lymph nodes because that count depends on the number of lymph nodes submitted for examination during surgery. We used the lymph node ratio (LNR) as an alternative, and it has been demonstrated that LNR is more accurate than the N stage in predicting the prognosis of the patients (Parnaby et al. 2015). In addition, we assessed the effects of tumor deposition and perineural invasion on the patients' prognosis. According to the definition in the 7th edition of the AJCC-TNM classification, tumor deposition, also known as cancer nodules, refers to isolated nodules located within the lymphatic drainage area of the primary lesion. These nodules often contain no identifiable lymph nodes, blood vessels, or nerves. The presence of tumor deposition has been proven to be a prognosis-associated risk factor in colon cancer patients (Cohen et al. 2021). This is consistent with the results shown in the nomogram in this study. Multiple studies (Mirkin et al. 2018; Zheng et al. 2020) have found that colon cancer patients with both tumor deposition and lymph node metastasis could have poorer survival outcomes. Perineural invasion refers to the invasion of cancer cells into nerve tissues surrounding the intestinal wall, which reflects the histopathological characteristics of tumor invasion and is considered an indicator of adverse prognosis in colon cancer patients (Skancke et al. 2019). Mayo et al. (2016) reported that patients with perineural invasion would have a more adverse prognosis. Perineural invasion could directly affect the OS and CSS of colon cancer patients, which is consistent with what we observed in this study. Besides, other independent risk factors such as marital status, gender, history of metastasectomy, and tumor grade have been demonstrated by multiple studies (Jin et al. 2020; Kuai et al. 2021; Lv et al. 2021b).
To our knowledge, this is the first study using a competing risk model for CSS prediction in intermediate and advanced colon cancer patients after receiving surgery and chemotherapy. Internal and external validation demonstrated that the model has considerable accuracy and reliability. However, some limitations still exist. Firstly, the SEER database cannot include all the risk factors that are associated with the patients' prognosis, which could result in selection bias when modeling. Targeted therapy is another crucial approach for colon cancer patients at intermediate and advanced stages, especially for those at stage IV and with metastasis. Secondly, the data in the SEER database are incomplete. For example, it only collects the patients' CEA levels tested before surgery, while postoperative and follow-up results are not included. Postoperative CEA level would be more instructive (Konishi et al. 2018). Patients whose CEA level was elevated before surgery and returned to normal after surgery have a prognosis similar to that of patients with a normal preoperative CEA level, while patients whose postoperative CEA level remains above the normal baseline might have a higher risk of local recurrence and metastasis. On the other hand, there are no records of specific regimens and courses for patients who received chemotherapy. Lastly, this study has a retrospective design, and further validation is needed in prospective populations.

Fig. 2 Cumulative cancer-specific mortality of each independent risk factor (CIF) A race; B sex; C marital status; D tumor stage; E T stage; F N stage; G tumor grade; H tumor site; I lymphadenectomy;

Fig. 3 ROC of the competing risk nomogram predicting the 1-year, 3-year, and 5-year CSS; A training set, B internal validation set, C external validation set; ROC, receiver operating characteristic; CSS, cancer-specific survival; AUC, area under ROC curve
Fig. 5

Fig. 6 Estimated cumulative mortality of high-risk patients and low-risk patients in A training set, B internal validation set, and C external validation set. "1" represented cancer-specific death, and "2" represented death from other causes
The patients had received surgery and chemotherapy. 4. Year of the diagnosis was from 2010 to 2015.

Table 1
Clinical information and demographical characteristics of the patients in the training, internal validation, and external validation sets

Table 2
lowest score was 38.84, the highest score was 472.79, and the median was 279.54.

Table 3
Comparison of conventional survival analysis and the competing risk model in predicting the risk of death at different time points