Introduction

Colon cancer is one of the most prevalent gastrointestinal malignancies worldwide. Over 1.9 million patients were diagnosed with colorectal cancer in 2020, and more than 935,000 died of it (Sung et al. 2020). Colon cancer is the fifth leading cause of cancer-induced death in China, and the number of newly diagnosed patients has surpassed that of gastric cancer (Cao et al. 2020, 2021). With the growing adoption of early screening for colon cancer and significant progress in diagnosis and clinical management, the prognosis of colon cancer patients in China has improved significantly. However, these patients still have a poorer 5-year overall survival than patients in developed countries and regions such as Japan, the United States, Canada, and Northern Europe (Allemani et al. 2018).

Surgery combined with chemotherapy is the most important treatment for patients with intermediate and advanced colon cancer, and it has been demonstrated to improve prognosis and extend survival (Labianca et al. 2010; Benson et al. 2018). During follow-up, some late-stage patients treated with surgery and chemotherapy were found to survive close to or even beyond ten years. However, improvement in prognosis is typically accompanied by increased competitive risk. In addition to colon cancer-specific death, these patients may die of other causes such as cardiovascular diseases, chronic lower respiratory diseases, and accidents (Siegel et al. 2020). Compared with early-stage colon cancer, stage III or IV disease progresses more rapidly and is of higher-grade malignancy. These patients have a greater risk of recurrence and metastasis even after surgery and chemotherapy, leading to a poorer prognosis (Snaebjornsson et al. 2017). In the clinical assessment of prognosis, often only all-cause mortality is considered, while the impact of competing causes of death is overlooked. This can bias predictions substantially and reduce their value as estimates of the true mortality rate. Therefore, from a clinical perspective, cancer-specific survival (CSS) can better reflect the basic characteristics of the tumor and the impact of treatment plans on survival. Additionally, identifying the CSS of patients can help develop treatment plans and follow-up strategies that benefit survival. Moreover, studies have found that traditional survival analyses such as the Kaplan–Meier method and Cox regression tend to overestimate the risk of cancer-specific death when competitive risks exist, because they treat competing events as independent censoring, which gives rise to competing risk bias (Cai et al. 2020; de Glas et al. 2016; Wang et al. 2022a). For patients with intermediate and advanced colon cancer who have achieved good survival benefits after systemic treatment, a comprehensive and accurate predictive model is still lacking. To better guide clinical decision-making and provide patients with more accurate individualized risk assessments and prognostic management, competitive risk events should not be ignored. Both cancer and non-cancer mortality risks should be seriously considered by clinicians (Chesney et al. 2021). A competing risk model can correct the aforementioned bias (Walraven and Hawken 2016) and handle survival data with multiple outcome events rather than a single outcome indicator, and it has been widely applied clinically (Wang et al. 2022a, b; Lv et al. 2021a).

This study, based on the Surveillance, Epidemiology, and End Results (SEER) database, constructed a comprehensive competing risk nomogram model to predict the survival benefits of patients with intermediate and advanced colon cancer after receiving surgery and chemotherapy. The model was thoroughly validated using patient data from the Affiliated Hospital of Xuzhou Medical University.

Methods

Patients from SEER database

The data for modeling were collected from the SEER database. SEER is an open-access clinical database that contains cancer data on patients accounting for 30% of the U.S. population, covering 18 states (Doll et al. 2018). We collected data on patients who were diagnosed with stage-III and -IV colon cancer between 2010 and 2015 and had undergone surgery and chemotherapy. The detailed inclusion criteria were:

1. Patients were pathologically diagnosed with colon cancer, excluding the rectum (site recode, International Classification of Diseases for Oncology ICD-O-3/WHO 2008).
2. The diagnosis was based on the criteria of the AJCC Seventh Edition.
3. The patients had received surgery and chemotherapy.
4. The year of diagnosis was from 2010 to 2015.

Exclusion criteria:

1. Expected survival time less than 1 month.
2. Stage-I and -II colon cancer patients diagnosed based on the AJCC Seventh Edition.
3. Patients with incomplete or unclear demographic information, relevant clinical indicators, and survival status.
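The inclusion and exclusion criteria above can be read as a single eligibility filter. The following Python sketch is only an illustration: the `Patient` record and all of its field names are hypothetical and do not correspond to SEER column names, and recorded survival months are used as a stand-in for "expected survival time", which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical fields for illustration; not actual SEER variable names.
    site: str                         # e.g. "colon" or "rectum"
    ajcc7_stage: Optional[str]        # "I", "II", "III", "IV", or None if unknown
    surgery: bool
    chemotherapy: bool
    year_of_diagnosis: int
    survival_months: Optional[float]  # None if unknown
    status_known: bool                # survival status recorded


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion criteria first, then the exclusion criteria."""
    included = (
        p.site == "colon"
        and p.surgery
        and p.chemotherapy
        and 2010 <= p.year_of_diagnosis <= 2015
    )
    if not included:
        return False
    # Exclusion 3: incomplete or unclear information.
    if p.ajcc7_stage is None or p.survival_months is None or not p.status_known:
        return False
    # Exclusion 2: stage I and II disease.
    if p.ajcc7_stage not in ("III", "IV"):
        return False
    # Exclusion 1: survival shorter than 1 month (assumed proxy, see lead-in).
    if p.survival_months < 1:
        return False
    return True
```

Keeping the rule in one function makes it straightforward to apply the same criteria to both the registry data and a hospital cohort.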

External validation data and ethical statement

Data on 264 colon cancer patients at the Affiliated Hospital of Xuzhou Medical University from March 2014 to March 2018 were collected, according to the inclusion and exclusion criteria, to externally validate the model. The training set and internal validation data were obtained from the publicly available SEER database, and thus no additional informed consent was required. The external validation was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Review Committee of Xuzhou Medical University, with the approval number: XYFY2023-KL004-01.

Variable selection

Demographic information extracted from the SEER database included Age recode with single ages and 100+, Sex, Race recode (W, B, AI, API), and Marital status at diagnosis. Clinical indicators encompassed Site recode ICD-O-3/WHO 2008, CS tumor size (2004–2015), Derived AJCC Stage Group, 7th ed (2010–2015), Grade, Surg Prim Site (1998+), Regional nodes positive (1988+), Scope Reg LN Sur (2003+), Surg Oth Reg/Dis (2003+), Radiation recode, CEA Pretreatment Interpretation Recode (2010+), Perineural Invasion Recode (2010+), Tumor Deposits Recode (2010+), SEER Combined Mets at DX-bone (2010+), SEER Combined Mets at DX-brain (2010+), SEER Combined Mets at DX-liver (2010+), SEER Combined Mets at DX-lung (2010+), First malignant primary indicator, COD to site rec KM, Survival months, and SEER cause-specific death classification. The lymph node ratio (LNR) was defined as Regional nodes positive/Scope Reg LN Sur. The outcome event was cancer-specific death, and the survival time was the time from diagnosis of intermediate or advanced colon cancer to death from colon cancer. The detailed patient selection process is shown in Supplementary Table 1. For ease of subsequent analysis, we renamed and assigned values to the included indicators, as shown in Supplementary Tables 2 and 3.

Construction and validation of the competitive risk model

In this study, colon cancer-specific death was chosen as the outcome of interest, and deaths from other causes were defined as competitive risk events. Patients still alive were treated as right-censored. A competing risk model was used to predict the cumulative incidence function (CIF) of events. This model can handle survival data with multiple outcome events simultaneously and can obtain more accurate predictions by calculating the CIF of each outcome event (Häggström et al. 2016). A total of 16,621 patients were randomly assigned in a 7:3 ratio into a training set (n = 11,634) and an internal validation set (n = 4,987). The data of patients from our hospital were used as an external validation set (n = 264). Risk factors associated with the CSS of colon cancer patients were identified through univariate and multivariate analyses. We constructed a competitive risk model with these risk factors to predict the 1-year, 3-year, and 5-year CSS for patients with intermediate and advanced colon cancer. The c-index, receiver operating characteristic (ROC) curve, area under the curve (AUC), and calibration curve were used to assess the predictive performance of the model. In addition, we compared the mortality predicted by conventional survival analysis with that predicted by the competitive risk model.
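To make the CIF concrete, the sketch below computes a nonparametric (Aalen-Johansen type) cumulative incidence estimate in plain Python. It is an illustration under assumptions, not the software used in this study: the function name and the event coding (0 = censored, 1 = cancer-specific death, 2 = death from other causes) are chosen here to mirror the labels used in the figures.

```python
from collections import Counter


def cumulative_incidence(times, events, cause):
    """Nonparametric CIF for one cause in the presence of competing events.

    times  : follow-up times
    events : 0 = censored, 1 = cancer-specific death, 2 = death from other causes
    cause  : the event code whose cumulative incidence is wanted
    Returns a list of (time, CIF) pairs at each distinct event time.
    """
    # Group the observations by time so that ties are handled together.
    by_time = {}
    for t, e in zip(times, events):
        by_time.setdefault(t, Counter())[e] += 1

    n_at_risk = len(times)
    overall_survival = 1.0  # probability of no event of any type so far
    cif = 0.0
    curve = []
    for t in sorted(by_time):
        counts = by_time[t]
        d_cause = counts[cause]
        d_all = sum(c for e, c in counts.items() if e != 0)
        if d_all > 0:
            # Probability of surviving all causes until t, then failing from `cause` at t.
            cif += overall_survival * d_cause / n_at_risk
            overall_survival *= 1.0 - d_all / n_at_risk
            curve.append((t, cif))
        n_at_risk -= sum(counts.values())
    return curve
```

At any time point, the cumulative incidences of the two causes and the overall survival probability sum to one, which is the property that a naive per-cause Kaplan–Meier analysis does not preserve.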

Statistical analyses

RStudio and SPSS 26 were used for statistical analyses. Categorical data were expressed as percentages, and the Chi-square test was applied to compare differences between groups. Normally distributed continuous data were described as mean ± standard deviation (SD). We calculated the cumulative incidence function (CIF) for each variable to avoid the prediction bias of conventional analysis, and plotted the CIF curves. Univariate analysis was performed using the Fine-Gray test, and statistically significant variables were entered into multivariate analysis to identify independent risk factors associated with CSS. A competitive risk model-based nomogram was constructed using the Fine-Gray proportional subdistribution hazards model. In addition, we compared the predictive performance of the competitive risk model with that of conventional survival analysis for the 1-year, 3-year, and 5-year CSS. A p value less than 0.05 was considered statistically significant.
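For reference, the standard textbook form of the quantities named above can be written as follows. The notation is introduced here for illustration and is not taken from the study: T is the time to the first event, D is its type (1 = cancer-specific death, 2 = death from other causes), X is the covariate vector, and β is the vector of regression coefficients.

```latex
% Cumulative incidence function for cause k
F_k(t) = P(T \le t,\; D = k)

% Subdistribution hazard for cause 1
\lambda_1(t) = -\frac{d}{dt}\,\log\{1 - F_1(t)\}

% Fine-Gray regression on the subdistribution hazard
\lambda_1(t \mid X) = \lambda_{10}(t)\,\exp(\beta^{\top} X)

% Implied covariate-specific cumulative incidence
F_1(t \mid X) = 1 - \exp\{-\Lambda_{10}(t)\,\exp(\beta^{\top} X)\},
\qquad \Lambda_{10}(t) = \int_0^t \lambda_{10}(s)\,ds
```

Under this formulation, exp(β) for a covariate is a subdistribution hazard ratio, and a value above 1 corresponds to a higher cumulative incidence of cancer-specific death.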

Results

Patients’ clinical characteristics

A total of 16,621 stage-III and -IV colon cancer patients who had received surgery and chemotherapy were included. These patients were randomly assigned, in a 7:3 ratio, into a training set (n = 11,634) and an internal validation set (n = 4,987). Patients at stage III accounted for 67.6% (n = 11,242) and those at stage IV accounted for 32.4% (n = 5,379). Among patients with distant metastases, the liver was the most common site (n = 3,926, 23.6%). Tumor deposition (n = 4,615, 27.7%) and perineural invasion (n = 3,714, 22.3%) were relatively uncommon even in these patients. There were 8,703 CEA-positive patients (52.4%) and 7,091 CEA-normal patients (47.6%). A total of 6,165 patients died of colon cancer, accounting for 78% of all deaths, and 1,735 patients died of other causes, accounting for 22%. The external validation set included 264 patients, of whom 184 (70%) had stage-III and 80 (30%) had stage-IV colon cancer. By the end of follow-up, 123 patients (45.4%) were still alive, 124 had died of colon cancer, accounting for 87.9% of all deaths, and 17 had died of other causes, accounting for 12.1%. Clinical information and demographic characteristics of the patients are shown in Table 1.

Table 1 Clinical information and demographical characteristics of the patients in the training, internal validation, and external validation sets

Nomogram construction based on the competitive risk model

Univariate analysis showed that Age, Sex, Race, Marital status, Tumor site, Grade, Stage, T stage, N stage, Surgery type, Lymphadenectomy, Metastasectomy, Radiation, CEA, Perineural Invasion, Tumor Deposits, Tumor size, LNR, Bone metastasis, Brain metastasis, Liver metastasis, and Lung metastasis were risk factors for colon cancer-specific death. Multivariate analysis of these factors showed that Age (HR 1.01, 95% CI 1–1.01), Marital status (married as a reference; single: HR 1.11, 95% CI 1.02–1.21), Sex (male as a reference; female: HR 0.93, 95% CI 0.87–1), Race (white as a reference; black: HR 1.17, 95% CI 1.07–1.28), Tumor grade (grade I/II as a reference; grade III/IV: HR 1.26, 95% CI 1.17–1.36), Tumor stage (stage III as a reference; stage IV: HR 2.78, 95% CI 2.49–2.39), Tumor size (HR 1, 95% CI 1–1), Tumor site (Ascending Colon as a reference; Hepatic Flexure: HR 1.18, 95% CI 1–1.4; Sigmoid Colon: HR 0.78, 95% CI 0.7–0.88), T stage (T1-2 as a reference; T3: HR 1.51, 95% CI 1.29–1.77; T4: HR 2.31, 95% CI 1.96–2.73), N stage (N0/N1 as a reference; N2: HR 1.11, 95% CI 1.01–1.21), Lymphadenectomy (0 as a reference; > 4: HR 0.7, 95% CI 0.52–0.94), Metastasectomy (NO as a reference; YES: HR 0.85, 95% CI 0.78–0.93), LNR (HR 2.9, 95% CI 2.41–3.47), CEA (negative/normal as a reference; positive/elevated: HR 1.33, 95% CI 1.23–1.43), Perineural Invasion (not identified/not present as a reference; identified/present: HR 1.14, 95% CI 1.06–1.23), Tumor Deposits (NO as a reference; YES: HR 1.18, 95% CI 1.1–1.27), Lung metastasis (NO as a reference; YES: HR 1.19, 95% CI 1.05–1.35), and Liver metastasis (NO as a reference; YES: HR 1.4, 95% CI 1.27–1.56) were independent risk factors for colon cancer-specific death (Table 2). Based on the results of multivariate analysis, a nomogram based on the competing risk model was constructed to predict the 1-year, 3-year, and 5-year CSS of stage-III and -IV colon cancer patients after surgery and chemotherapy (Fig. 1).
According to the competing risk model, we calculated the CIF for each independent risk factor affecting patient prognosis, as shown in Fig. 2, in which the number 1 indicates cancer-specific death and number 2 indicates death from other causes.

Table 2 Univariate and multivariate analysis of CSS in the training set
Fig. 1 Nomogram predicting the 1-year, 3-year, and 5-year CSS

Fig. 2 Cumulative cancer-specific mortality of each independent risk factor (CIF): A race; B sex; C marital status; D tumor stage; E T stage; F N stage; G tumor grade; H tumor site; I lymphadenectomy; J metastasectomy; K CEA; L perineural invasion; M tumor deposits; N liver metastasis; O lung metastasis. “1” represents cancer-specific death, and “2” represents death from other causes

Nomogram Verification

This study applied the c-index, ROC curve, and AUC to assess the predictive accuracy of the nomogram, and assessed its calibration with calibration curves. The c-index of the nomogram in the training set was 0.826 (se: 0.001), and the AUCs for predicting the 1-year, 3-year, and 5-year CSS were 0.831 (95% CI 0.818–0.843), 0.842 (95% CI 0.834–0.851), and 0.848 (95% CI 0.840–0.857), respectively (Fig. 3A). This suggested that the model predicted accurately in the training set. The c-index in the internal validation set and external validation set was 0.836 (se: 0.002) and 0.763 (se: 0.013), respectively. The AUCs for predicting the 1-year, 3-year, and 5-year CSS were 0.842 (95% CI 0.825–0.860), 0.853 (95% CI 0.842–0.865), and 0.849 (95% CI 0.836–0.862), respectively, in the internal validation set (Fig. 3B), and 0.815 (95% CI 0.726–0.903), 0.823 (95% CI 0.767–0.879), and 0.839 (95% CI 0.786–0.892), respectively, in the external validation set (Fig. 3C). This indicated that the model retained good predictive value in both internal and external validation. We further compared it with the conventional AJCC-TNM staging model. The AUCs of the AJCC-TNM staging model for predicting the 1-year, 3-year, and 5-year CSS were 0.718 (95% CI 0.703–0.732), 0.729 (95% CI 0.719–0.738), and 0.738 (95% CI 0.729–0.748), respectively, in the training set (Fig. 4A); 0.723 (95% CI 0.701–0.745), 0.737 (95% CI 0.723–0.751), and 0.738 (95% CI 0.723–0.752), respectively, in the internal validation set (Fig. 4B); and 0.614 (95% CI 0.523–0.706), 0.637 (95% CI 0.577–0.697), and 0.619 (95% CI 0.561–0.677), respectively, in the external validation set (Fig. 4C). In all three data sets, the AUCs of the competitive risk model exceeded those of the conventional AJCC-TNM staging model. The predicted risk and the actual risk were well calibrated in the training set and in both validation sets (Fig. 5A–C).
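As a reminder of what the c-index measures, the sketch below computes Harrell's concordance index in plain Python. It is a simplified illustration and not necessarily the variant used in this study: deaths from other causes are coded as 0 and therefore handled like censored observations, which is an assumption, and the function and argument names are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C for a single event type.

    times       : follow-up times
    events      : 1 = event of interest, 0 = anything else (assumed simplification)
    risk_scores : model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # only patients with an observed event can anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event had the higher predicted risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported c-index values should be read.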

Fig. 3 ROC of the competitive risk nomogram predicting the 1-year, 3-year, and 5-year CSS; A training set, B internal validation set, C external validation set; ROC, receiver operating characteristic; CSS, cancer-specific survival; AUC, area under ROC curve

Fig. 4 ROC of the AJCC-TNM staging system predicting the 1-year, 3-year, and 5-year CSS; A training set, B internal validation set, C external validation set

Fig. 5 Calibration curve of the competitive risk nomogram predicting the 1-year, 3-year, and 5-year CSS; A training set, B internal validation set, C external validation set

Risk stratification

We calculated a score for each patient according to the prognosis-associated risk factors. In the training set, the lowest score was 38.84, the highest score was 472.79, and the median was 279.54. Taking the median score as the cut-off, we classified the patients into a low-risk group and a high-risk group, and applied the same risk stratification to the internal and external validation sets. Patients in the high-risk group had an evidently higher cumulative cancer-specific mortality than those in the low-risk group (Fig. 6A–C). The 1-year, 3-year, and 5-year cancer-specific mortality of the patients in the training set was 15.5%, 46.5%, and 59.4%, respectively, in the high-risk group, and 1.6%, 8.0%, and 13.8%, respectively, in the low-risk group. The cancer-specific mortality of the patients in the validation sets was similar to that in the training set.
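The median-split stratification described above can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and assigning a score exactly equal to the cut-off to the low-risk group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median


def stratify_by_median(training_scores, scores):
    """Label each score 'high' or 'low' using the training-set median as the cut-off.

    The cut-off is derived from the training set only and then reused for any
    other cohort, so validation patients are judged on the same threshold.
    Scores equal to the cut-off are labelled 'low' (an assumption).
    """
    cutoff = median(training_scores)
    labels = ["high" if s > cutoff else "low" for s in scores]
    return cutoff, labels
```

Deriving the cut-off once from the training set and reusing it keeps the validation cohorts from influencing the threshold on which they are evaluated.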

Fig. 6 Estimated cumulative mortality of high-risk patients and low-risk patients in A training set, B internal validation set, and C external validation set. “1” represents cancer-specific death, and “2” represents death from other causes

Comparison of different models for survival risk prediction

In addition, we compared the competitive risk model with conventional survival analysis in the prediction of the mortality risk at different time points, as shown in Table 3. For data from SEER, the 1-year, 3-year, and 5-year risk of death were 9.61%, 30.69%, and 41.18%, respectively, by conventional survival analysis, and were 8.61%, 27.46%, and 36.72%, respectively, by the competitive risk model. As for data from our hospital, the 1-year, 3-year, and 5-year risk of death were 12.96%, 36.84%, 44.56%, respectively, by conventional survival analysis, and were 12.12%, 34.47%, 41.69%, respectively, by the competitive risk model.

Table 3 Differences between conventional survival analysis and the competitive risk model in predicting the risk of death at different time-points
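The direction of the difference between the two approaches can be reproduced on a toy data set. The Python sketch below computes, at a chosen time horizon, both the complement of a Kaplan–Meier estimate that censors competing deaths and the competing-risk cumulative incidence. The function name, the event coding (0 = censored, 1 = cancer-specific death, 2 = death from other causes), and the toy data are assumptions for illustration and are unrelated to the study data.

```python
def km_complement_vs_cif(times, events, horizon):
    """Return (1 - KM, CIF) for event type 1 at `horizon`.

    The KM estimate treats competing deaths (code 2) as censored;
    the CIF accounts for them through the all-cause survival term.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    km_cause = 1.0      # cause-specific KM, competing deaths censored
    surv_all = 1.0      # all-cause survival
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d1 = d2 = removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            e = data[i][1]
            d1 += (e == 1)
            d2 += (e == 2)
            removed += 1
            i += 1
        cif += surv_all * d1 / n_at_risk
        surv_all *= 1.0 - (d1 + d2) / n_at_risk
        km_cause *= 1.0 - d1 / n_at_risk
        n_at_risk -= removed
    return 1.0 - km_cause, cif


naive, competing = km_complement_vs_cif([1, 2, 3, 4, 5], [1, 2, 0, 1, 0], horizon=5)
print(f"1 - KM: {naive:.2f}, CIF: {competing:.2f}")
```

Whenever a competing death occurs before a later cancer-specific death, the Kaplan–Meier complement is at least as large as the cumulative incidence, which matches the direction of the differences reported in Table 3.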

Discussion

In this study, we used a competitive risk model based on SEER data to build a nomogram that predicts the 1-year, 3-year, and 5-year CSS of intermediate and advanced colon cancer patients who had undergone surgery and chemotherapy, and we assessed its performance through internal and external validation. The c-index, ROC, and calibration curve indicated that the nomogram was reliable and of good predictive value, and its AUCs were higher than those of the AJCC-TNM staging system in all data sets. Furthermore, we performed risk stratification in these patients. The cancer-specific mortality of high-risk patients was higher than that of low-risk patients at all time points, which could support clinical risk stratification and decision-making.

We found that the 1-year, 3-year, and 5-year CSS predicted by conventional survival analysis was lower than that predicted by the competitive risk model, which suggests that conventional survival analysis is biased in the presence of competitive risk (Wolbers et al. 2014), so that the mortality of colon cancer patients could be overestimated. It is estimated that about 46% of medical studies using the Kaplan–Meier method for survival analysis are affected by competing risks, which may lead to an overestimation of the real risk of the event by 10% (Walraven and McAlister 2016). A study by Zhou et al. (2019) reached a consistent conclusion: if the effects of competitive risk events on prognosis are not taken into account, the survival of colon cancer patients at stage-I to -III is underestimated. Other studies have likewise proposed that a competitive risk model is more appropriate for assessing disease-specific mortality, being more reliable and more accurate in prediction (Wolbers et al. 2014; Verduijn et al. 2011; Xu et al. 2021a). The TNM-staging system remains the most commonly used tool in clinical settings for the prognostic assessment of cancer patients. It assesses prognosis mainly based on tumor size and depth of invasion (T stage), local lymph node involvement (N stage), and distant metastasis (M stage), and has been demonstrated to be applicable to the whole population (Xu et al. 2021a). However, the TNM-staging system is limited in its ability to distinguish prognostic differences between individuals. Beyond TNM-staging, demographic and clinical information such as age, marital status, treatments, and tumor biomarkers can also affect patients' prognosis (Liu et al. 2020; Xu et al. 2021b).
In recent years, nomograms have shown great potential in oncology. A nomogram adds other key prognostic factors to AJCC-TNM staging to form a comprehensive clinical predictive model (Balachandran et al. 2015) and presents the variables graphically. It assigns each variable a score, sums the scores of all variables, and maps the total onto the outcome scale to produce the final predicted probability (Wang et al. 2020). Nomograms can help clinicians and patients make more accurate decisions because they are intuitive, individualized, and rapid in prognostic prediction (Balachandran et al. 2015; Li et al. 2021).

In this study, we identified 18 independent risk factors associated with the prognosis of intermediate and advanced colon cancer patients receiving surgery and chemotherapy, and constructed a comprehensive prediction model with these factors in the form of a nomogram. We found that age was clearly associated with the survival of colon cancer patients. The older the patient, the worse the prognosis, which might be related to progressive organ failure and increasing susceptibility to concomitant diseases (Yamano et al. 1990; Sorbye et al. 2013). CEA level is also a crucial prognostic indicator in colon cancer patients (Locker et al. 2006), and CEA testing is a necessary preoperative examination. A study by Zhou et al. (2019) found that stage-I, -II, and -III colon cancer patients with higher CEA levels had significantly poorer OS and CSS. In this study, we also observed that the prognosis of stage-III and -IV colon cancer patients with higher CEA levels was poorer. We also found that a larger tumor size was associated with a poorer prognosis, possibly because large tumors tend to be more invasive. Another study, by Saha et al. (2015), based on the U.S. National Cancer Database also found that a larger tumor size indicated a poorer prognosis among colon cancer patients. Moreover, the number of lymph node metastases is an important risk factor for assessing the prognosis of colon cancer patients. More lymph node metastases indicate poorer survival outcomes (Kataoka et al. 2019). We did not rely solely on the number of positive lymph nodes, because it depends on the number of lymph nodes submitted for examination during surgery. We used LNR as an alternative, and it has been demonstrated that LNR is more accurate than the N stage in predicting patients' prognosis (Parnaby et al. 2015).
In addition, we assessed the effects of tumor deposition and perineural invasion on patients' prognosis. According to the definition in the 7th edition of the AJCC-TNM classification, tumor deposition, also known as cancer nodules, refers to isolated nodules located within the lymphatic drainage area of the primary lesion. These nodules often contain no identifiable lymph nodes, blood vessels, or nerves. The presence of tumor deposition has been shown to be a prognosis-associated risk factor for colon cancer patients (Cohen et al. 2021), which is consistent with the results of the nomogram in this study. Multiple studies (Mirkin et al. 2018; Zheng et al. 2020) have found that colon cancer patients with both tumor deposition and lymph node metastasis could have poorer survival outcomes. Perineural invasion refers to the invasion of cancer cells into nerve tissues surrounding the intestinal wall. It reflects the histopathological characteristics of tumor invasion and is considered an indicator of adverse prognosis in colon cancer patients (Skancke et al. 2019). Mayo et al. (2016) reported that patients with perineural invasion had a more adverse prognosis. Perineural invasion could directly affect the OS and CSS of colon cancer patients, which is consistent with what we observed in this study. The other independent risk factors, such as marital status, gender, history of metastasectomy, and tumor grade, have also been demonstrated by multiple studies (Jin et al. 2020; Kuai et al. 2021; Lv et al. 2021b).

To our knowledge, this is the first study using a competitive risk model for CSS prediction in intermediate and advanced colon cancer patients after surgery and chemotherapy. Internal and external validation demonstrated that the model is of considerable accuracy and reliability. However, some limitations remain. Firstly, the SEER database does not include all the risk factors associated with patients' prognosis, which could introduce bias into the model. For example, targeted therapy is another crucial approach for colon cancer patients at intermediate and advanced stages, especially those at stage IV and with metastasis, but it could not be included in the model. Secondly, the data in the SEER database are incomplete. For example, it only records CEA levels tested before surgery, while postoperative and follow-up results are not included. The postoperative CEA level would be more instructive (Konishi et al. 2018). Patients whose CEA level was elevated before surgery and returned to normal afterwards have a prognosis similar to that of patients with a normal preoperative CEA level, while patients whose postoperative CEA level remains above the normal baseline might have a higher risk of local recurrence and metastasis. In addition, there are no records of the specific regimens and courses for patients who received chemotherapy. Lastly, this is a retrospective study, and further validation in prospective populations is needed.

Conclusions

Based on data from the SEER database and our hospital, we have constructed what is, to our knowledge, the first competitive risk predictive model for the CSS of intermediate and advanced colon cancer patients after surgery and chemotherapy. The model showed good predictive performance in the training set and in both validation sets, and it outperformed the AJCC-TNM staging system. It can help clinicians develop individualized treatment and follow-up strategies.