Background

Colorectal cancer (CRC) is the third most frequent cancer and one of the main causes of cancer-related death [1]. In 2016, there were 1,700,000 new CRC cases and 830,000 deaths attributed to CRC worldwide [2]. Although treatment strategies, such as immunotherapy, chemotherapy and targeted agents, have developed rapidly in recent decades, the prognosis of CRC is still unsatisfactory [3,4,5]. It was reported that the 5-year overall mortality rate was 65.0–70.0% for stage III CRC patients [6, 7]. Compared to patients without metastasis, the survival outcome of metastatic CRC patients was worse, with a 5-year survival rate of only 14.0% [8]. A previous study analysed 374 stage IV CRC patients and suggested that patients with synchronous metastatic CRC had worse 3-year survival (33.0%) than patients with metachronous metastatic CRC (54.0%, P = 0.0038) [9]. Recent studies reported that the incidence of synchronous distant metastasis is increasing, with rates of 15–20% in CRC patients [10, 11]. The expected survival of patients with distant metastasis strongly influences the planning of individualized treatment. Thus, studies focusing on survival estimation for CRC that is stage IV at initial diagnosis are urgently needed.

Various prognostic factors for stage IV CRC patients have been investigated in previous studies. Several demographic and clinicopathologic variables were shown to be independent prognostic factors: age at diagnosis [11], tumour size [12], lymph node metastasis [12], resectability [12], and chemotherapy treatment [10]. Carcinoembryonic antigen (CEA) levels and primary tumour sites were reported to be associated with the survival of CRC patients [9, 13]. In a multicentre registry study comprising 9624 stage IV CRC patients, a prognostic scoring system was established based on eight independent prognostic factors. The total score ranged from 0 to 9, and higher scores indicated poorer survival [9, 13].

Nomograms are widely used graphical prediction models in which points are assigned to each predictor and summed to estimate survival [15]. Several prognostic nomograms for stage IV CRC have been constructed in recent years [16,17,18]. Based on 1133 stage IV CRC patients who received curative resection, nomograms to predict disease-free survival (DFS) and overall survival (OS) were constructed [16]. However, only four predictors (T stage, N stage, postoperative CEA and metastatic organs) were included in the models, which may have limited their accuracy. In 2019, Hua Ge et al. reported a nomogram to predict OS in CRC patients at M1 stage [17]. Several social characteristics and tumour-related variables were included in the nomogram. Nevertheless, neither serum CEA level nor adjuvant treatment (radiation and chemotherapy), both of which are widely considered independent prognostic factors for CRC patients [13], was included as a predictor. Another SEER-based study enrolled 2996 CRC patients with stage IV disease, and a predictive nomogram was constructed [18]. External validation was performed based on a Chinese cohort with high discrimination (C-index of OS: 0.657; 95% CI 0.544–0.770), which indicated good transportability of the nomogram [18]. However, that nomogram applied only to patients who underwent both primary and metastatic resection. A nomogram that incorporates the relevant predictors is therefore needed to estimate survival in stage IV CRC accurately.

Using data from the Surveillance, Epidemiology, and End Results (SEER) database, the present study aimed to identify prognostic factors for CRC patients with distant metastasis. These factors were then used to construct a prognostic nomogram. The predictive model may help oncologists estimate prognosis accurately and guide individualized treatment.

Methods

Data source and cohort selection

Data were extracted from the Surveillance, Epidemiology, and End Results (SEER) database of the National Cancer Institute (https://seer.cancer.gov/), which covers approximately 30% of the US population. The SEER program provides information on cancer statistics to reduce the cancer burden, and the authors were permitted to access the original data without the need for informed consent. The present study complied with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

The data source for the present study was the Incidence - SEER 18 Regs Custom Data (with additional treatment fields), Nov 2018 Sub (1975–2016 varying) database, which was released in April 2019. Patients with malignant colorectal cancer were extracted from the database according to the Site recode ICD-O-3/WHO 2008 of ‘Colon and Rectum’ and the Behavior recode for analysis of ‘Malignant’. The year of diagnosis was restricted to 2010–2016, since data on metastatic sites were not available before 2010. The exclusion criteria were as follows: (1) stage I–III colorectal cancer; (2) two or more primary tumours; (3) diagnosis at autopsy or via death certificate; and (4) unknown information for any demographic or clinicopathologic variable. A detailed flow-chart for patient selection is shown in Fig. 1.
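The selection rules above can be read as a simple record filter. The following Python sketch is illustrative only: the `Case` record and its field names are hypothetical simplifications, not SEER*Stat variable names, and it is not the authors' extraction procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    """Hypothetical, simplified record (field names are assumptions)."""
    year_of_diagnosis: int
    stage: Optional[str]        # "I", "II", "III", "IV", or None if unknown
    n_primary_tumours: int
    reporting_source: str       # e.g. "hospital", "autopsy", "death certificate"
    covariates_complete: bool   # False if any study variable is unknown


def is_eligible(case: Case) -> bool:
    """Apply the diagnosis-year restriction and the four exclusion criteria."""
    if not 2010 <= case.year_of_diagnosis <= 2016:
        return False
    # (1) Only stage IV is kept; an unknown stage (None) is also dropped here.
    if case.stage != "IV":
        return False
    # (2) Two or more primary tumours.
    if case.n_primary_tumours >= 2:
        return False
    # (3) Diagnosed at autopsy or via death certificate.
    if case.reporting_source in {"autopsy", "death certificate"}:
        return False
    # (4) Unknown demographic or clinicopathologic information.
    return case.covariates_complete
```

Applying `is_eligible` to every extracted record would yield the analysis cohort under these assumptions.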

Fig. 1
The detailed flow-chart for patient selection

Demographic and clinicopathologic variables

The following demographic and clinicopathologic variables were included in the present study: age at diagnosis (< 65 and ≥ 65 years), gender (male and female), marital status (married and unmarried), race (white, black and others), insurance status (insured and uninsured), primary site (left colon, right colon and other sites), tumour grade (I to IV: well, moderately, poorly and undifferentiated, respectively), carcinoembryonic antigen (CEA) level (normal and elevated), T stage (T1, T2, T3 and T4), N stage (N0, N1 and N2), the presence of bone, brain, liver and lung metastasis (each no or yes), surgery for the primary site (no, yes), radiation treatment (no/unknown, yes) and chemotherapy (no/unknown, yes). According to the SEER field ‘Primary Site – labeled’, the primary site was divided into ‘Left colon’ (C18.5-Splenic flexure of colon, C18.6-Descending colon, C18.7-Sigmoid colon, C19.9-Rectosigmoid junction and C20.9-Rectum, NOS), ‘Right colon’ (C18.0-Cecum, C18.1-Appendix, C18.2-Ascending colon, C18.3-Hepatic flexure of colon and C18.4-Transverse colon) and ‘Other sites’ (C18.8-Overlapping lesion of colon and C18.9-Colon, NOS) categories.
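The grouping of primary sites described above amounts to a lookup from topography code to category. A minimal Python sketch follows; the function name and the error handling for unlisted codes are assumptions made for illustration.

```python
# Topography codes grouped exactly as listed in the text above.
LEFT_COLON = {"C18.5", "C18.6", "C18.7", "C19.9", "C20.9"}
RIGHT_COLON = {"C18.0", "C18.1", "C18.2", "C18.3", "C18.4"}
OTHER_SITES = {"C18.8", "C18.9"}


def primary_site_group(code: str) -> str:
    """Map a primary-site code such as 'C18.7' to its study category."""
    normalised = code.strip().upper()
    if normalised in LEFT_COLON:
        return "Left colon"
    if normalised in RIGHT_COLON:
        return "Right colon"
    if normalised in OTHER_SITES:
        return "Other sites"
    # Codes outside the colon and rectum list are not part of this study.
    raise ValueError(f"Unrecognised primary site code: {code!r}")
```

For example, `primary_site_group("C20.9")` returns `"Left colon"`, which reflects the decision to group the rectum with the left colon.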

Statistical analysis

In the present study, the total cohort was randomly divided into construction and validation cohorts (ratio 1:1). The construction cohort was used to identify prognostic factors for stage IV colorectal cancer patients and to construct the nomogram, while the validation cohort was used to validate the performance of the model. Quantitative data are described as the mean ± standard deviation (SD), while categorical variables are presented as numbers and percentages (N, %). The primary outcome was overall survival (OS), which was defined as the time from diagnosis of colorectal cancer to death from any cause. Cox proportional hazards regression was performed to identify prognostic factors. Variables that were significant in the univariate analysis were further analysed in a multivariate analysis to determine the independent prognostic factors. Based on the prognostic factors, the nomogram was formulated using the survival package in R. Each predictor included in the nomogram was represented on one row, and a corresponding number of points was assigned to each level of the predictor. The cumulative point axis was placed at the bottom of the nomogram, and higher total points indicated a worse survival outcome. The discriminative ability of the model was evaluated with Harrell's concordance index (C-index) and receiver operating characteristic (ROC) curve analysis. A larger C-index value and a greater area under the curve (AUC) in the ROC curve indicated better discrimination ability. Calibration curves (1000 bootstrap resamples) were generated to evaluate the calibration ability of the nomogram.
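The point assignment described above can be illustrated with a short Python sketch. It follows a common nomogram convention, assumed here rather than taken from the authors' R code: the predictor with the widest range of Cox coefficients spans 0 to 100 points and all other predictors are scaled proportionally. The coefficients in `EXAMPLE_COEFS` are hypothetical and are not the values estimated in this study.

```python
from typing import Dict, Mapping

# Hypothetical log hazard ratios (reference level = 0.0); NOT study results.
EXAMPLE_COEFS = {
    "chemotherapy": {"yes": 0.0, "no/unknown": 1.0},
    "surgery": {"yes": 0.0, "no": 0.5},
    "age": {"<65": 0.0, ">=65": 0.25},
}


def nomogram_points(
    coefs: Mapping[str, Mapping[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Convert per-level Cox coefficients into 0-100 nomogram points."""
    if not coefs:
        raise ValueError("At least one predictor is required")
    # Effect range of each predictor: largest minus smallest coefficient.
    ranges = {v: max(l.values()) - min(l.values()) for v, l in coefs.items()}
    max_range = max(ranges.values())
    if max_range <= 0:
        raise ValueError("All predictors have zero effect range")
    points: Dict[str, Dict[str, float]] = {}
    for var, levels in coefs.items():
        lowest = min(levels.values())
        # The lowest-risk level of each predictor receives 0 points.
        points[var] = {
            lvl: 100.0 * (beta - lowest) / max_range
            for lvl, beta in levels.items()
        }
    return points


def total_points(
    points: Mapping[str, Mapping[str, float]], patient: Mapping[str, str]
) -> float:
    """Sum the points for one patient's predictor levels."""
    return sum(points[var][level] for var, level in patient.items())
```

With these hypothetical coefficients, a patient aged 65 or older who had surgery but no chemotherapy would receive 25 + 0 + 100 = 125 points. Mapping total points to 1-, 3- and 5-year survival probabilities additionally requires the baseline survival from the fitted Cox model, which this sketch does not cover.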

The case listing session of the SEER*Stat 8.3.6 program was used to generate data and IBM SPSS Statistics (version 26.0, Armonk, NY, USA) was used for statistical analyses. The construction of the prognostic nomogram and subsequent validation were performed with R version 4.0.0 (R Foundation for Statistical Computing, Vienna, Austria; www.r-project.org). All statistical tests were two-sided, and P < 0.05 was considered significant.

Results

Demographic and clinicopathologic characteristics

According to the inclusion and exclusion criteria, a total of 7099 patients with stage IV colorectal cancer were included in the construction cohort. The mean age was 61.5 ± 13.7 years, with a slight predominance of male (N = 3782, 53.3%) and married (N = 3880, 54.7%) patients. The majority of the construction population was white (N = 5358, 75.8%) and insured (N = 6755, 95.2%). Tumour grades I, II, III and IV accounted for 5.4%, 64.5%, 25.1% and 5.0% of cases, respectively. More than half of the tumours (N = 3948, 55.6%) occurred in the left colon, while 42.3% were located in the right colon. CEA was elevated in 5604 cases (78.9% of the construction cohort). T3 stage (N = 3362, 47.4%) was the most common tumour stage, followed by T4 stage (N = 2885, 40.6%), T1 stage (N = 653, 9.2%) and T2 stage (N = 199, 2.8%). The percentage of patients with lymph node metastasis was 74.2%. There were 284, 78, 5000 and 1458 patients diagnosed with bone, brain, liver and lung metastasis, respectively. Regarding the treatment strategy, nearly 80% of patients underwent surgery for the primary colorectal tumour. Radiation and chemotherapy were administered to 939 and 5299 patients, respectively. Detailed demographic and clinicopathologic characteristics of the construction and validation cohorts are shown in Table 1.

Table 1 Baseline demographic and clinicopathologic characteristics in the construction and validation cohorts

Survival and prognostic factors of stage IV colorectal cancer

A total of 4616 patients died in the construction cohort, and the median overall survival (OS) was 20.0 (95% CI 19.3–20.7) months. The 1-, 3- and 5-year OS rates were 64.8%, 28.7% and 15.4%, respectively. In the univariate Cox regression analysis, the following variables were associated with survival: age at diagnosis, marital status, race, insurance status, primary site, tumour grade, CEA level, T stage, N stage, the presence of bone, brain, liver and lung metastasis, surgery for the primary site, radiation treatment and chemotherapy treatment. The multivariate analysis identified age older than 65 years, unmarried status, black race, primary site in the right colon or other sites, higher tumour grade, elevated CEA level, higher T stage, higher N stage, the presence of bone, brain, liver and lung metastasis, no surgery for the primary site and no/unknown chemotherapy as independent prognostic factors for worse survival. More details about the Cox proportional hazards regression are listed in Table 2.

Table 2 Cox proportional hazards regression analysis of prognostic factors in patients with stage IV colorectal cancer

Construction and validation of the nomogram

As shown in Fig. 2, the nomogram for predicting 1-, 3- and 5-year survival was constructed based on the abovementioned prognostic factors. The C-index for the prediction of OS was 0.742 (95% CI 0.726–0.758), and the AUCs of the nomogram for 1-, 3- and 5-year survival were 80.8%, 76.1% and 77.0%, respectively (Fig. 3a–c). The calibration curves revealed good agreement between the predicted and observed probabilities. All calibration curves were close to the 45-degree line (Fig. 3d–f for 1-, 3- and 5-year survival, respectively).
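For readers unfamiliar with the C-index reported here, the following Python sketch shows how Harrell's concordance index can be computed from survival times, event indicators and model risk scores. It is a simplified illustration, not the authors' R implementation: pairs with tied survival times are skipped, and the quadratic loop is only suitable for small samples.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable patient pairs ranked correctly by risk score.

    A pair (i, j) is comparable when patient i had the event (death) and
    a strictly shorter follow-up time than patient j. The pair is
    concordant when patient i also has the higher risk score; tied risk
    scores count as half concordant.
    """
    n = len(times)
    if not (n == len(events) == len(risk_scores)):
        raise ValueError("Inputs must have the same length")
    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("No comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the values reported in this study indicate that roughly three quarters of comparable patient pairs were ordered correctly.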

Fig. 2
The nomogram for predicting 1-year, 3-year and 5-year survival for colorectal cancer patients with distant metastasis

Fig. 3
The ROC curves (a–c) and calibration curves (d–f) for assessing the discrimination and calibration of the nomogram in the construction cohort

In the validation cohort, the nomogram showed satisfactory discrimination. The C-index was 0.746 (95% CI 0.730–0.762), and the AUCs for 1-, 3- and 5-year survival were 79.9%, 77.1% and 77.0%, respectively (Fig. 4a–c). Excellent calibration was achieved, with all calibration curves close to the 45-degree line (Fig. 4d–f for 1-, 3- and 5-year survival, respectively).

Fig. 4
The ROC curves (a–c) and calibration curves (d–f) for assessing the discrimination and calibration of the nomogram in the validation cohort

Discussion

In the present study, the demographic and clinicopathologic characteristics of stage IV colorectal cancer were described and the survival outcome was estimated. Previous data from four national colorectal cancer registers showed that the 3-year net survival rates of stage IV CRC patients were 20.5–33.0% and 26.7–38.5% for colon and rectal cancer, respectively [19]. Another single-institution study reported that the 5-year OS was 19.1% for stage IV CRC [12]. Our study observed survival rates similar to those of the previous studies.

According to the Japanese Society for Cancer of the Colon and Rectum guidelines, resectability should be considered first when making clinical decisions [20]. Systemic chemotherapy and radiotherapy are recommended in unresectable CRC cases, while palliative care is encouraged for patients with end-stage disease [20]. Although the guidelines recommend different treatments for patients at different stages, they do not provide survival estimates for individual patients. Accurate survival estimation is nevertheless a prerequisite for choosing among these clinical strategies. A total of fourteen independent prognostic factors were identified in the current study, and a predictive nomogram was constructed based on these predictors. The nomogram presented good discrimination and calibration in the validation cohort.

In the present study, chemotherapy was one of the predictors in the constructed nomogram, whereas it was not included in previous nomograms [16,17,18]. According to the Japanese Society for Cancer of the Colon and Rectum guidelines, postoperative adjuvant chemotherapy is recommended in patients with R0 resection [20]. In a retrospective study with 37.0 months of median follow-up, OS rates were 62.1% and 40.4% for CRC patients with and without adjuvant chemotherapy, respectively [13]. Another multi-institutional analysis reported that fewer recurrences were found in patients who received preoperative chemotherapy [10]. In addition, surgery for the primary site was another independent prognostic factor for stage IV CRC. Consistent with our study, no surgery was associated with a 2.807-fold increased risk of death in a previous study [17]. Compared to conventional open surgery, laparoscopic surgery showed the advantages of reduced blood loss and a shorter hospital stay [21]. No survival difference was found between the two surgical methods [22]. However, information on surgical methods and associated complications was not available in the SEER database.

Tumour grade was the most sensitive predictor in the current study, which was inconsistent with a previous SEER study [18]. This discrepancy may be attributed to the different study populations. The previous study included only stage IV CRC patients who underwent primary and metastatic resection, whereas the present study included all stage IV CRC patients. A recent study comprising 126 CRC patients with distant metastasis concluded that grade classification was an independent prognostic factor [12]. Compared to patients with differentiated histology, the hazard ratio for patients with undifferentiated histology was 3.226 (95% CI 1.558–6.711). The 5-year OS rates for patients with differentiated and undifferentiated histology were 21.1% and 11.1%, respectively [12]. There is growing evidence that the primary tumour site is associated with the survival of CRC patients. The outcome of patients with tumours in the left colon was better than that of patients with tumours in the right colon [9, 23]. A retrospective study of a SEER dataset reported that right colon tumours were more likely to present with higher T stage and worse histology [23]. Another study concluded that BRAF mutations were more frequent in patients with right colon cancer, which was associated with worse survival [24]. The same trend was observed in the present study. The predictive model constructed by Hua Ge et al. indicated that the survival of patients with tumours located in the rectum was better than that of patients with tumours located in other sites [17]. In the present study, the “Rectum” site was incorporated into “Left colon”, and we did not specifically analyse the survival outcome of patients with rectal tumours. When survival was compared across tumour sites, our study suggested that patients with tumours on overlapping lesions or at undetermined sites exhibited the worst survival.

In the present study, T stage, N stage and the presence of metastasis were shown to be prognostic factors and were therefore included in the nomogram. As previously reported, these factors are widely accepted in various cancer prediction models [25, 26]. Consistent with other predictive models, higher T stage, higher N stage and a greater number of metastatic sites indicated worse survival. Furthermore, several previous studies reported that elevated CEA levels indicated poor survival in CRC patients [12, 16]. The aforementioned variables have also been incorporated into other predictive nomograms for CRC patients [14, 27].

The current study has several limitations. First, chemotherapy and radiation information in SEER is incomplete, and further studies examining the effect of chemotherapy and radiation on prognosis in stage IV colorectal cancer should be performed. Second, only internal validation was performed; the transportability of the nomogram to additional patient populations should be validated in the future. Third, detailed information about treatment, including surgical methods, chemotherapy regimens and targeted agents, was not available in the current SEER database, although these factors have been reported to be associated with survival [22, 28]. Fourth, the present real-world study may overestimate the effect of treatment (especially surgery) on prognosis; to avoid selection bias, a randomized controlled trial on surgery would be needed to quantify this effect. Finally, cases with missing data were excluded, which may reduce the representativeness of the sample and introduce substantial missing-data bias; the results might differ if more complete data were available. Our study developed an auxiliary model for prognostic prediction in stage IV colorectal cancer, and this tool should be used carefully, taking the patient's overall situation into account.

Conclusions

A prognostic nomogram for patients with stage IV colorectal cancer was constructed. The predictive model showed satisfactory discrimination and calibration and can be used to support survival estimation and individualized treatment decision-making in CRC patients with distant metastasis.