1 Introduction

Colorectal cancer (CRC) is a significant worldwide health problem, and the incidence and mortality are rapidly increasing [1]. The liver is the most common metastatic organ. Approximately 15% ~ 25% of CRC patients have liver metastases at the first time at diagnosis and approximately 25% of CRC patients diagnosed early eventually develop liver metastasis [2, 3]. Moreover, they have only 30 months for the 5 year median overall survival time, and liver metastasis has become the main cause of poor prognosis for CRC patients [4]. Great progress has been made in the systemic treatment of CRLM, but the prognosis of these patients is still not ideal [5, 6]. Surgery remains the preferred treatment strategy for CRLM patients. Synchronous resection for CRLM patients has been gradually applied and popularized due to its advantages of safety and minimal damage, but only 10% ~ 20% of patients are suitable for surgical resection of metastases [7,8,9]. Thus, for the majority of patients with unresectable metastatic lesions, the advantages of primary lesion surgery are gradually being fully recognized [10, 11]. It has been reported that patients undergoing primary resection can obtain more clinical benefits [12]. Therefore, the treatment and prognosis evaluation of the subgroup of CRLM patients who received primary lesion surgery will be the focus of clinical strategy implementation in the future.

The traditional TNM classification is the most mainstream prognosis evaluation strategy in malignant tumours, and it is stratified according to tumour invasion, lymph node, and organ metastasis [13]. However, this staging system does not offer a reliable prediction of survival for minority-specific patient populations. Up to now, many other prognostic indicators of clinical malignant tumours have been discovered, such as age, sex, tumour size, tumour grade, LNR, and tumour molecular markers, which may contribute considerably to individualized survival prediction. Nomography, as a useful and accessible prognostic tool for physicians, can integrate multiple independent prognostic factors and has been extensively applied to survival prediction, individualized treatment planning, and the follow-up decisions [14,15,16]. Thus, the established nomogram model has better prediction accuracy than TNM staging systems. However, few studies have employed this model to predict the survival prognosis of CRLM patients who have received local surgery thus far.

The SEER database has a large sample size, high quality, and strong statistical power, which can provide tumor-related researchers with high clinical reference value data [17]. In this study, our purpose was to develop nomograms that estimate the individualized survival probabilities of OS (the overall survival) and CSS (the cancer-specific survival) in CRLM patients treated with primary lesion surgery using data from the SEER database.

2 Methods

2.1 Patient selection

Details of CRLM patients were extracted from the SEER database from 2010 to 2015 by utilizing SEER*Stat software (version 8.3.6; http://www.seer.cancer.gov). The SEER database is an authoritative source for surveillance and analysis of various cancers based on the population that collects cancer diagnosis, treatment, and survival data covering approximately 34.6% of the population in the United States [18].

The inclusion criteria were as follows: CRLM patients confirmed by pathology (tissue biopsy) from 2010 to 2015; history of surgical resection for local lesions of CRC; adenocarcinoma histological subtypes according to the standard of the World Health Organization (WHO); primary tumour located in the colorectum; confirmation with liver metastasis; malignant behaviour based on ICD-O-3[(C18,8140/3), (C19,8140/3), (C20,8140/3)]. The exclusion criteria were as follows: primary hepatocellular carcinoma; other metastasis sites outside the liver or combined with other cancers; incomplete clinical, pathological, and follow-up data; and disease confirmed by autopsy or death certificate. All procedures performed in the present study were in accordance with the principles outlined in the 1964 Helsinki Declaration and its later amendments. These data do not require the approval and informed consent of the Institutional review board, because SEER research data is publicly available and all patient data are de-identified.

2.2 Demographics and variables

The demographic variables included sex, age, race, and marital status. The tumour-associated variables included tumour site, tumour stage, tumour grade, tumour size, number of detected lymph nodes, number of positive lymph nodes, number of primary tumours, neurological/venous invasion, CEA, and lymph node ratio (LNR, as the number of positive lymph nodes/number of detected lymph nodes, which is an independent prognostic factor for CRC [19]). The treatment-associated variables included chemotherapy and radiation performance. The survival variables included survival status, cause of death, and survival time. The TNM staging system was applied in line with the 6th edition of the AJCC.

2.3 Nomogram development

The training cohort dataset (n = 1812) from SEER was used to develop the original nomogram. Nomograms were used for visualizing the multivariate regression analysis and predicting 1-, 3-, and 5-year OS and CSS rates. OS was estimated by using the Kaplan–Meier method and compared using the log-rank test, and the difference is shown by the survival curves. The Fine-Gray competing risk model [20] was used to analyse the effect of each variable on the CSS, and the probabilities of cancer-related death and competing risk death were expressed as cumulative incidence functions (CIFs). A multivariate Cox proportional hazard regression model was used in multivariate analysis to identify the independent prognostic factors.

2.4 Validation and calibration of the nomogram

The C-index analysis and the area under the receiver operating characteristic curve (ROC) validated the precision of the prediction model in the training and validation cohorts. The concordance between the predicted and observed probabilities was determined through calibration curve. These prediction models were subjected to bootstrap internal validation. Decision curve analysis (DCA) is a novel algorithm for assessing the clinical utility of various predictive models. It can compensate any limitations of the ROC curve by representing false and true positive fractions as functions of risk thresholds. So, DCA was employed to evaluate the potential clinical application of the novel nomogram. The net benefit of the prediction model in clinical application was evaluated by decision curve analysis. Finally, based on the established models, we conduct prognostic risk stratification analysis (RSA) on patients in the training model group, and further evaluate the prognosis of different risk groups and the clinical reliability of these models.

2.5 Statistical procedures

X-tile software (https://medicine.yale.edu/lab) analysis was applied to determine the optimal grouping cut-off point and complete the transformation of continuous variables to categorical variables. The χ2 test and log-rank test were used to describe the differences between the training cohort and validation cohort in terms of demographic characteristics, tumour characteristics, and treatment characteristics. The hazard ratio (HR) and corresponding 95% confidence interval (95% CI) were calculated. P-values less than 0.05 were considered statistically significant. The statistical analyses above were performed by SPSS software (version 22.0) and R software (version 3.6.1, the R packages including rmda, rms, survival, caret, foreign, cmprsk, splines, timeROC and forestplot).

3 Results

3.1 Clinical characteristics

A total of 3622 CRLM patients who received primary lesion surgery treatment were obtained from the SEER database using the inclusion criteria and were randomly divided into a training cohort (1812 patients) and a validation cohort (1810 patients) at a ratio of 1:1. The patient selection and grouping process is shown in the flow chart (Fig. 1). In the primary cohort, the median age was 61 years old (interquartile range, 52–71 years old), including 1540 females (42.5%) and 2082 males (57.5%), and the median follow-up time was 22 months (interquartile range, 12–36 months). In the training cohort, the median age was 61 years old (interquartile range, 52–71 years old), including 776 females (42.8%) and 1036 males (57.2%), and the median follow-up time was 22.5 months (interquartile range, 12–37 months). In the validation cohort, the median age was 62 years old (interquartile range, 52–71 years old), including 764 females (42.2%) and 1046 males (57.8%), and the median follow-up time was 22 months (interquartile range, 12–35 months). The demographic characteristics and clinical characteristics of the training cohort, validation cohort, and primary cohort are shown in Table 1.

Fig. 1
figure 1

The flow chart of patient selection and grouping process

Table 1 Demographics characteristics and clinical characteristics of CRLM patients treated by primary lesion surgery in the training cohort, validation cohort, and primary cohort

3.2 Prognostic factors associated with OS

In the training cohort, the outcomes of the univariate analysis are shown in Table 2. All significant predictors of OS were assessed by multivariate Cox regression. Combined with the AIC optimization analysis, it was suggested that age, tumour grade, marital status, CEA, LNR, chemotherapy, tumour size, and primary tumour site were independent prognostic factors of OS in patients (Table 2). Based on the Kaplan–Meier and log-rank test methods, the survival curves of each major variable were plotted using the Cox risk model (Fig. 2).

Table 2 Univariate and multifactorial survival analysis of postoperative OS for patients
Fig. 2
figure 2

Survival curve analysis of the main factors affecting the postoperative prognosis of patients

3.3 Prognostic factors associated with CSS

In this study, the Fine-Gray competing risk model was used to conduct independent prognostic factor analyses of CSS, and the multifactorial analysis results with stepwise screening showed that age, tumour grade, CEA, LNR, chemotherapy, tumour size, and tumour site were independent risk factors for CSS in patients (Table 3). The evaluation index of the competing risk model used cumulative incidence function (CIF) analysis (Fig. 3).

Table 3 Multifactorial survival analysis of postoperative CSS for patients
Fig. 3
figure 3

Cumulative incidence curves of postoperative mortality risk in CRLM patients treated by primary lesion surgery in the competing risk model. Number "1" indicates that the cause of death was due to the tumor; number "2" indicates that the cause of death was due to the others

3.4 Construction of the prognostic nomogram

According to the above results, these independent prognostic factors were incorporated into the construction of the nomogram for predicting OS and CSS in the training group (Fig. 4). These nomograms showed the prognostic results at 1, 3, and 5 years for OS and CSS, and the visualization results of the prognostic models are shown in Fig. 5.

Fig. 4
figure 4

Forest plot of proportional hazards model for OS and CSS in CRLM patients treated by primary lesion surgery. A indicates the forest plot of predictors for OS; B indicates the forest plot of predictors for CSS

Fig. 5
figure 5

Nomogram prediction models for OS (A) and CSS (B) at 1, 3 and 5 years in CRLM patients treated by primary lesion surgery

3.5 Calibration and validation of the nomogram

In the training cohort and validation cohort, Harrell’s C-index of the prognostic models was [0.747 (95% CI 0.724–0.769) and 0.744 (95% CI 0.721–0.766)] for OS and [0.706 (95% CI 0.682–0.730) and 0.707 (95% CI 0.683–0.0731)] for CSS. These values were considerably greater than those of the TNM staging system, in which the C-indices were [0.614 (95% CI 0.587–0.641) and 0.626 (95% CI 0.599–0.652)] for OS and [0.609 (95% CI 0.583–0.636) and 0.622 (95% CI 0.596–0.648)] for CSS (Table 4). These results ultimately showed the highly discriminative ability to predict the absolute risk for CRLM patients treated by primary lesion surgery.

Table 4 C-indexes of the postoperative nomogram models and AJCC-TNM evaluation system for CRLM patients treated by primary lesion surgery

Upon internal validation, we found that the Area under curve (AUC) values of the prognostic nomogram for the 1 year, 3 years and 5 years OS were 0.843, 0.769, and 0.781 in the training cohort, and 0.811, 0.751, and 0.748 in the validation cohort, respectively(Fig. 6A, B); and the AUC values of the prognostic nomogram for the 1-year, 3 years and 5 years CSS were 0.839, 0.761 and 0.778 in the training cohort, and 0.812, 0.754 and 0.746 in the validation cohort, respectively (Fig. 6a, b). Meanwhile, the AUC values for the 1 year, 3 years and 5 years OS predictions in the AJCC-TNM evaluation system were found to be 0.610, 0.623 and 0.658 for the training group and 0.596, 0.620 and 0.638 for the validation group, respectively(Fig. 6C, D); the AUC values for the 1 year, 3 years and 5 years CSS predictions in the AJCC-TNM evaluation system were found to be 0.623, 0.645 and 0.649 for the training cohort and 0.606, 0.623 and 0.642 for the validation cohort, respectively (Fig. 6c, d). It is indicated the superior predictive capabilities of our nomograms.

Fig. 6
figure 6

ROC analysis for validating our proposed nomograms for 1 year, 3 years, and 5 years OS and CSS of patients with CRLM treated by primary lesion surgery. A and B show the AUC for OS in the training and validation cohort in nomogram models; C and D show the AUC for OS in the training and validation cohort in AJCC-TNM evaluation system; a and b show the AUC for CSS in the training and validation cohort in nomogram models; c and d show the AUC for CSS in the training and validation cohort in AJCC-TNM evaluation system

To reduce the appearance of overfitting in the C-index, the consistency index was corrected. The calibration curves demonstrated that the predictive power of the nomograms was in line with the actual observed values in both the training and validation cohorts for 1-, 3-, and 5 year OS and CSS (Fig. 7).

Fig. 7
figure 7

Correction curves of nomogram prediction models for postoperative OS and CSS for CRLM patients treated by primary lesion surgery in the training and validation cohort. A (a, b, c) and B (a, b, c) show the nomogram model calibration curves for 1 year, 3 year, and 5 year OS for patients in the training and validation cohort, and C (a, b, c) and D (a, b, c) show the nomogram model calibration curves for 1 year, 3 year, and 5-year CSS for patients in the training and validation cohort. The x-axes showed that the predicted survival was calculated by the nomogram, and the y-axes showed that the actual survival was calculated by the K-M method

Normally, the DCA curve was widely used to identify the clinical benefts and utility of the nomogram. The clinical net benefits of the nomogram model and the AJCC TNM staging system are presented in Fig. 8. Applying this model to the training cohort and validation cohort, across a broad range of risk thresholds, the blue line (nomogram model) manifests significantly higher net benefits compared to the red line (TNM staging system). For instance, as illustrated in Fig. 8A, at the 30% risk threshold, the net benefit was about 75% in the nomogram model and about 20% in the TNM staging system. That is to say, the nomogram models produced a higher net benefit in assisting the decision to initiate treatment for high-risk patients compared to the TNM staging system or treatment for all patients or none.

Fig. 8
figure 8

Decision curve analysis for OS (A, C) and CSS (B, D) of patients with CRLM treated by primary lesion surgery in (A, B) training cohort, and in (C, D) validation cohort.. Black horizontal line indicate that all patients died and did not receive the intervention; gray solid line indicate that no patients died and all patients received the intervention; blue solid line indicate that nomogram model; red solid line indicate that the TNM staging system model

3.6 The risk stratification analysis of patient prognosis

In order to further validate the reliability and the accuracy of these models, and to understand the survival risk of the patients in time, so as to make the diagnosis and treatment plans timely, we evaluated the prognosis of the patients through risk stratification analysis. we used the X-tile software, which divided them into low, medium, and high risk subgroups according to their total scores from the nomograms.

Specifically, in terms of OS prognosis, the low-risk group scored 5 ~ 177.5, the median age was 57 years old (interquartile range, 49–65 years old), including 467 females and 677 males, and the median follow-up time was 29 months (interquartile range, 17–43 months); the medium-risk group scored 180 ~ 260, the median age was 65 years old (interquartile range, 58–75 years old), including 172 females and 233 males, and the median follow-up time was 17 months (interquartile range, 9–27 months); and the high-risk group scored 262.5 ~ 437.5, the median age was 75 years old (interquartile range, 64–83 years old), including 137 females and 126 males, and the median follow-up time was 4 months (interquartile range, 1–13 months). Compared with the low risk group, the HR of the medium risk group is 2.62(95%CI: 2.28, 3.01), and the HR of the high risk group is 7.37(95%CI 6.31, 8.62), P < 0.01.

In terms of CSS prognosis, the low-risk group scored 0 ~ 147.5, the median age was 56 years old (interquartile range, 49–63 years old), including 377 females and 561 males, and the median follow-up time was 30 months (interquartile range, 19–44 months); the medium-risk group scored 150 ~ 242.5, the median age was 64 years old (interquartile range, 56–73 years old), including 233 females and 305 males, and the median follow-up time was 18 months (interquartile range, 10–31 months); and the high-risk group scored 245 ~ 417.5, the median age was 75 years old (interquartile range, 63–83 years old), including 124 females and 115 males, and the median follow-up time was 4 months (interquartile range, 1–13 months). Compared with the low risk group, the HR of the medium risk group is 2.57(95%CI 2.42, 2.949), and the HR of the high risk group is 8.21(95%CI 6.93, 9.72), P < 0.01. The results showed significant differences among the risk groups, indicating that the established prognostic models have high accuracy and application reliability, and the patients assigned to the low-risk subgroup showed a more favorable prognosis, emphasizing the potential for satisfactory risk stratification of the constructed nomograms (Fig. 9).

Fig. 9
figure 9

Risk score grouping and survival curve analysis according to these models. A and C show the risk score group for OS and CSS prognosis, and B and D show the survival risk of each groups for OS and CSS prognosis

4 Discussion

Liver metastasis caused by CRC has become a major global health problem and has a poor clinical prognosis [21]. Surgical treatment is still the first choice for CRLM patients, and simultaneous resection is often advocated for to improve the survival and prognosis of these patients [22, 23]. However, for patients with unresectable metastases, a treatment plan is usually negotiated by a multidisciplinary treatment team (MDT). There are many debates surrounding the treatment of primary lesions, but the methods for controlling local tumours have improved survival time in patients with unresectable liver metastases for CRC, including primary lesion resection, radiofrequency ablation (RFA), thermal ablative therapy, and microwave ablation (MWA) [24,25,26]. Therefore, for CRLM patients undergoing local surgery for primary lesions, it is necessary to acquire accurate prognostic information to help clinicians make better clinical decisions and consultations. Currently, a survival prediction model has not been established for these subgroup patients. Therefore, establishing a practical and convenient survival prediction model for individualized survival evaluation of these patients will be more significant.

A nomogram is a method to present the probability of occurrence of clinical events of statistical prediction models by graphical methods that has the characteristics of high prediction accuracy, flexibility, and simplicity [27]. Nomograms are widely applied to predict the clinical prognosis of cancer patients, such as liver cancer [28], prostate cancer [29], bladder cancer [30], and gastric cancer [31]. In this study, we screened eligible patients from the SEER database and established nomogram models to predict the OS and CSS of CRLM patients by local surgery at 1, 3, and 5 years. These nomograms were composed of a variety of independent prognostic risk factors, including age, grade, size, primary location, chemotherapy, LNR, CEA, and marital status. Compared with the nomogram models of CSS, the marital status risk indicator was included only in the OS prediction model, which highlights the importance of marital status in the evaluation of the overall risk of death of patients.

A prospective cohort study based on randomized clinical trials (RCTs) involving colorectal cancer patients revealed that the disease-free survival (DFS), recurrence-free survival (RFS), and overall survival (OS) of patients who are divorced, separated, or widowed are notably worse than those of married patient [32]. Within the context of cancer, the biological burden of chronic stress may lead to impaired tumour immune surveillance function or DNA damage, and higher adaptive load is associated with an increase in overall cancer-specific mortality rate [33,34,35]. What’s more, married patients generally exhibit a more positive outlook. The difference in the role of marital status in OS and CSS, it may be that married patients have more life and mental support, which enhances their ability to resist the risk of other diseases. So, when making cancer treatment, the marital status should be considered.

What’s more, in our study, the C-index, ROC curve, calibration curve, and decision curve analysis were applied to evaluate the accuracy of survival prediction and the net benefit in clinical application for the nomograms. First, the calibration curves showed that these nomograms had good accuracy in predicting OS and CSS at 1, 3, or 5 years in both the training and the validation cohorts. Then, the C-index analysis was used to validate the precision of many prediction models, and it assignments greater than 0.7 are widely considered of practical importance. In the study, the C-indices of the nomogram for OS were 0.747 (95% CI 0.724–0.769) in the training cohort and 0.744 (95% CI 0.721–0.766) in the validation cohort. It was superior to the TNM staging system, with C-indices of 0.614 (95% CI 0.587–0.641) in the training cohort and 0.626 (95% CI 0.599–0.652) in the validation cohort. The C-index of the nomogram of CSS was 0.706 (95% CI 0.682–0.730) in the training cohort and 0.707 (95% CI 0.683–0.0731) in the validation cohort. It was superior to the TNM staging system, in which the C-indices were 0.609 (95% CI 0.583–0.636) in the training cohort and 0.622 (95% CI 0.596–0.648) in the validation cohort. Besides, we found that the area under curve (AUC) values of the prognostic nomogram for the 1 year, 3 years and 5 years OS were 0.843, 0.769, and 0.781 in the training cohort, and were 0.811, 0.751, and 0.748 in the validation cohort. The AUC values of the prognostic nomogram for the 1 year, 3 years and 5 years CSS were 0.839, 0.761 and 0.778 in the training cohort, and were 0.812, 0.754 and 0.746 in the validation cohort, respectively. Which all have higher assignments than the AJCC-TNM evaluation system, it is indicated the superior predictive capabilities of our nomograms. In addition, the DCA curves showed that both nomogram prediction models showed a higher net clinic prognostic benefit than the TNM staging system. In order to further clarify the clinical reliability and accuracy of these models, we perform risk scoring on patients in the training cohort, and set high risk, medium risk and low risk groups. The results showed that the risk stratified groups had significant survival differences, indicating that these models have high clinical accuracy and application reliability. It will help to timely identify and judge the survival risk of patients, and make a diagnosis and treatment plan. Above all, these results clearly demonstrated that the established nomograms could successfully predict OS and CSS in CRLM patients with primary lesion surgery at 1, 3 and 5 years and had good discrimination and calibration capabilities. Our study provides the first nomograms that can predict OS and CSS in CRLM patients with unresectable after primary lesion surgery using a large population-based database. These prognostic models provide instruments to inform clinical decisions, such as patient stratifications, consultations, and therapeutic suggestions.

Despite the given advantages of the nomograms presented in this study, there are still some limitations related to this study. First, the nomograms were developed based on retrospective data from the SEER database, and many incomplete cases regarding the population characteristics and pathological information were selectively deleted, which may have resulted in selection bias. Second, the SEER database lacks data about other important prognostic factors, such as the tumour metastasis time, metastasis lesion size, number and therapy of metastatic lesions, specific chemotherapy regimens, comorbidities, and some molecular indicators. In addition, we randomly divided the data selected from the SEER database into the training cohort and validation cohort at a ratio of 1:1. Although this is a common method of nomogram construction and validation, there is no available external verification. Hence, we will still pay attention to the data from multiple medical centres to conduct external cohort validation in the future.

5 Conclusions

We developed population-based nomogram prognostic models that can provide individual predictions of OS and CSS for CRLM patients undergoing primary lesion surgery. Providing statistical, usable, and patient-friendly tools will help clinicians more precisely and effectively establish personalized treatment strategies and determine survival prognosis for CRLM patients undergoing primary lesion surgery.