Introduction

Colorectal cancer (CRC) has become the third most common cancer and the second leading cause of cancer-related mortality worldwide. Colon cancer (CC) accounts for a large proportion of CRC [1, 2]. The increased implementation of screening has resulted in an increase in the number of newly diagnosed patients with early-stage CC [3]. Although this is generally considered to provide an opportunity for curative-intent treatment, the prognosis of some patients remains poor. Of particular interest, the incidence of early-onset CRC (defined as CRC occurring under the age of 50 years) has been increasing in many countries [4,5,6]. This has resulted in a heavy cancer burden in younger adults. Hence, predicting the prognosis of these patients warrants investigation.

The Tumor, Node, Metastasis (TNM) staging system is regarded as a helpful prognostic index for CC patients, predicting clinical outcomes on the basis of tumor biology and anatomy [7]. Even so, it may not be the optimal prognostic indicator. Other risk factors that affect the prognosis of CC patients, such as race [8], age [9], sex [10], tumor site [11], tumor size, and chemotherapy administered [12], should not be ignored. In other words, a combination of possible influencing factors is needed to predict the survival of cancer patients more accurately.

The Surveillance, Epidemiology, and End Results (SEER) database contains extensive information about cancer-related risk factors and patients’ survival. It is crucial to synthesize this information wisely. A nomogram is a statistical prognostic model that integrates diverse biologic and clinical variables to generate an individual’s probability of experiencing a clinical event, thus supporting personalized medicine [13]. To the best of our knowledge, no researchers have used data drawn from the SEER database to construct a nomogram model for predicting the prognosis of patients with early-onset stage I–II CC.

In this study, we aimed to establish a novel model that includes multiple variables and thus more accurately predicts the survival of patients with early-onset, early-stage CC. This nomogram should enable clinicians to make better treatment decisions for such individuals.

Methods

The data were obtained from the SEER Program, which collects and provides cancer statistics with the aim of reducing the cancer burden in the USA. We used data collected from 2012 to 2015. These data included baseline patient and tumor characteristics and survival information. The inclusion criteria for this study were (a) age under 50 years; (b) surgery performed; (c) postoperative pathological diagnosis of stage I–II CC without distant metastasis; and (d) ≥ 12 regional nodes examined. The exclusion criteria were (a) no prior tumor; (b) unknown histological grade; (c) unknown marital status; (d) unknown race; (e) death from other tumors and unknown cause of death; and (f) survival time recorded as zero. Ultimately, our study cohort comprised 3528 patients with early-onset stage I–II CC.
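
As a rough illustration of how selection criteria of this kind could be applied to patient records, the sketch below filters a list of dictionaries. The field names and value codes (`had_surgery`, `nodes_examined`, `"other_tumor"`, and so on) are hypothetical and are not SEER variable names. The prior-tumor criterion is left out because its encoding is not described here.

```python
# Hypothetical record layout: field names and codes are illustrative only,
# not actual SEER variable names.
def is_eligible(p):
    """Return True if a record meets the inclusion criteria and has none of
    the unknown or invalid values listed as exclusions."""
    included = (
        p["age"] < 50
        and p["had_surgery"]
        and p["stage"] in {"I", "II"}
        and not p["distant_metastasis"]
        and p["nodes_examined"] >= 12
    )
    excluded = (
        p["grade"] is None                      # unknown histological grade
        or p["marital_status"] is None          # unknown marital status
        or p["race"] is None                    # unknown race
        or p["cause_of_death"] in {"unknown", "other_tumor"}
        or p["survival_months"] == 0            # survival time recorded as zero
    )
    return included and not excluded


def select_cohort(records):
    """Keep only the eligible records."""
    return [p for p in records if is_eligible(p)]
```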

The following variables were extracted and included in the analysis: baseline patient characteristics (race, sex, age at diagnosis, survival [months], marital status, and vital status), tumor features (tumor site, pathological grade, tumor size, TNM stage, and T stage), and treatment strategy (chemotherapy). Staging was in accordance with the seventh edition of the American Joint Committee on Cancer (AJCC) TNM classification. Race was classified as white, black, or other. Sex was stratified as male or female. Two age groups were created: ≤ 35 and > 35 years. Pathological grades I–IV were categorized as well differentiated, moderately differentiated, poorly differentiated, and undifferentiated, respectively. Additional study variables comprised tumor site (left or right side), chemotherapy (no or yes), marital status (married or unmarried), tumor size (≤ 5 cm or > 5 cm), and T stage (T1, T2, T3, or T4). Overall survival (OS) time was defined as the time from diagnosis to death from any cause.

All eligible patients were randomly allocated to a training group (n = 2469) or a validation group (n = 1059) in a 7:3 ratio. The training group was used to construct the nomogram and the validation group to validate it. Univariate and multivariate regression analyses were applied to identify the factors that significantly affected the patients’ OS (p < 0.05). The nomogram model was created from the identified significant variables using R software (version 3.6.1). The performance and predictive accuracy of the nomogram were evaluated by the concordance index (C-index). A C-index of 0.5 indicates discrimination no better than chance and 1.0 indicates perfect discrimination; the larger the value, the more accurately the nomogram model predicts outcomes. Calibration plots were drawn at 3 and 5 years to compare the predicted with the actual OS. Decision curve analyses (DCA) were performed to evaluate the clinical practicability of the nomogram. The median score calculated from the nomogram in the training cohort was set as the cutoff value, and all eligible patients were classified into two groups (low versus high score). Kaplan–Meier curves and the log-rank test were used to compare OS between groups. We used IBM SPSS Statistics, Version 25.0 (SPSS) to perform all univariate and multivariate regression analyses and constructed the graphs using R software and related packages. P values less than 0.05 were considered to denote statistical significance.
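
To make the C-index concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data, written in plain Python. It is an illustration of the general definition, not the routine used in this study. The function name and the convention that a higher score means a worse predicted outcome are assumptions made for the example.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       follow-up time for each patient
    events:      1 if death was observed, 0 if censored
    risk_scores: model output; higher is assumed to mean worse survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died before patient j's
            # last observed time, so the ordering of outcomes is known.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # model ranks the pair correctly
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

With this definition, a model whose scores are unrelated to outcome scores about 0.5 on average, and a model that orders every comparable pair correctly scores 1.0.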

Results

Patients’ baseline characteristics

The patients’ baseline characteristics are summarized in Table 1. A total of 3528 patients with early-onset stage I–II CC were included in our study: 2469 patients in the training cohort and 1059 in the validation cohort. There were no significant differences in assessed characteristics between the two groups (all p > 0.05). In the entire cohort, 52% of patients (n = 1834) were male, 89.5% (n = 3159) were aged > 35 years, 73.5% were white, and 55.8% (n = 1969) were married. More than half the patients had tumors larger than 5 cm, and more than half had tumors located on the left side. The cancers were pathological grades I/II in 3090 (87.6%) and stage T3/T4 in 2046 (58.0%) patients, and 763 patients (21.6%) had received chemotherapy.

Table 1 Baseline characteristics of patients in the training and validation cohorts

Identification of significant prognostic factors by univariate and multivariate analysis

The results of univariate and multivariate analysis in the training cohort are shown in Table 2. Univariate analysis identified race, age, marital status, tumor grade, tumor size, T stage, and chemotherapy as significant predictors of OS (all p < 0.05). Multivariate analysis of these factors identified race, marital status, and T stage as independent prognostic factors. Accordingly, these variables were used to construct the nomogram model.

Table 2 Results of univariate and multivariate analysis of potential prognostic factors in the training cohort

Construction and validation of the nomogram

In accordance with the results of multivariate analysis, race, marital status, and T stage were used to build a nomogram for predicting the 3- and 5-year OS (Fig. 1). Each predictor was assigned a score, ranging from 0 to 100. The nomogram showed that T stage was the dominant contributor to the OS, followed by race and marital status. Total scores for specific patients were calculated by adding the scores for each variable. The chances of 3- and 5-year OS were obtained by drawing a vertical line through the location of the total score on the horizontal axis. The C-index of the nomogram for the training cohort was 0.724. The calibration curves showed good consistency in the probability of 3- and 5-year OS between the observed and nomogram-predicted outcomes in the training cohort (Fig. 2A, B). Further, the DCA curves for the training cohort showed that the nomogram model was practical and effective (Fig. 3A). We then used the same procedure to verify the nomogram model in the validation cohort. The C-index in the validation cohort was 0.692. Likewise, the calibration curves (Fig. 2C, D) and the DCA curves (Fig. 3B) in the validation cohort showed that the nomogram was robust and applicable.
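
The point-scoring step can be sketched as follows, under the common convention that the predictor with the widest effect range spans 0 to 100 points and the other predictors are scaled against it. The coefficients below are hypothetical placeholders chosen only to mirror the reported ordering of contributions (T stage, then race, then marital status); they are not the fitted values from this study. Converting a total score into a 3- or 5-year OS probability also requires the fitted baseline survival, which is omitted here.

```python
# Hypothetical log-hazard coefficients, for illustration only.
# These are NOT the fitted values from the study.
COEFS = {
    "t_stage": {"T1": 0.0, "T2": 0.4, "T3": 0.9, "T4": 1.8},
    "race": {"other": 0.0, "white": 0.5, "black": 0.9},
    "marital_status": {"married": 0.0, "unmarried": 0.4},
}


def build_point_table(coefs):
    """Rescale coefficients so the widest-ranging predictor spans 0-100 points."""
    max_range = max(max(c.values()) - min(c.values()) for c in coefs.values())
    return {
        var: {
            level: 100.0 * (beta - min(levels.values())) / max_range
            for level, beta in levels.items()
        }
        for var, levels in coefs.items()
    }


def total_points(patient, table):
    """Add the points for each predictor level of one patient."""
    return sum(table[var][patient[var]] for var in table)


table = build_point_table(COEFS)
example = {"t_stage": "T3", "race": "white", "marital_status": "married"}
score = total_points(example, table)
```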

Fig. 1

Nomogram for predicting overall survival of patients with early-onset stage I–II colon cancer

Fig. 2

Calibration curves predicting 3- and 5-year OS in the training and validation groups. A Calibration curve predicting 3-year OS in the training group. B Calibration curve predicting 5-year OS in the training group. C Calibration curve predicting 3-year OS in the validation group. D Calibration curve predicting 5-year OS in the validation group. OS, overall survival

Fig. 3

Results of decision curve analysis of the OS-associated nomogram in the training and validation groups. A Decision curve for 5-year OS in the training group. B Decision curve for 5-year OS in the validation group. OS, overall survival
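
For readers unfamiliar with decision curves, the quantity plotted on the vertical axis is the net benefit at each threshold probability. The sketch below computes it for a simplified binary outcome (for example, death within 5 years) and ignores censoring, which a survival-specific DCA would need to handle. The function names and inputs are assumptions for illustration and do not reproduce the analysis in this study.

```python
def net_benefit(predicted_risk, died, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    Simplified binary-outcome version: net benefit is the true-positive rate
    minus the false-positive rate weighted by threshold / (1 - threshold).
    """
    n = len(died)
    pairs = list(zip(predicted_risk, died))
    true_pos = sum(1 for r, d in pairs if r >= threshold and d == 1)
    false_pos = sum(1 for r, d in pairs if r >= threshold and d == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(died, threshold):
    """Reference strategy: act on every patient regardless of the model."""
    prevalence = sum(died) / len(died)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

A model is usually judged useful over the range of thresholds where its net benefit exceeds both the treat-all reference and the treat-none reference, whose net benefit is zero.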

Comparison of survival differences between groups with different scores based on the nomogram

After determining that the nomogram had good predictive value, we sought to distinguish the patients’ OS according to their scores. Accordingly, we stratified the patients into two groups based on the cutoff value, that is, the median of the total scores in the training cohort. In the training cohort, patients with low-risk scores (score < 73.15) had a better OS than those with high-risk scores (score ≥ 73.15) (p < 0.001) (Fig. 4A). Likewise, the survival curves differed significantly in the validation set (p < 0.001) (Fig. 4B).
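
The stratification and the survival curves can be illustrated with a small sketch: patients are split at the training-cohort median, and a Kaplan–Meier (product-limit) estimate is computed for a group. The function names and the toy inputs used to exercise them are hypothetical. The log-rank test used for the group comparison is not implemented here.

```python
from statistics import median


def stratify_by_median(train_scores, scores):
    """Label each score 'low' (< cutoff) or 'high' (>= cutoff), where the
    cutoff is the median total score of the training cohort."""
    cutoff = median(train_scores)
    labels = ["high" if s >= cutoff else "low" for s in scores]
    return cutoff, labels


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    Returns a list of (time, survival probability) at each observed death time.
    events: 1 if death was observed, 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Applying `kaplan_meier` separately to the low- and high-score groups yields the two curves that a plot such as Fig. 4 compares.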

Fig. 4

Survival curves of OS for risk classification based on the nomogram risk score. A In the training group. B In the validation group. OS, overall survival

Discussion

As is well known, the incidence of early-onset CRC is on the rise. The reasons for this trend remain unclear. Moreover, some patients with early-stage disease do not achieve a satisfactory outcome despite undergoing surgery. We therefore selected eligible patients from the SEER database with the aim of developing and validating a prognostic nomogram model for patients with early-onset stage I–II CC and established that this nomogram has good prognostic value.

In our study, univariate and multivariate analysis identified T stage, race, and marital status as the most significant predictors of OS. It is well established that, in patients with early-stage solid tumors without lymph node or distant metastases, the T stage of the TNM staging system makes a major contribution to determining prognosis [14, 15]. Previous research has shown that T stage is an independent predictor among many variables in patients with CRC. That is, the higher the T stage, the lower the survival rate [8, 16]. Li et al. found that the T stage has greater weight than the N stage in the TNM staging system for CRC; that is, the T stage affects survival from CRC more significantly than does the N stage [17]. Consistent with this, according to our nomogram, of the studied variables, T stage had the greatest impact on OS. In other words, the higher the T stage, the worse the OS.

In addition, our nomogram identified race as significantly associated with survival, with patients in the “other” category having a higher survival rate than those categorized as white or black. Previous research on advanced CC has had similar results [18]. However, a SEER-based study on early hepatocellular carcinoma found that those categorized as white have better survival rates than those categorized as black or other [19]. We speculate that this discrepancy may be related to factors such as the type of cancer and the genetic and genomic background of the selected patients.

Another significant variable identified by our nomogram was marital status: married patients had a higher chance of survival than unmarried patients. This is consistent with the findings of other studies that married patients have survival advantages [20, 21]. A stable family may provide better care and psychological support, enhancing quality of life and improving survival.

The prognostic risk of patients with early-onset, early-stage CC can be quantified in relative terms on the basis of these three variables. To our knowledge, few studies have explored this question. However, variables not included in the model should not be ignored. They may also affect prognosis under certain conditions that are yet to be determined [12, 22].

Our study had some limitations. First, it was retrospective; the data came from a public database, and the model has not been validated in the real world. Second, some potentially relevant details, such as molecular markers, molecular pathological features of the tumor, surgical procedures, inflammatory and tumor indicators, and specifics of postoperative treatment, were not available, possibly resulting in bias. Finally, the nomogram and risk classification system should be further verified in another institution.

Conclusions

In this paper, we identified predictors of prognosis and used them to develop a useful nomogram model for predicting the OS of patients with early-onset, stage I–II CC. This nomogram has the potential to help clinicians make treatment decisions. However, external validation is still required.