Introduction

Among all malignant tumors, colorectal cancer (CRC) has the third-highest incidence and the second-highest mortality [1]. Distant metastasis is the main cause of death in patients with CRC, and about half of all CRC patients are currently diagnosed at stage IV (metastatic colorectal cancer, mCRC) [2, 3]. Consequently, considerable financial investment has been made in developing approaches to screen for cancer and to reduce cancer mortality [4]. Notably, mCRC has a better prognosis than other metastatic gastrointestinal cancers [1]. However, CRC is a heterogeneous disease and prognosis varies between patients, so prognostic risk models could further improve treatment strategies through stratified management. Most currently available prognostic risk models were built using data from the whole CRC population or from patients with postoperative stage II or III CRC [5,6,7]; a prediction model specific to mCRC is therefore needed.

The American Joint Committee on Cancer (AJCC) tumor (T), node (N), and metastasis (M) system is a widely accepted approach for staging colorectal cancer [8]. Resected specimens are examined to assign a pathological stage (pTNM), while radiographic and endoscopic examinations are used to assign a clinical stage (cTNM). In addition, a post-neoadjuvant pathological stage (ypTNM) is used to further stratify patients who received systemic or radiation therapy before surgery for a colorectal primary tumor. In the TNM staging of many solid tumors, tumor size is regarded as a valid prognostic indicator because of its substantial influence on prognosis, recurrence, survival, and clinical management [9]; however, its value in assessing colorectal cancer risk is debated [10, 11]. In addition to spreading horizontally, CRC can invade deeply into the layers of the colon wall, and the current AJCC T category for colorectal cancer is based on depth of invasion rather than tumor size. Nevertheless, retrospective studies have reported an association between tumor size, or maximum horizontal tumor diameter, and survival in CRC patients [12,13,14,15,16].

Because most of these studies used small samples, and because tumor size may serve as an indicator of CRC survival, further research is warranted. In this study, we used the largest cohort to date to explore the effect of primary tumor size on survival in patients with mCRC.

Materials and methods

Patient selection

After access to the study data files was granted (reference number 12271-Nov2019), data on patients newly diagnosed with CRC from 2010 to 2015 were extracted from the Surveillance, Epidemiology, and End Results (SEER) database. The SEER program (https://seer.cancer.gov/seerstat/), sponsored by the National Cancer Institute, collects cancer incidence and survival data from 18 cancer registries covering about 26% of the US population.

Overall survival (OS) data were collected together with clinicopathological characteristics and relevant treatment details. Patients were included if (1) they were aged 18 to 75 years; (2) they had pathologically confirmed stage IV CRC; and (3) CRC was their only primary cancer. Patients with incomplete staging, missing information on metastases, a diagnosis made only at autopsy, or an unknown survival outcome were excluded. Throughout this study, tumor size refers to the size of the primary tumor.
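The selection criteria above amount to a simple record filter. The sketch below is illustrative only: the `Record` fields are hypothetical stand-ins (real SEER exports use their own variable names and codes), and it assumes the `stage` field already encodes the pathologically determined stage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical fields for illustration; real SEER exports use their own names and codes.
    age: int
    stage: Optional[str]      # e.g. "IV"; None when staging is insufficient
    metastasis_known: bool    # False when metastatic information is missing
    autopsy_only: bool        # True when the diagnosis was made only at autopsy
    survival_known: bool      # False when the survival outcome is unknown
    n_primaries: int          # number of primary cancers recorded


def eligible(r: Record) -> bool:
    """Apply the inclusion and exclusion criteria described in the text."""
    included = 18 <= r.age <= 75 and r.stage == "IV" and r.n_primaries == 1
    excluded = not r.metastasis_known or r.autopsy_only or not r.survival_known
    return included and not excluded


records = [
    Record(60, "IV", True, False, True, 1),   # kept
    Record(80, "IV", True, False, True, 1),   # excluded: older than 75
    Record(55, None, True, False, True, 1),   # excluded: insufficient staging
    Record(50, "IV", True, True, True, 1),    # excluded: autopsy-only diagnosis
]
cohort = [r for r in records if eligible(r)]
```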

Statistical analysis

Statistical analyses were performed with X-tile 3.6.1, SPSS 26.0, and R 4.1.1. X-tile was used to determine optimal tumor size cutoff values. Baseline characteristics were summarized descriptively: categorical variables as frequencies with percentages, and continuous variables as median [IQR]. Categorical variables were compared using the χ2 test. Cox regression was used to identify factors associated with OS, and a nomogram was then constructed from the Cox regression results. For predicting OS at 1, 3, and 5 years, the nomogram's discrimination was assessed with receiver operating characteristic (ROC) curves, its calibration with calibration plots, and its clinical utility with decision curve analysis (DCA). All tests were two-sided, and P ≤ 0.05 was considered statistically significant.
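To make the Cox regression step concrete, the following is a minimal one-covariate sketch, not the SPSS or R routine the authors used. It maximizes the Cox partial likelihood by Newton–Raphson, handles tied event times with the Breslow approximation (an assumption; software defaults differ), and runs in O(n²) per iteration, so it suits only small illustrative data.

```python
import math


def cox_fit_1d(times, events, x, iters=25, tol=1e-9):
    """Fit a one-covariate Cox model by Newton-Raphson on the partial likelihood
    (Breslow handling of ties). Returns (beta, standard_error).
    Requires at least one event and some covariate variation in the risk sets."""
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for i, (ti, di) in enumerate(zip(times, events)):
            if not di:
                continue                      # censored subjects contribute only to risk sets
            s0 = s1 = s2 = 0.0
            for tj, xj in zip(times, x):
                if tj >= ti:                  # risk set: still under observation at ti
                    w = math.exp(beta * xj)
                    s0 += w
                    s1 += w * xj
                    s2 += w * xj * xj
            mean = s1 / s0                    # weighted covariate mean in the risk set
            score += x[i] - mean              # first derivative of log partial likelihood
            info += s2 / s0 - mean * mean     # negative second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


# Toy data: covariate 1 marks a higher-risk group (deaths tend to come earlier).
beta, se = cox_fit_1d([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, [1, 1, 0, 1, 0, 1, 0, 0])
```

The hazard ratio for the covariate is `math.exp(beta)`; a multivariate model generalizes the score and information to a vector and matrix.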

Results

Baseline demographic and clinicopathological characteristics

The baseline demographic and clinicopathological characteristics of the study population are shown in Table 1. A total of 7995 patients were included in our cohort (Fig. 1). The median age of these patients was 57.63 years (IQR 54.42–60.84). Overall, 58.5% (n = 5268) were male, and nearly three-quarters were white (n = 5982, 74.8%). The most common primary tumor site was the left colon (n = 3132, 39.2%), followed by the rectum (n = 2856, 35.7%), the right colon (n = 1417, 17.7%), and the transverse colon (n = 590, 7.4%). In this cohort, 72.6% (n = 5801) of tumors were moderately differentiated and 94.6% (n = 7567) were adenocarcinomas. Radiation was given to 16.5% (n = 1321) of patients, chemotherapy to 83.2% (n = 6652), and 82.4% (n = 6591) were CEA-positive. Among all patients with distant metastases, 75.4% (n = 6031) had liver metastases, 20.2% (n = 1614) had lung metastases, 3.5% (n = 280) had bone metastases, and 0.9% (n = 70) had brain metastases. Patients were then divided into a training cohort and a validation cohort at a 7:3 ratio; because there were no statistically significant differences between the two groups (P > 0.05), they were considered comparable (Table 2).
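The 7:3 split and the χ2 comparability check can be sketched as follows. This is an illustration under stated assumptions: the shuffle-based split and its seed are arbitrary choices, not the authors' procedure, and the p-value uses the series expansion of the regularized lower incomplete gamma function to stay within the standard library.

```python
import math
import random


def split_70_30(items, seed=2019):
    """Shuffle a copy of the cohort and cut it 7:3 (the seed is an arbitrary choice)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (7 * len(shuffled) + 5) // 10      # 70%, rounded half up
    return shuffled[:cut], shuffled[cut:]


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi2_test(table):
    """Pearson chi-square test of independence for an r x c table of counts (no empty rows/columns)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, r in enumerate(row_tot):
        for j, c in enumerate(col_tot):
            expected = r * c / n
            stat += (table[i][j] - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, chi2_sf(stat, df)
```

In use, each row of `table` would be one cohort (training, validation) and each column one category of a baseline variable; a large P value is consistent with comparable groups.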

Table 1 Baseline clinicopathologic characteristics of mCRC patients with different metastases
Fig. 1

Flow chart of patient selection from the SEER database. SEER, Surveillance, Epidemiology, and End Results

Table 2 Clinicopathologic characteristics of the training cohort (n = 5597) and the validation cohort (n = 2398)

Metastasis characteristics based on different tumor sizes and metastasis sites

X-tile identified optimal cutoff values of 4.6 cm and 6.9 cm, dividing patients into three tumor size groups (< 4.6 cm, 4.6–6.9 cm, and > 6.9 cm). As shown in Fig. 2A, liver metastases were the most common metastatic site in all three groups. As tumor size increased, the proportions of bone and brain metastases increased, and brain metastases were the least frequent metastatic pattern in every tumor size group. As shown in Fig. 2B, in the liver, lung, and bone metastasis groups, the proportion of patients with tumor size < 4.6 cm was clearly higher than the proportion with tumor size ≥ 4.6 cm, whereas patients with brain metastases were more likely to have a primary tumor size between 4.6 and 6.9 cm. When X-tile was used to find the optimal tumor size cutoff within each metastatic site, the cutoffs for liver and lung metastases were similar, the cutoff for bone metastases was the smallest, and the cutoff for brain metastases was the largest. Kaplan–Meier (K–M) curves showed that prognosis worsened as tumor size increased (Fig. 3).
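X-tile searches over candidate cutpoints for the split that maximizes a survival test statistic. The sketch below is a simplified, hypothetical one-cutpoint version of that idea (X-tile itself can place two cutpoints for three groups), together with a helper that assigns the three groups reported above. The `min_group` guard is an assumption to avoid degenerate splits. Note that a maximized χ2 overstates significance, so it should not be read as an ordinary P value without correction or validation.

```python
def size_group(size_cm):
    """Assign the three groups reported in the text: < 4.6 cm, 4.6-6.9 cm, > 6.9 cm."""
    if size_cm < 4.6:
        return "<4.6"
    if size_cm <= 6.9:
        return "4.6-6.9"
    return ">6.9"


def logrank_chi2(times, events, in_group):
    """Two-group log-rank chi-square statistic (1 df); in_group holds 0/1 labels."""
    o_minus_e = var = 0.0
    for t in sorted({t for t, d in zip(times, events) if d}):
        n = n1 = d = d1 = 0
        for ti, di, gi in zip(times, events, in_group):
            if ti >= t:                 # still at risk at time t
                n += 1
                n1 += gi
                if ti == t and di:      # died at time t
                    d += 1
                    d1 += gi
        o_minus_e += d1 - d * n1 / n    # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def best_cutpoint(sizes, times, events, min_group=5):
    """Try each observed size as a threshold; return (cutoff, chi2) with the largest log-rank chi-square."""
    best = None
    for c in sorted(set(sizes)):
        group = [1 if s > c else 0 for s in sizes]
        k = sum(group)
        if k < min_group or len(group) - k < min_group:
            continue                    # skip splits that leave a tiny group
        stat = logrank_chi2(times, events, group)
        if best is None or stat > best[1]:
            best = (c, stat)
    return best
```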

Fig. 2

Percentages and numbers of cases for each distant metastatic site. A, C Tumor size groups by metastatic site. B, D Metastatic sites by tumor size group

Fig. 3

Kaplan–Meier survival curves stratified by the X-tile-derived tumor size cutoffs for all mCRC patients (A) and for patients with liver metastases (B), lung metastases (C), bone metastases (D), and brain metastases (E)
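Each curve in Fig. 3 is a product-limit (Kaplan–Meier) estimate computed separately within one tumor size group. A minimal sketch of the estimator follows. It assumes `events` holds 1 for death and 0 for censoring; censored patients leave the risk set after their last follow-up time without causing a drop in the curve.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each distinct death time."""
    surv, curve = 1.0, []
    for t in sorted({t for t, d in zip(times, events) if d}):
        at_risk = sum(1 for ti in times if ti >= t)          # still followed at t
        deaths = sum(1 for ti, di in zip(times, events) if ti == t and di)
        surv *= 1 - deaths / at_risk                         # conditional survival past t
        curve.append((t, surv))
    return curve
```

To reproduce one panel, one would filter patients by tumor size group and call the function once per group.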

Prognostic factors affecting the OS of patients

According to univariate Cox analysis (Table 3), age, race, primary site, tumor size, grade, histology, T stage, N stage, radiation, chemotherapy, CEA level, and metastatic site were prognostic factors for OS in the training cohort (all P < 0.05). Multivariate Cox analysis showed that radiation was not an independent prognostic factor, whereas the other 11 factors were (all P < 0.05).
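Cox results such as those in Table 3 are usually reported as hazard ratios with Wald 95% confidence intervals. The sketch below converts a coefficient and its standard error into these quantities and screens factors at P < 0.05 as candidates for the multivariate model. The factor names and (beta, SE) values are hypothetical and are not taken from Table 3.

```python
import math

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def wald_summary(beta, se):
    """Hazard ratio, Wald 95% CI, and two-sided P value for one Cox coefficient."""
    hr = math.exp(beta)
    lower, upper = math.exp(beta - Z95 * se), math.exp(beta + Z95 * se)
    p = math.erfc(abs(beta / se) / math.sqrt(2))   # 2 * (1 - Phi(|beta / se|))
    return hr, lower, upper, p


def screen(univariate, alpha=0.05):
    """Keep factors whose univariate P value is below alpha."""
    return [name for name, (beta, se) in univariate.items()
            if wald_summary(beta, se)[3] < alpha]


# Hypothetical univariate results as (beta, SE); illustrative values only.
example = {"factor_a": (-0.05, 0.04), "factor_b": (-1.10, 0.05)}
selected = screen(example)   # ['factor_b']
```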

Table 3 Univariate and multivariate Cox regression analyses for overall survival in the training cohort

Construction of nomogram prediction model

Using the independent prognostic factors identified in multivariate analysis, we constructed a nomogram to predict 1-, 3-, and 5-year OS in patients with mCRC (Fig. 4). Each factor was assigned a score between 0 and 100 according to its contribution to the model, and the scores were summed to give a total score for each patient. Chemotherapy, T stage, and metastatic site had the greatest effects on prognosis.
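The point assignment can be sketched under a common nomogram convention (assumed here, not confirmed for Fig. 4): the factor whose coefficients span the widest range is scaled to 0–100 points and every other factor is scaled proportionally; survival at time t then follows S(t | x) = S0(t)^exp(lp), where lp is the linear predictor and S0 is the baseline survival of a patient at all reference levels. All coefficients and the baseline survival below are hypothetical.

```python
import math

# Hypothetical log-hazard coefficients (reference level = 0.0); NOT the fitted model.
COEFS = {
    "chemotherapy": {"yes": 0.0, "no": 0.9},
    "t_stage": {"T1-2": 0.0, "T3": 0.3, "T4": 0.7},
    "tumor_size": {"<4.6": 0.0, "4.6-6.9": 0.1, ">6.9": 0.25},
}


def points_table(coefs):
    """Scale each level so the factor with the widest coefficient range spans 0-100 points."""
    widest = max(max(v.values()) - min(v.values()) for v in coefs.values())
    return {f: {lvl: 100 * (b - min(v.values())) / widest for lvl, b in v.items()}
            for f, v in coefs.items()}


def total_points(patient, table):
    """Sum the points for the patient's level of each factor."""
    return sum(table[f][lvl] for f, lvl in patient.items())


def survival_prob(patient, coefs, baseline_surv):
    """S(t | x) = S0(t) ** exp(lp), where S0(t) is the reference patient's survival at t."""
    lp = sum(coefs[f][lvl] for f, lvl in patient.items())
    return baseline_surv ** math.exp(lp)


table = points_table(COEFS)
reference = {"chemotherapy": "yes", "t_stage": "T1-2", "tumor_size": "<4.6"}
high_risk = {"chemotherapy": "no", "t_stage": "T4", "tumor_size": ">6.9"}
```

Because total points are a linear rescaling of the linear predictor, the nomogram's bottom axes simply map points back to survival probabilities at each time horizon.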

Fig. 4

Nomogram predicting 1-, 3-, and 5-year OS rates of mCRC patients

Verification of the nomogram prediction model

In the training cohort, the nomogram's ROC curves for 1-, 3-, and 5-year OS yielded AUC values of 0.766, 0.726, and 0.746, respectively (Fig. 5A). In the validation cohort, the corresponding AUC values were 0.772, 0.717, and 0.734 (Fig. 5B). ROC curves based on TNM staging were then constructed in the training and validation cohorts, giving 1-year AUC values of 0.608 (Fig. 5C) and 0.616 (Fig. 5D), respectively. These results indicate that the prediction model developed in this study outperformed a model based on the TNM staging system. In addition, the calibration plots showed good agreement between the predicted and ideal curves in both cohorts, indicating good calibration (Fig. 6).
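An AUC at a fixed horizon can be read as the probability that a patient who died by that time received a higher predicted risk than a patient who survived past it. The sketch below computes this with the Mann–Whitney formulation. It simplifies by dropping patients censored before the horizon; dedicated time-dependent ROC methods handle such censoring properly, and the text does not state which method was used.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random case outranks a random control (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


def auc_at_horizon(risk, times, events, horizon):
    """Cases: died by `horizon`. Controls: still followed beyond `horizon`.
    Patients censored before `horizon` are dropped (a simplification)."""
    pos = [r for r, t, d in zip(risk, times, events) if t <= horizon and d]
    neg = [r for r, t in zip(risk, times) if t > horizon]
    return auc(pos, neg)
```

Applying this to nomogram scores and to TNM-stage scores on the same patients gives the kind of side-by-side comparison shown in Fig. 5C and 5D.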

Fig. 5

ROC curves and AUC values for 1-, 3-, and 5-year OS predictions in the training cohort (A) and validation cohort (B); ROC curves comparing the prognostic model with the TNM staging model and other prognostic factors for predicting OS in the training cohort (C) and validation cohort (D)

Fig. 6

Calibration plots of the nomogram for 1-, 3-, and 5-year OS in the training cohort (A–C) and validation cohort (D–F)
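A calibration plot compares predicted and observed survival within groups of patients ranked by predicted risk. The sketch below is a simplified version: it assumes each patient's alive/dead status at the horizon is known (no censoring before the horizon), whereas survival calibration curves in practice usually estimate observed survival within each group with Kaplan–Meier. The number of groups is an arbitrary choice.

```python
def calibration_bins(pred_surv, observed_alive, n_bins=5):
    """Sort by predicted survival, split into roughly equal groups, and return
    (mean predicted survival, observed proportion alive) for each group."""
    order = sorted(range(len(pred_surv)), key=lambda i: pred_surv[i])
    size = len(order) / n_bins
    out = []
    for b in range(n_bins):
        idx = order[round(b * size): round((b + 1) * size)]
        if not idx:
            continue
        mean_pred = sum(pred_surv[i] for i in idx) / len(idx)
        observed = sum(observed_alive[i] for i in idx) / len(idx)
        out.append((mean_pred, observed))
    return out
```

Points lying close to the 45-degree line indicate that predicted and observed survival agree.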

Decision curve analysis

DCA was performed for 1-, 3-, and 5-year OS in the training cohort (Fig. 7A) and the validation cohort (Fig. 7B). In both cohorts, the nomogram showed good clinical utility, which was greater for predicting 3- and 5-year OS and slightly lower for 1-year OS.
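Decision curves plot net benefit against the threshold probability p_t: NB = TP/n − (FP/n) × p_t / (1 − p_t), compared with "treat all" and "treat none" (NB = 0) strategies. A minimal sketch follows. It assumes a binary death-by-horizon outcome with no censoring before the horizon, which is a simplification of survival DCA.

```python
def net_benefit(pred_risk, died, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(died)
    tp = sum(1 for r, y in zip(pred_risk, died) if r >= threshold and y)
    fp = sum(1 for r, y in zip(pred_risk, died) if r >= threshold and not y)
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all(died, threshold):
    """Net benefit of acting on every patient; 'treat none' is always 0."""
    prevalence = sum(died) / len(died)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

A model is clinically useful over a range of thresholds where its net benefit exceeds both the treat-all and treat-none lines.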

Fig. 7

Decision curve analysis (DCA) of the nomogram for predicting OS at 1, 3, and 5 years in the training cohort (A) and validation cohort (B)

Discussion

CRC is one of the most common malignant tumors worldwide, accounting for 10% of newly diagnosed malignancies [1]. The AJCC TNM staging system is based on the depth of local invasion and does not take the size of the primary tumor into account. Some studies have suggested that tumor size may be a useful supplement to the TNM staging system that improves the accuracy of prognostication for CRC [12,13,14,15,16]. Compared with invasion depth, tumor size can be obtained from imaging examinations, which are simple, safe, non-invasive, and accurate. However, there is no unified optimal cutoff value for stratifying tumor size. Shiraishi et al. [17] analyzed 95 patients with T4 CRC, determined an optimal tumor size threshold of 5.0 cm using ROC curves, and confirmed that tumor size was significantly associated with prognosis. Deng et al. [18] analyzed the clinicopathological parameters of 1250 patients with colorectal cancer hepatopulmonary metastases (CRCHPM) and constructed a reliable model with 7 independent prognostic factors. The tumor size threshold selected in that study was 5.5 cm, and larger tumors were associated with poorer prognosis and a greater risk of liver and lung metastasis. Saha et al. [19] conducted a subgroup analysis of 300,386 CRC patients using thresholds of 2.0, 4.0, and 6.0 cm and found that the 5-year survival rate decreased significantly with increasing tumor size, regardless of lymph node metastasis. Another study that stratified the metastatic potential and risk of recurrence in 1538 CRC patients showed that tumor size greater than 5.0 cm was associated with a higher rate of disease recurrence [20]. Our results are consistent with these previous findings: although tumor size had a smaller effect on prognosis than TNM staging in our overall study population, significant findings were observed across different AJCC stages.
In mCRC patients, primary tumors larger than 6.9 cm were associated with worse survival than smaller tumors, independent of other variables. One possible explanation for the weaker prognostic effect of tumor size compared with T stage is inaccurate measurement of tumor size. For advanced-stage tumors, tumor size may be difficult to measure accurately because the extent of invasion within the bowel wall may exceed the maximal diameter of the tumor on the mucosal surface. Thus, the maximal horizontal diameter may not accurately reflect the extent of tumor growth in deeply infiltrating tumors. In contrast, for tumors confined to the submucosa, growth into the lumen may be the main growth pattern, and tumor size at this stage may be the main index of tumor growth. Nevertheless, other potential explanations for the effect of tumor size warrant further investigation.

Distant metastasis of tumors, including CRC, is thought to be a multistep process with three phases: (1) invasion phase: tumor cells in situ become more invasive through epithelial–mesenchymal transition (EMT), penetrate the surrounding tissues, and migrate into blood or lymphatic vessels, becoming circulating tumor cells (CTCs); (2) circulation phase: platelets adhere directly to the surface of CTCs, forming micro-thrombi that reduce recognition and clearance by the immune system; (3) settlement phase: CTCs settle in distant organs and form a pre-metastatic niche, an immunosuppressive microenvironment enabled by cytokines or exosomes secreted from the primary tumor site [21]. The classical metastasis model suggests that distant metastasis may occur as time passes and/or as the tumor grows [22]. This is consistent with the interpretation suggested by this study: primary tumors in the colon or rectum continue to grow, become more invasive, accumulate damaging mutations, and eventually acquire the capacity for distant metastasis, leading to mCRC. Others have proposed that, as in breast cancer, mCRC may follow an early dissemination model [23]; we refer to such tumors as "genius tumors" because they carry metastasis-related mutant alleles at an early stage. However, because this study lacked multi-omics molecular characterization of CRC, it was impossible to distinguish patients following the growth-accumulation model from those following the early dissemination model. We also observed a trend toward poorer prognosis with larger tumor size in patients with bone or brain metastases, although this did not reach statistical significance (P > 0.05). This may be because these patients have a poor prognosis regardless of tumor size; for example, patients with brain metastases have a median survival of 3–6 months after diagnosis and lack effective therapy [24].

The relationship between tumor size and prognosis in CRC has long been of great concern to clinicians. In this large-scale analysis of clinical data, tumor size was an independent prognostic factor for mCRC patients. However, this study has several limitations. Despite the large size of the SEER database, stratification by tumor size and metastatic site produced relatively small subgroups, reducing the statistical power to detect small differences. This may explain why we did not detect significant associations between tumor size and prognosis in the bone and brain metastasis subgroups. In addition, the SEER database does not include information on molecular characteristics such as BRAF mutation status and microsatellite status, nor on adjuvant therapy details or pathology techniques. Furthermore, although this study included a large population, the proportion of Asian patients was small, so further multicenter research is needed to validate these results.
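The power limitation can be made concrete with Schoenfeld's approximation for the number of deaths needed by a two-sided log-rank or Cox test to detect a hazard ratio HR between two groups, where p is the proportion of patients in one group: D = (z_{1−α/2} + z_{1−β})² / (p(1 − p)(ln HR)²). The sketch below is illustrative only. Its inputs are assumptions, not estimates from this study, but they show why a subgroup as small as the 70 patients with brain metastases can detect only large hazard ratios.

```python
import math
from statistics import NormalDist


def events_needed(hr, prop_group, alpha=0.05, power=0.8):
    """Schoenfeld approximation: deaths required to detect hazard ratio `hr`
    with a two-sided test at level `alpha` and the given power."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(power)
    return math.ceil((za + zb) ** 2 / (prop_group * (1 - prop_group) * math.log(hr) ** 2))


needed_large_effect = events_needed(2.0, 0.5)   # 66 deaths for HR = 2 with equal groups
needed_small_effect = events_needed(1.3, 0.5)   # far more deaths for a modest effect
```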

Conclusion

Our findings suggest that, in patients with mCRC and lung or liver metastases, larger primary tumors are associated with poorer prognosis. We developed a prognostic risk assessment model with good accuracy for these patients; the nomogram allows a total score to be calculated for each patient and the probability of survival to be estimated. This may help guide clinical decision-making. Further prospective research is needed to determine the role of tumor size in clinical staging models used for treatment selection.