Prognostic impact of tumor size on patients with metastatic colorectal cancer: a large SEER-based retrospective cohort study

Given the poor prognosis of metastatic colorectal cancer (mCRC), this research aimed to investigate the correlation between tumor size and prognosis, and develop a novel prediction model to guide individualized treatment. Patients pathologically diagnosed with mCRC were recruited from the Surveillance, Epidemiology, and End Results (SEER) database between 2010 and 2015, and were randomly divided (7:3 ratio) into a training cohort (n = 5597) and a validation cohort (n = 2398). Kaplan–Meier curves were used to analyze the relationship between tumor size and overall survival (OS). Univariate Cox analysis was applied to assess the factors associated with the prognosis of mCRC patients in the training cohort, and then multivariate Cox analysis was used to construct a nomogram model. The area under the receiver-operating characteristics curve (AUC) and calibration curve were used to evaluate the predictive ability of the model. Patients with larger tumors had a worse prognosis. While brain metastases were associated with larger tumors compared to liver or lung metastases, bone metastases tended to be associated with smaller tumors. Multivariate Cox analysis revealed that tumor size was an independent prognostic risk factor (HR 1.28, 95% CI 1.19–1.38), in addition to the other ten variables (age, race, primary site, grade, histology, T stage, N stage, chemotherapy, CEA level and metastases site). The 1-, 3-, and 5-year OS nomogram model yielded AUC values of more than 0.70 in both the training and validation cohorts, and its predictive performance was superior to that of the traditional TNM stage. Calibration plots demonstrated a good agreement between the predicted and observed 1-, 3-, and 5-year OS outcomes in both cohorts. The size of primary tumor was found to be significantly associated with prognosis of mCRC, and was also correlated with specific metastatic organ. 
In this study, we presented the first effort to create and validate a novel nomogram for predicting 1-, 3- and 5-year OS probabilities of mCRC. The prognostic nomogram was demonstrated to have an excellent predictive ability in estimating individualized OS of patients with mCRC.


Introduction
Among all malignant tumors, colorectal cancer (CRC) has the third-highest incidence rate and the second-highest mortality rate [1]. Distant metastasis is the main cause of death in CRC and, currently, around half of all CRC patients are diagnosed at stage IV (metastatic colorectal cancer, mCRC) [2,3]. Consequently, considerable financial investment has been made in screening and in developing treatments to reduce CRC mortality [4]. Notably, mCRC has a better prognosis than other metastatic gastrointestinal cancers [1]. However, CRC is a heterogeneous disease and prognosis varies between patients, so prognostic risk models could improve treatment strategies by supporting stratified management. Most prognostic risk models currently available were built using data from the whole CRC population or from patients with postoperative stage II and stage III CRC [5][6][7]; a new prediction model specific to mCRC is therefore required.
Qi Zhang and Baosong Li contributed equally to this work and should be regarded as co-first authors.
The American Joint Committee on Cancer (AJCC) system of tumor (T), nodes (N), and metastases (M) is a widely accepted approach for classifying cancer risk in patients with colorectal cancer [8]. Resected specimens are examined to assign a pathological stage (pTNM), while radiographic and endoscopic examinations are used to assign a clinical stage (cTNM). In addition, a post-neoadjuvant pathological stage (ypTNM) is employed to further stratify patients who have had systemic or radiation treatment prior to surgery for a colorectal primary tumor. Given its substantial influence on prognosis, recurrence, survival, and clinical management, tumor size is considered a valid prognostic indicator in the TNM staging of solid tumors [9]; however, its value in assessing risk in colorectal cancer is debated [10,11]. In addition to spreading horizontally, CRC can invade deeply into the layers of the colon wall, and the current AJCC colorectal cancer T stage is based on the depth of tumor invasion rather than tumor size. Nevertheless, retrospective studies have shown a direct link between tumor size or maximum horizontal tumor diameter and survival in CRC patients [12][13][14][15][16].
Because most of these studies used small samples, and tumor size may serve as an indicator of CRC survival, further research is necessary. This study used the largest cohort to date to explore the effect of tumor size on patients with mCRC.

Patient selection
After access to the study data files was granted (reference number 12271-Nov2019), information on patients with newly diagnosed CRC from 2010 to 2015 was extracted from the Surveillance, Epidemiology, and End Results (SEER) database. The SEER data collection (https://seer.cancer.gov/seerstat/), sponsored by the National Cancer Institute, contains information on the incidence and survival characteristics of malignancies from 18 cancer registries covering 26% of the population of the USA.
Overall survival (OS) information was collected together with clinicopathological characteristics and pertinent therapy details. Patients were included if they met the following criteria: (1) age between 18 and 75 years; (2) pathologically confirmed stage IV CRC; and (3) CRC as the only primary cancer. Patients with insufficient staging, missing metastatic information, a diagnosis made only at autopsy, or unknown prognosis were excluded. Tumor size in this study refers to the size of the primary tumor.
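The selection criteria and the 7:3 random split described in the abstract can be illustrated with a minimal Python sketch. The record layout (field names such as `n_primaries` and `autopsy_only`) and the helper names are hypothetical and chosen only for illustration; they are not SEER variable names, and this is not the authors' actual extraction procedure.

```python
import random


def is_eligible(p):
    """Apply the stated inclusion and exclusion criteria to one patient record.

    All field names are hypothetical placeholders, not SEER variables.
    """
    return (
        18 <= p["age"] <= 75                   # criterion (1): age 18 to 75
        and p["stage"] == "IV"                 # criterion (2): stage IV disease ...
        and p["pathologically_confirmed"]      # ... confirmed pathologically
        and p["n_primaries"] == 1              # criterion (3): CRC is the only primary
        and p["staging_complete"]              # exclude insufficient staging
        and p["metastasis_info"] is not None   # exclude missing metastatic information
        and not p["autopsy_only"]              # exclude autopsy-only diagnoses
        and p["survival_months"] is not None   # exclude unknown prognosis
    )


def split_cohort(patients, train_fraction=0.7, seed=0):
    """Shuffle eligible patients and split them into training and validation cohorts."""
    eligible = [p for p in patients if is_eligible(p)]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(eligible)
    cut = round(len(eligible) * train_fraction)
    return eligible[:cut], eligible[cut:]
```

Fixing the random seed is a design choice made here so that the same split can be regenerated; the paper does not state how its randomization was seeded.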

Statistical analysis
Statistical analyses were performed with X-tile 3.6.1, SPSS 26.0, and R 4.1.1. Baseline characteristics were summarized descriptively: categorical data were given as frequencies with percentages, while continuous variables were expressed as the median [IQR]. Categorical data were compared using the χ² test. Cox regression was used to analyze differences in survival, and a nomogram was then built from the results of the Cox regression. The nomogram's discriminative power was assessed with receiver operating characteristic (ROC) curves, and its clinical utility with decision curve analysis (DCA), for predicting patients' OS at 1, 3, and 5 years. All statistical tests were two-sided, and a P value of 0.05 or less was regarded as statistically significant.
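As a brief reminder of what the Cox model estimates, the standard proportional hazards formulation is given below. The notation (baseline hazard h_0, coefficients β_j, covariates x_j) is introduced here for illustration and does not appear in the original text.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \dots + \beta_p x_p\right),
\qquad
\mathrm{HR}_j = e^{\hat\beta_j},
\qquad
95\%\ \mathrm{CI}_j = \exp\!\left(\hat\beta_j \pm 1.96\,\mathrm{SE}(\hat\beta_j)\right)
```

Under this formulation, the hazard ratio of 1.28 reported for tumor size in the abstract corresponds to an estimated coefficient of ln 1.28 ≈ 0.247.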

Metastasis characteristics based on different tumor sizes and metastasis sites
The optimal cutoff values calculated by X-tile divided tumor size into three groups: < 4.6 cm, 4.6-6.9 cm and > 6.9 cm. As shown in Fig. 2A, liver metastases were the most common metastatic site in all three groups. As tumor size increased, the proportion of bone metastases and brain metastases increased. Brain metastases were the least frequent metastatic pattern in every tumor size group. As shown in Fig. 2B, in the liver, lung and bone metastases groups, the proportion of patients with tumor size < 4.6 cm was clearly higher than the proportion with tumor size ≥ 4.6 cm, while patients in the brain metastases group were more likely to have a primary tumor size between 4.6 and 6.9 cm. When X-tile was used to find the most effective tumor size cutoff within each metastatic site, liver and lung metastases were associated with relatively similar tumor sizes, bone metastases with the smallest tumor size, and brain metastases with the largest. The K-M curves showed that patients' prognosis worsened as tumor size increased (Fig. 3).
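To make the grouping and the Kaplan-Meier comparison concrete, the following is a minimal, dependency-free sketch. The cutoffs 4.6 cm and 6.9 cm come from the text; the function names, the argument layout, and the handling of values exactly at a cutoff are assumptions made for illustration, and the estimator is the textbook product-limit formula rather than the authors' SPSS or R workflow.

```python
def size_group(size_cm):
    """Assign a primary tumor size (in cm) to one of the three X-tile groups."""
    if size_cm < 4.6:
        return "<4.6 cm"
    if size_cm <= 6.9:
        return "4.6-6.9 cm"
    return ">6.9 cm"


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient (e.g. months)
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # everyone observed at t leaves the risk set
    return curve
```

In use, patients would first be grouped with `size_group` and one curve estimated per group with `kaplan_meier`, so that the curves can be compared as in Fig. 3.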

Prognostic factors affecting the OS of patients
According to univariate Cox analysis (Table 3), age, race, primary site, tumor size, grade, histology, T stage, N stage, radiation, chemotherapy, CEA level and metastases site were prognostic factors affecting OS in the training cohort (all P < 0.05). Multivariate Cox analysis showed that radiation was not an independent prognostic factor, while the other 11 factors remained independent prognostic factors (all P < 0.05).

Construction of nomogram prediction model
Based on the selected independent prognostic factors affecting patients' OS, we constructed a nomogram to predict OS at 1, 3, and 5 years in patients with mCRC (Fig. 4). Each factor was assigned a score between 0 and 100 according to its contribution to the model. The scores for each factor were summed to give a total score for each patient; chemotherapy, T stage, and metastases site had the greatest effects on prognosis.
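The 0 to 100 scoring can be sketched as follows, using the usual nomogram convention in which the variable with the widest coefficient range spans the full 100 points and every other level is scaled relative to it. The coefficient values in the usage example are purely hypothetical and are not the fitted values from this study; the function names are likewise illustrative.

```python
def nomogram_points(coefs):
    """Convert Cox coefficients into nomogram points.

    coefs: {variable: {level: log hazard ratio}}, with the reference level at 0.0.
    The variable with the widest coefficient range is scaled to span 0-100 points.
    """
    ranges = {v: max(lv.values()) - min(lv.values()) for v, lv in coefs.items()}
    widest = max(ranges.values())
    return {
        v: {level: 100 * (b - min(lv.values())) / widest for level, b in lv.items()}
        for v, lv in coefs.items()
    }


def total_points(points, patient):
    """Sum the points of the levels observed for one patient."""
    return sum(points[v][patient[v]] for v in points)


# Hypothetical coefficients, for illustration only (not the study's estimates).
example_coefs = {
    "chemotherapy": {"yes": 0.0, "no": 1.0},
    "tumor_size": {"<4.6 cm": 0.0, "4.6-6.9 cm": 0.1, ">6.9 cm": 0.25},
}
example_points = nomogram_points(example_coefs)
example_total = total_points(
    example_points, {"chemotherapy": "no", "tumor_size": ">6.9 cm"}
)
```

A higher total corresponds to a higher predicted hazard; mapping the total to 1-, 3-, and 5-year survival probabilities additionally requires the baseline survival function from the fitted model, which is not reproduced here.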

Verification of the nomogram prediction model
ROC curves were drawn for the nomogram's prediction of OS at 1, 3, and 5 years in the training cohort, giving AUC values of 0.766, 0.726, and 0.746, respectively (Fig. 5A). The corresponding AUC values in the validation cohort were 0.772, 0.717, and 0.734 (Fig. 5B). ROC curves based on TNM staging were then built in the training and validation cohorts, and the 1-year AUC values were 0.608 (Fig. 5C) and 0.616 (Fig. 5D), respectively. These results showed that the prediction model developed in this study was superior to the model based on the TNM staging system. In addition, in the calibration plots for both cohorts the prediction curve fitted the ideal curve well, indicating the model's high degree of accuracy (Fig. 6).
Fig. 1 Flow chart of patients' selection from SEER database. SEER Surveillance, Epidemiology, and End Results
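For readers unfamiliar with how an AUC at a fixed time horizon is obtained, a simplified sketch follows. It treats survival status at the horizon as a binary outcome and drops patients censored before the horizon, which is an assumption made here for brevity; dedicated time-dependent ROC methods handle censoring more carefully, and the text does not specify which method was used. All function names are illustrative.

```python
def status_at(horizon, time, event):
    """Return True if the patient died by the horizon, False if known alive
    beyond it, and None if censored at or before it (status unknown)."""
    if time <= horizon and event:
        return True
    if time > horizon:
        return False
    return None


def auc(scores, died):
    """Probability that a patient who died has a higher risk score than one
    who survived (ties count as 0.5); equivalent to the area under the ROC curve."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_at_horizon(scores, times, events, horizon):
    """AUC of the risk scores for death by the horizon, ignoring patients
    whose status at the horizon is unknown (a simplifying assumption)."""
    kept = [
        (s, status_at(horizon, t, e))
        for s, t, e in zip(scores, times, events)
    ]
    kept = [(s, d) for s, d in kept if d is not None]
    return auc([s for s, _ in kept], [d for _, d in kept])
```

With follow-up in months, `auc_at_horizon(scores, times, events, 12)` would give the simplified 1-year value, where `scores` could be the nomogram total or, for comparison, a numeric encoding of TNM stage.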

Decision curve analysis
Decision curve analysis (DCA) was performed for OS at 1, 3 and 5 years in the training cohort (Fig. 7A) and validation cohort (Fig. 7B). In both cohorts, the nomogram demonstrated good clinical value, with greater clinical utility in predicting OS at 3 and 5 years and only moderately less at 1 year.
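The quantity plotted in a decision curve is the net benefit at each threshold probability. A minimal sketch using the standard definition (true positives minus false positives weighted by the odds of the threshold, per patient) is shown below; the function names and the binary-outcome simplification are assumptions for illustration and do not reproduce the authors' analysis.

```python
def net_benefit(pred_risk, outcome, threshold):
    """Net benefit of acting on patients whose predicted risk is at least the threshold.

    pred_risk : predicted probability of death by the chosen horizon
    outcome   : 1 if the patient died by the horizon, 0 otherwise
    threshold : threshold probability, strictly between 0 and 1
    """
    n = len(outcome)
    tp = sum(1 for r, y in zip(pred_risk, outcome) if r >= threshold and y)
    fp = sum(1 for r, y in zip(pred_risk, outcome) if r >= threshold and not y)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Net benefit of the reference strategy that treats every patient as high risk."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

A model shows clinical utility over the range of thresholds where its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.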

Discussion
CRC is one of the most common malignant tumors globally, accounting for 10% of newly diagnosed cases of malignancy [1]. The TNM staging system proposed by AJCC is based on the depth of local invasion, without taking into consideration the size of the primary tumor. Some studies have suggested that tumor size may be a useful supplement to the TNM staging system to enhance the accuracy of prognostication for CRC [12][13][14][15][16]. Compared to invasion depth, tumor size can be obtained via imaging examinations, which are simple, safe, non-invasive and accurate. However, there is no unified optimal cutoff value to stratify tumor size. Shiraishi et al. [17] analyzed 95 patients with T4 stage CRC and determined an optimal tumor size threshold of 5.0 cm via ROC curves, confirming that tumor size was significantly associated with prognosis. Deng et al. [18] analyzed clinical pathological parameters of 1250 patients with colorectal cancer hepatopulmonary metastases (CRCHPM) and constructed a reliable model with 7 independent prognostic factors. The tumor size threshold selected in that study was 5.5 cm, and it was observed that the larger the tumor, the poorer the prognosis and the greater the risk of liver and lung metastasis. Saha et al. [19] conducted a subgroup analysis of 300,386 CRC patients with thresholds of 2.0, 4.0, and 6.0 cm, and found that the 5-year survival rate decreased significantly with increasing tumor size, and that this trend was observed regardless of lymph node metastasis. Another study that stratified the metastatic potential and risk of disease recurrence of 1538 CRC patients showed    [20]. Our results confirmed previous findings: although tumor size had a less significant impact on prognosis than TNM staging in our overall study population, significant findings were observed across different AJCC stages.
Compared to smaller primary tumors, those over 6.9 cm in mCRC patients were associated with worse survival rates independent of other variables. One possible explanation for the negative effect of tumor size on T stage may be the inaccuracy of tumor size calculation. It is conceivable that, for tumors with advanced stages, accurately measuring tumor size is difficult as the invasion extent of the bowel wall may be larger than the maximal diameter of the cancerous extent on the mucosa. Thus, the maximal horizontal diameter may not accurately reflect the extent of tumor growth with advanced infiltration.
On the other hand, for tumors confined to the submucosa, growth into the lumen may be the main pattern, and the tumor size measured at this stage may be the dominant index that depicts the growth of the tumor. Nevertheless, we believe there are other potential causes for the impact of tumor size that need further investigation. The distant metastasis of tumors, including CRC, is thought to comprise a multistep process of three phases: (1) invasion phase: in situ tumor cells increase their invasiveness through epithelial-mesenchymal transition (EMT) processes, penetrating the surrounding tissues and migrating to the blood vessels or lymphatic vessels, creating circulating tumor cells (CTCs); (2) circulation phase: platelets adhere directly to the surface of CTCs, creating micro-thrombi structures that reduce recognition and clearance by the immune system; (3) settlement phase: CTCs settle in distant organs, forming a pre-metastatic niche, a microenvironment characterized by immune suppression, enabled by the secretion of cytokines or exosomes from the primary tumor site [21]. The classical metastasis model suggests that distant metastasis may occur with the passage of time and/or an increase in tumor size [22], which accords with the insight from this study that primary tumors in the colon or rectum kept growing and became more invasive. This contrasts with an early dissemination model [23], which we refer to as the "genius tumor" model, as these tumors possess metastasis-related mutant alleles at an early stage. However, due to the lack of multi-omics molecular characterization of CRC in this study, it was impossible to distinguish between patients belonging to the growth-accumulation model and those belonging to the early dissemination model. We also observed that the larger the tumor size of patients with bone or brain metastases, the poorer the prognosis, though the P value was over 0.05.
This might be because these patients already have a poor prognosis; for example, patients with brain metastases have a median survival time of 3-6 months from diagnosis and no effective therapy [24].
The relationship between tumor size and prognosis of CRC patients has long been of great concern to clinicians. This study, based on a large-scale clinical data analysis, found that tumor size is an independent prognostic factor for mCRC patients. Nevertheless, certain limitations of the study could not be overlooked. Despite the large size of SEER database, our stratification by tumor size and metastatic site resulted in relatively small subgroups, reducing statistical power to detect small differences. This may help explain why we failed to detect significant associations between tumor size and metastatic site in the subgroups of bone and brain metastatic patients. In addition, information on missing molecular characteristics such as BRAF

Conclusion
Our findings suggest that patients with mCRC and lung or liver metastases have a poorer prognosis when their primary tumors are larger. We developed an accurate prognostic risk assessment model for such patients: a nomogram that yields a total score and the corresponding probability of survival, which may help guide clinical decisions. Further prospective research is necessary to determine the role of tumor size in clinical staging models for management selection.
Author contributions QZ: conceptualization, methodology, formal analysis, writing-original draft; BL: data curation, writing-original draft; SZ: visualization, investigation; QH: visualization, writing-review and editing; MZ: writing-review and editing; GL: conceptualization, funding acquisition, resources, supervision, writing-review and editing. The manuscript has been read and approved by all the authors.
Funding This study is supported by grant from the Natural Science Foundation of Tianjin Municipal Science and Technology Commission (No. 21JCQNJC01370) and Foundation of Tianjin Municipal Education Commission (No. 2020KJ154).
Data availability Publicly available datasets were analyzed in this study. These data can be found here: https://seer.cancer.gov/seerstat/.

Conflict of interest No potential conflict of interest was reported by the authors.
Ethical approval Through a request presented to the SEER database, our study was granted permission to access all of the data. Since SEER is a public database, the Institutional Review Board's approval is not required.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.