Abstract
Lymph node metastasis (LNM) is one of the key factors in determining the optimal treatment approach for colorectal cancer (CRC). The objective of this study was to develop and validate a nomogram for predicting LNM in CRC patients. We extracted a total of 83,430 CRC cases diagnosed in 2010–2017 from the Surveillance, Epidemiology, and End Results (SEER) database and divided them into a training set and a testing set in a 7:3 ratio. An additional 8545 patients diagnosed in 2018–2019 were used for external validation. Univariate and multivariate logistic regression analyses were performed in the training set to identify predictive factors. Models were developed using logistic regression, LASSO regression, ridge regression, and elastic net regression. Model performance was quantified by the area under the receiver operating characteristic curve (AUC) and its 95% confidence interval (CI). Tumor location, grade, age, tumor size, T stage, race, and CEA were independent predictors of LNM in CRC patients. The logistic regression model yielded an AUC of 0.708 (95% CI 0.704–0.712), comparable to the LASSO, ridge, and elastic net regression models. Based on the logistic regression model, we constructed a nomogram for predicting LNM in CRC patients. Subgroup analyses by gender, age, and grade indicated that the model performed consistently across all subgroups. The nomogram showed good predictive ability and may serve as a useful tool for clinicians in estimating the likelihood of LNM in CRC patients.
Introduction
Colorectal cancer (CRC) refers to malignant tumors occurring in the proximal colon, distal colon, or rectum [1]. Currently, the standard treatment for CRC is curative surgery, although endoscopic resection may be considered for some patients with early-stage colon cancer. However, if lymph node metastasis (LNM) is present, the treatment strategy may differ substantially. For early-stage CRC with LNM, endoscopic treatment may not be suitable [2, 3], whereas for advanced-stage CRC with LNM, neoadjuvant chemotherapy should be considered. Therefore, regardless of whether endoscopic or surgical resection is planned, preoperative assessment of LNM is essential, and predicting LNM before surgery is important for both treatment selection and prognostic evaluation [4]. Several studies have developed predictive models for LNM in CRC patients; however, these studies have limitations such as small sample sizes, single-center designs, or lack of external validation cohorts [5,6,7].
Currently, the risk factors for LNM remain unclear. To address this, we extracted data from the Surveillance, Epidemiology, and End Results (SEER) database for patients diagnosed with CRC between 2010 and 2019. Subsequently, we constructed a nomogram to predict LNM in CRC patients and evaluated the applicability of the model through external validation.
Methods
Study design and population
The data were extracted from the SEER database, a population-based clinical data repository maintained by the National Cancer Institute that covers approximately 28% of the U.S. population [8]. Using SEER*Stat software (Calverton, Maryland), we obtained a list of cases diagnosed with CRC between 2010 and 2019 from the SEER database. Because patient data in the SEER database are publicly available and de-identified, this study was exempt from ethical review.
Inclusion and exclusion criteria
Inclusion criteria: (1) Primary tumor located in the colon or rectum, (2) Pathological type classified as adenocarcinoma, (3) Patients with complete and available clinical baseline data. Exclusion criteria: (1) Multiple primary malignant tumors, (2) Distant metastasis present, (3) Zero survival time, (4) Diagnosed based on death certificates or autopsy reports.
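The inclusion and exclusion criteria can be read as a single row filter over the extracted case listing. The sketch below is illustrative only: it assumes each case is a Python dict, and every field name and category value in it is hypothetical rather than an actual SEER*Stat variable or code.

```python
# Hypothetical field names and values; real SEER*Stat exports use different variables and codes.
EXCLUDED_SOURCES = {"death certificate", "autopsy"}


def is_eligible(case: dict) -> bool:
    """Return True if a case meets every inclusion criterion and no exclusion criterion."""
    included = (
        case["site"] in {"colon", "rectum"}            # (1) primary tumor in colon or rectum
        and case["histology"] == "adenocarcinoma"      # (2) adenocarcinoma
        and case["complete_baseline"]                  # (3) complete baseline data
    )
    excluded = (
        case["multiple_primaries"]                     # (1) multiple primary malignancies
        or case["distant_metastasis"]                  # (2) distant metastasis present
        or case["survival_months"] == 0                # (3) zero survival time
        or case["diagnosis_source"] in EXCLUDED_SOURCES  # (4) death certificate / autopsy only
    )
    return included and not excluded


def select_cohort(cases):
    """Keep only the eligible cases, preserving their order."""
    return [c for c in cases if is_eligible(c)]
```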
Variable categorization
We extracted data on patients' race, age (< 60 years, 60–70 years, > 70 years), gender, marital status, T stage, tumor location, tumor grade, tumor size (< 7 cm, 7–15 cm, > 15 cm), and CEA level. LNM was the outcome variable.
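The age and tumor-size categories can be written as two small binning functions. The text does not say how boundary values were assigned, so this sketch assumes the middle intervals are closed (ages 60 and 70 fall in "60-70"; 7 cm and 15 cm fall in "7-15 cm") and that age is in years and size in centimeters.

```python
# Assumption: middle categories include both endpoints; the source does not specify this.
def age_group(age_years: float) -> str:
    """Map age in years to the study's three age categories."""
    if age_years < 60:
        return "<60"
    if age_years <= 70:
        return "60-70"
    return ">70"


def size_group(size_cm: float) -> str:
    """Map tumor diameter in centimeters to the study's three size categories."""
    if size_cm < 7:
        return "<7 cm"
    if size_cm <= 15:
        return "7-15 cm"
    return ">15 cm"
```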
Statistical analysis
Data were retrieved from the SEER database using SEER*Stat software (version 8.4.2). Cases diagnosed in 2010–2017 were divided into training and testing sets in a 7:3 ratio, and cases diagnosed in 2018–2019 were used for external validation. Categorical variables are presented as numbers and percentages, and between-group comparisons were performed using the chi-square (χ2) test or Fisher's exact test.
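The cohort construction combines a temporal hold-out (2018–2019) with a 7:3 split of the 2010–2017 cases. A minimal sketch, assuming a simple unstratified random split with an arbitrary fixed seed (the text does not state whether the split was stratified or seeded) and a hypothetical `year` field holding the year of diagnosis.

```python
import random


def split_cohort(records, train_frac=0.7, seed=2024):
    """Return (train, test, external) lists.

    `records` is a list of dicts with a hypothetical "year" key (year of diagnosis).
    The seed and the unstratified split are illustrative assumptions.
    """
    development = [r for r in records if 2010 <= r["year"] <= 2017]
    external = [r for r in records if 2018 <= r["year"] <= 2019]
    rng = random.Random(seed)          # local RNG so the split is reproducible
    shuffled = development[:]          # copy so the input order is left untouched
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:], external
```

Keeping the 2018–2019 cases out of the shuffle entirely is what makes the external set a temporal validation rather than a third random partition.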
Univariate and multivariate logistic regression analyses were conducted sequentially to identify independent risk factors for LNM and to establish a predictive model. Models were developed using logistic regression, LASSO regression, ridge regression, and elastic net regression. The Hosmer–Lemeshow goodness-of-fit test was used to assess the calibration of the predictive model. Model performance was quantified by the area under the receiver operating characteristic curve (AUC) with its 95% confidence interval (CI), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The best-performing model was selected and used to construct a nomogram.
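The Hosmer–Lemeshow test sorts patients by predicted risk, splits them into groups, and compares observed with expected LNM counts in each group. A minimal sketch, assuming g = 10 groups of (near-)equal size formed after sorting; ResourceSelection's hoslem.test forms its groups from quantiles of the fitted values, so its grouping and statistic can differ slightly. The statistic is referred to a χ² distribution with g − 2 degrees of freedom, and this sketch only handles even df, which covers g = 10 (df = 8). Groups whose expected count is 0 or equal to the group size would cause a division by zero and are not handled.

```python
import math


def chi2_sf_even(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with a positive even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles positive even df only")
    t = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):        # Poisson-sum form: e^-t * sum_{j<df/2} t^j / j!
        term *= t / j
        total += term
    return math.exp(-t) * total


def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, df, p_value); y holds 0/1 outcomes, p predicted probabilities."""
    pairs = sorted(zip(p, y))          # order patients by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(pi for pi, _ in chunk)   # expected LNM cases in the group
        observed = sum(yi for _, yi in chunk)   # observed LNM cases in the group
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    df = groups - 2
    return stat, df, chi2_sf_even(stat, df)
```

As a check on the tail function, `chi2_sf_even(10.207, 8)` is about 0.251, matching the P value reported for χ2 = 10.207 under the assumption of 10 groups.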
All statistical analyses and visualizations were performed using R software (version 4.3.1). The glmnet package was used for constructing LASSO regression, ridge regression, and elastic net regression models with ten-fold cross-validation. Other R packages, such as compareGroups, ResourceSelection, rms, and pROC, were also utilized. A p value < 0.05 was considered statistically significant.
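For reference, glmnet fits all three penalized models by minimizing the penalized negative log-likelihood below (binomial family), where y_i ∈ {0, 1} indicates LNM and x_i is the encoded predictor vector for patient i. Setting α = 1 gives LASSO, α = 0 gives ridge, and 0 < α < 1 gives the elastic net; λ is tuned by cross-validation (cv.glmnet uses 10 folds by default). The α value used for the elastic net and whether lambda.min or lambda.1se was selected are not reported in this section.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big]
+ \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Big]
```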
Results
Baseline characteristics
A total of 83,430 cases diagnosed in 2010–2017 were included from the SEER database as the training and testing sets, and 18,545 cases diagnosed in 2018–2019 were used as the external validation set (Table 1). Of these 83,430 patients, 43,889 (52.6%) were male. The racial distribution was 65,641 (78.7%) White, 8,776 (10.5%) Black, and 9,013 (10.8%) other races. None of the baseline variables differed significantly between the training and testing sets (Table 1).
Differences in characteristics between patients with and without LNM
Table 2 compares the characteristics of patients with and without LNM. The two groups differed significantly in age, tumor location, T stage, tumor size, tumor grade, CEA level, and race (all P < 0.001). The LNM rate was higher in patients younger than 60 years than in the older age groups (P < 0.001) and higher in CEA-positive patients (P < 0.001). The proportion of patients with LNM increased with tumor grade (P < 0.001). The LNM rate was significantly higher in rectal cancer than in colon cancer and increased with tumor diameter.
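The comparisons in Table 2 rely on the Pearson chi-square test of independence. A self-contained sketch with an exact chi-square tail probability for integer degrees of freedom; note that it applies no continuity correction, whereas R's chisq.test applies Yates' correction to 2 × 2 tables by default, so P values for 2 × 2 comparisons can differ slightly. Fisher's exact test, used when expected counts are small, is not shown, and tables with a zero expected count are not handled.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2 == 0:
        # Even df: e^-t * sum_{j=0}^{df/2-1} t^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= t / j
            total += term
        return math.exp(-t) * total
    # Odd df: erfc(sqrt(t)) + e^-t * sum_{j=1}^{(df-1)/2} t^(j-1/2) / Gamma(j+1/2)
    total = sum(t ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(t)) + math.exp(-t) * total


def pearson_chi2(table):
    """Pearson chi-square test for an r x c table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)
```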
Risk factors associated with LNM in CRC patients
The results of the univariate and multivariate logistic regression analyses are presented in Table 3. Multivariate analysis identified tumor location, grade, age, tumor size, T stage, race, and CEA level as independent predictors of LNM in CRC patients. The odds of LNM in rectal cancer patients were 1.40 times those in colon cancer patients (OR 1.40; 95% CI 1.34–1.46). Compared with tumors < 7 cm in diameter, the OR for LNM was 1.20 (95% CI 1.01–1.43) for tumors of 7–15 cm and 1.36 (95% CI 1.16–1.61) for tumors > 15 cm. Compared with grade I tumors, the ORs for grade II, III, and IV tumors were 1.31 (95% CI 1.21–1.41), 2.32 (95% CI 2.13–2.52), and 2.39 (95% CI 2.11–2.72), respectively. Compared with T1 stage, the ORs for T2, T3, and T4 stages were 1.18 (95% CI 1.68–2.03), 5.40 (95% CI 4.96–5.89), and 8.89 (95% CI 8.08–9.80), respectively. White patients had the lowest odds of LNM compared with Black patients and patients of other races. CEA-positive patients had higher odds of LNM than CEA-negative patients (OR 1.28; 95% CI 1.24–1.33). Compared with patients younger than 60 years, the ORs for patients aged 60–70 years and > 70 years were 0.76 (95% CI 0.73–0.80) and 0.55 (95% CI 0.53–0.58), respectively.
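The ORs in Table 3 are odds ratios, exp(β), from the multivariate logistic model, not risk ratios. A minimal sketch of the conversion between a coefficient with its standard error and an OR with a 95% CI, assuming Wald intervals (the CI method is not stated in the text). Because a Wald CI is symmetric on the log scale, each OR should equal the geometric mean of its two limits, which gives a quick consistency check for every reported row.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def odds_ratio_ci(beta: float, se: float, z: float = Z95):
    """OR and Wald 95% CI from a logistic-regression coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def log_odds_from_ci(lower: float, upper: float, z: float = Z95):
    """Recover the coefficient and SE implied by a reported Wald 95% CI for an OR."""
    beta = (math.log(lower) + math.log(upper)) / 2   # midpoint on the log-odds scale
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Rectal vs colon cancer, as reported: OR 1.40 (95% CI 1.34-1.46)
beta, se = log_odds_from_ci(1.34, 1.46)
or_, lo, hi = odds_ratio_ci(beta, se)
print(f"beta={beta:.3f}, SE={se:.4f}, OR={or_:.2f} ({lo:.2f}-{hi:.2f})")
```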
Model comparison and selection
We constructed models using logistic regression, LASSO regression, ridge regression, and elastic net regression. Table 4 presents the AUC values of these models in the training and testing sets. In the testing set, the logistic regression, LASSO regression, ridge regression, and elastic net regression models achieved AUCs of 0.708 (95% CI 0.704–0.712), 0.707 (95% CI 0.703–0.711), 0.708 (95% CI 0.704–0.712), and 0.708 (95% CI 0.702–0.714), respectively. There were no significant differences in AUC among these models (P > 0.05). Although all models performed similarly, the logistic regression model was more clinically interpretable. Therefore, the logistic regression model was selected (Fig. 1).
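The text does not name the test behind "no significant differences in AUC (P > 0.05)". For two models scored on the same patients, a common choice is the paired DeLong test, which is assumed here (pROC's roc.test uses DeLong's method by default for full AUCs). The plain-Python sketch below illustrates that test and is not the authors' code; its pairwise loops are too slow for the full SEER cohort.

```python
import math


def _psi(a, b):
    """Kernel: 1 if the LNM-positive case scores higher, 0.5 for a tie, 0 otherwise."""
    return 1.0 if a > b else 0.5 if a == b else 0.0


def _components(pos, neg):
    v10 = [sum(_psi(a, b) for b in neg) / len(neg) for a in pos]  # per-positive placements
    v01 = [sum(_psi(a, b) for a in pos) / len(pos) for b in neg]  # per-negative placements
    return v10, v01


def _cov(u, v):
    """Sample covariance (denominator n - 1)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def delong_paired_test(y, scores_a, scores_b):
    """Two-sided DeLong test for two correlated AUCs; returns (auc_a, auc_b, z, p)."""
    pos_a = [s for s, t in zip(scores_a, y) if t == 1]
    neg_a = [s for s, t in zip(scores_a, y) if t == 0]
    pos_b = [s for s, t in zip(scores_b, y) if t == 1]
    neg_b = [s for s, t in zip(scores_b, y) if t == 0]
    v10a, v01a = _components(pos_a, neg_a)
    v10b, v01b = _components(pos_b, neg_b)
    auc_a, auc_b = sum(v10a) / len(v10a), sum(v10b) / len(v10b)
    m, n = len(v10a), len(v01a)
    var_diff = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
                + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var_diff <= 0:
        # Degenerate case (e.g. identical rankings): the z statistic is undefined.
        return auc_a, auc_b, float("nan"), float("nan")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    return auc_a, auc_b, z, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
```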
Nomogram for predicting LNM in CRC patients
Table 5 shows the performance of the logistic regression model. In the testing set, the model achieved an AUC of 0.708 (95% CI 0.704–0.712), accuracy of 0.637 (95% CI 0.63–0.641), sensitivity of 0.736 (95% CI 0.730–0.742), specificity of 0.569 (95% CI 0.564–0.574), PPV of 0.539 (95% CI 0.534–0.545), and NPV of 0.759 (95% CI 0.753–0.764). The Hosmer–Lemeshow goodness-of-fit test showed no significant lack of fit (χ2 = 10.207, P = 0.251), indicating acceptable calibration. In external validation, the model achieved an AUC of 0.709 (95% CI 0.701–0.716), indicating good generalizability to the external validation cohort (Fig. 2, Table 5).
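Unlike sensitivity and specificity, PPV and NPV depend on how common LNM is in the evaluated cohort, and all four depend on the probability cut-off, which is not stated in this section. The sketch computes the metrics from a confusion matrix and, separately, from sensitivity, specificity, and prevalence. The prevalence of 0.406 below is an illustrative assumption, not a value taken from the tables; with it, the testing-set sensitivity and specificity reproduce the reported PPV (0.539), NPV (0.759), and accuracy (0.637) to within 0.001.

```python
def metrics_from_counts(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, NPV and accuracy from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def predictive_values(sens, spec, prevalence):
    """PPV, NPV and accuracy implied by sensitivity, specificity and outcome prevalence."""
    tp = sens * prevalence                 # expected proportions of the whole cohort
    fn = (1 - sens) * prevalence
    tn = spec * (1 - prevalence)
    fp = (1 - spec) * (1 - prevalence)
    return {"ppv": tp / (tp + fp), "npv": tn / (tn + fn), "accuracy": tp + tn}


# Testing-set sensitivity and specificity from the text; prevalence 0.406 is an assumption.
print(predictive_values(0.736, 0.569, 0.406))
```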
Further validation based on different subgroups
Further validation was performed in subgroups defined by gender, age, and tumor grade (Table 6). In the testing set, the logistic regression model performed well in male and female patients, in patients aged < 60 years, 60–70 years, and > 70 years, and in patients with grade I tumors, with AUCs of 0.705 (95% CI 0.696–0.714), 0.711 (95% CI 0.702–0.720), 0.685 (95% CI 0.673–0.696), 0.708 (95% CI 0.696–0.720), 0.704 (95% CI 0.694–0.714), and 0.737 (95% CI 0.714–0.760), respectively. In the external validation set, the model also performed well in these subgroups, with AUCs of 0.710 (95% CI 0.700–0.720), 0.707 (95% CI 0.696–0.718), 0.696 (95% CI 0.683–0.710), 0.710 (95% CI 0.696–0.724), 0.705 (95% CI 0.694–0.717), and 0.746 (95% CI 0.720–0.771), respectively.
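Each subgroup AUC above is reported with a 95% CI. A minimal sketch of the empirical AUC with a DeLong CI, assuming the DeLong method was used (it is pROC's usual default for a full AUC, but the text does not state the method). The sketch clips the interval to [0, 1] and is quadratic in cohort size, so it is for illustration only.

```python
import math
import statistics


def auc_with_delong_ci(y, scores, z=1.959964):
    """Empirical AUC with a DeLong 95% CI, clipped to [0, 1].

    Uses O(n_pos * n_neg) pairwise comparisons: fine for illustration,
    slow for tens of thousands of patients.
    """
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two patients with and two without LNM")

    def psi(a, b):
        return 1.0 if a > b else 0.5 if a == b else 0.0

    v10 = [sum(psi(a, b) for b in neg) / len(neg) for a in pos]
    v01 = [sum(psi(a, b) for a in pos) / len(pos) for b in neg]
    auc = sum(v10) / len(pos)          # equals Mann-Whitney U / (n_pos * n_neg)
    var = statistics.variance(v10) / len(pos) + statistics.variance(v01) / len(neg)
    half_width = z * math.sqrt(var)
    return auc, max(0.0, auc - half_width), min(1.0, auc + half_width)
```

In a subgroup analysis the same function would simply be called on the outcomes and predicted probabilities of each subgroup in turn.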
Model fit analysis
The calibration curves of the nomogram (Fig. 3A–C) showed high agreement between the predicted and observed probabilities of LNM in both the training and validation cohorts. In addition, decision curve analysis (DCA) (Fig. 3D–F) indicated good clinical utility of the model.
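Decision curve analysis plots net benefit against the threshold probability p_t at which a clinician would act on a predicted LNM risk. A minimal sketch of the standard net-benefit formula, NB = TP/n − (FP/n) × p_t/(1 − p_t), alongside the treat-all reference (treat-none has NB = 0). It assumes a patient is classified as positive when the predicted risk is ≥ p_t; the package used for Fig. 3D–F is not named in the text.

```python
def net_benefit(y, p, threshold):
    """Net benefit of acting on patients whose predicted LNM risk is >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    weight = threshold / (1 - threshold)   # odds at the threshold probability
    return tp / n - fp / n * weight


def net_benefit_treat_all(y, threshold):
    """Reference strategy: assume every patient is node-positive."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

A decision curve is then the pair of these values evaluated over a grid of thresholds; the model is useful where its curve lies above both treat-all and zero.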
Discussion
Colorectal cancer (CRC) is the third leading cause of cancer-related deaths worldwide, with over 1.85 million new cases and 850,000 deaths annually [9]. Lymph node metastasis (LNM) is an important prognostic factor in CRC and influences the choice of treatment. Studies have shown that the 5-year survival rate of CRC patients with positive lymph nodes ranges from 30 to 60%, significantly lower than that of patients without lymph node involvement (70–80%) [10]. Current treatment options for CRC include endoscopic resection, surgical resection, radiotherapy, chemotherapy, targeted therapy, and immunotherapy. Endoscopic treatment is mainly used for early-stage CRC but is not suitable for patients with LNM. For patients with more advanced CRC, surgical resection is the primary option, but if LNM is present, neoadjuvant therapy needs to be considered [3]. LNM is therefore a crucial determinant of the treatment approach and an important prognostic factor for CRC recurrence and distant metastasis [11,12,13]. Accurate prediction of LNM can support more personalized treatment strategies for CRC patients.
Numerous models have been developed to predict LNM in CRC patients; however, each has limitations [14]. In this study, we established a model based on the SEER database and validated it internally and externally. We compared four modeling approaches (logistic regression, LASSO regression, ridge regression, and elastic net regression) and identified the significant predictors. LNM was associated with tumor location, grade, patient age, tumor size, T stage, race, and CEA level.
Previous studies have shown that larger tumor size, deeper invasion of the bowel wall, and more advanced stage are associated with a higher probability of LNM [15, 16]. In our study, compared with tumors < 7 cm, the odds of LNM were 1.20 and 1.36 times as high for tumors of 7–15 cm and > 15 cm, respectively, and the probability of LNM also increased with higher T stage. Many previous studies have demonstrated a close correlation between tumor size and the risk of LNM [17], supporting tumor size as an independent predictor. This association may be related to high expression of CCR7: in gastric carcinoma, Yan et al. found significantly higher CCR7 expression in tumors with LNM than in those without, and CCR7 expression correlated positively with tumor size [16]. Consistent with these reports, our results indicate that the likelihood of LNM rises as T stage advances and tumor size increases, particularly for T4 tumors or tumors > 15 cm in diameter, and this should be taken into account when selecting treatment.
Previous studies have reported differences in LNM rates between rectal and colon cancer [18, 19]. In our study, the odds of LNM in rectal cancer patients were 1.4 times those in colon cancer patients, which may reflect differences in tumor biology and anatomy. For early-stage rectal cancer, radical resection therefore appears more reasonable than local excision, because nodal involvement, to which rectal cancer is more prone, may be the main cause of local recurrence after surgery. Similarly, rectal cancer patients may more often require adjuvant chemotherapy after local excision [20].
Studies in the United States, including one conducted in an equal-access military health system, have shown that younger CRC patients have a higher risk of lymph node positivity than older patients [21,22,23]. Consistent with these findings, our study shows that the probability of lymph node positivity decreases with increasing age across the < 60, 60–70, and > 70 years groups. Endoscopic treatment should therefore be considered with caution in young patients with early-stage CRC.
CEA plays an important role in tumor-cell biology, including adhesion, immune response, and apoptosis [24]. Previous studies have shown that CEA levels are associated with lymph node positivity and prognosis in patients with CRC [25]. In our study, CEA-positive patients had 1.28 times the odds of LNM of CEA-negative patients (OR 1.28; 95% CI 1.24–1.33). This may relate to the mechanisms through which CEA enhances the metastatic potential of CRC: in addition to acting as a pro-angiogenic molecule, CEA protects metastatic cells from death, alters the hepatic sinusoidal microenvironment, promotes the expression of adhesion molecules, and enhances the survival of malignant tumor cells [26].
We developed a nomogram based on the SEER database to predict LNM in CRC patients, validated it internally and externally, and evaluated its performance in different subgroups. This tool for estimating the likelihood of LNM may help clinicians select more appropriate treatment strategies. However, this study has some limitations. First, the data were derived solely from the SEER database, and information on patients from other regions is lacking. Second, imaging data that may contribute to the prediction of nodal status are not available in the SEER database. Further research is needed to address these limitations.
Conclusion
In this study, we developed a nomogram for predicting lymph node metastasis (LNM) in CRC patients. We identified tumor location, grade, age, tumor size, T stage, race, and CEA as independent predictive factors for LNM in CRC patients. This tool can predict the likelihood of LNM in CRC patients, which may aid clinicians in formulating appropriate treatment strategies.
Data availability
The data for this study are publicly available from the Surveillance, Epidemiology, and End Results database (https://seer.cancer.gov/).
References
Ren Q, Chen Y, Shao X, Guo L, Xu X (2023) Lymph nodes primary staging of colorectal cancer in 18F-FDG PET/MRI: a systematic review and meta-analysis. Eur J Med Res 28(1):162
Ichimasa K, Kudo SE, Miyachi H, Kouyama Y, Misawa M, Mori Y (2021) Risk stratification of T1 colorectal cancer metastasis to lymph nodes: current status and perspective. Gut Liver 15(6):818–826
Shinji S, Yamada T, Matsuda A, Sonoda H, Ohta R, Iwai T et al (2022) Recent advances in the treatment of colorectal cancer: a review. J Nippon Med Sch Nippon Ika Daigaku Zasshi 89(3):246–254
Resch A, Langner C (2013) Lymph node staging in colorectal cancer: old controversies and recent advances. World J Gastroenterol 19(46):8515–8526
Huang YQ, Liang CH, He L, Tian J, Liang CS, Chen X et al (2016) Development and validation of a radiomics nomogram for preoperative prediction of lymph node metastasis in colorectal cancer. J Clin Oncol Off J Am Soc Clin Oncol 34(18):2157–2164
Kajiwara Y, Oka S, Tanaka S, Nakamura T, Saito S, Fukunaga Y et al (2023) Nomogram as a novel predictive tool for lymph node metastasis in T1 colorectal cancer treated with endoscopic resection: a nationwide, multicenter study. Gastrointest Endosc 97(6):1119–28.e5
Liu Y, Sun Z, Guo Y, Liu C, Tian S, Dong W (2023) Construction and validation of a nomogram of risk factors and cancer-specific survival prognosis for combined lymphatic metastases in patients with early-onset colorectal cancer. Int J Colorectal Dis 38(1):128
Li X, Zhou H, Zhao X, Peng H, Luo S, Feng J et al (2022) Establishment and validation for predicting the lymph node metastasis in early gastric adenocarcinoma. J Healthc Eng 2022:8399822
Biller LH, Schrag D (2021) Diagnosis and treatment of metastatic colorectal cancer: a review. JAMA 325(7):669–685
Ong ML, Schofield JB (2016) Assessment of lymph node involvement in colorectal cancer. World J Gastroint Surg 8(3):179–192
Greene FL, Stewart AK, Norton HJ (2002) A new TNM staging strategy for node-positive (stage III) colon cancer: an analysis of 50,042 patients. Ann Surg 236(4):416–421 (discussion 421)
Gunderson LL, Sargent DJ, Tepper JE, Wolmark N, O’Connell MJ, Begovic M et al (2004) Impact of T and N stage and treatment on survival and relapse in adjuvant rectal cancer: a pooled analysis. J Clin Oncol Off J Am Soc Clin Oncol 22(10):1785–1796
Akasu T, Takawa M, Yamamoto S, Ishiguro S, Yamaguchi T, Fujita S et al (2008) Intersphincteric resection for very low rectal adenocarcinoma: univariate and multivariate analyses of risk factors for recurrence. Ann Surg Oncol 15(10):2668–2676
Guo K, Feng Y, Yuan L, Wasan HS, Sun L, Shen M et al (2020) Risk factors and predictors of lymph nodes metastasis and distant metastasis in newly diagnosed T1 colorectal cancer. Cancer Med 9(14):5095–5113
Nustas R, Messallam AA, Gillespie T, Keilin S, Chawla S, Patel V et al (2022) Lymph node involvement in gastric adenocarcinoma. Surg Endosc 36(6):3876–3883
Yan C, Zhu ZG, Yu YY, Ji J, Zhang Y, Ji YB et al (2004) Expression of vascular endothelial growth factor C and chemokine receptor CCR7 in gastric carcinoma and their values in predicting lymph node metastasis. World J Gastroenterol 10(6):783–790
Xu Y, Chen Y, Long C, Zhong H, Liang F, Huang LX et al (2021) Preoperative predictors of lymph node metastasis in colon cancer. Front Oncol 11:667477
Nascimbeni R, Burgart LJ, Nivatvongs S, Larson DR (2002) Risk of lymph node metastasis in T1 carcinoma of the colon and rectum. Dis Colon Rectum 45(2):200–206
Okabe S, Shia J, Nash G, Wong WD, Guillem JG, Weiser MR et al (2004) Lymph node metastasis in T1 adenocarcinoma of the colon and rectum. J Gastroint Surg Off J Soc Surg Aliment Tract 8(8):1032–1039 (discussion 1039–1040)
Wang H, Wei XZ, Fu CG, Zhao RH, Cao FA (2010) Patterns of lymph node metastasis are different in colon and rectal carcinomas. World J Gastroenterol 16(42):5375–5379
Alexander MS, Lin J, Shriver CD, McGlynn KA, Zhu K (2020) Age and lymph node positivity in patients with colon and rectal cancer in the US military health system. Dis Colon Rectum 63(3):346–356
Wang L, Hollenbeak CS, Stewart DB (2010) Node yield and node involvement in young colon cancer patients: is there a difference in cancer survival based on age? J Gastroint Surg Off J Soc Surg Aliment Tract 14(9):1355–1361
Khan H, Olszewski AJ, Somasundar P (2014) Lymph node involvement in colon cancer patients decreases with age; a population based analysis. Eur J Surg Oncol J Eur Soc Surg Oncol Br Assoc Surg Oncol 40(11):1474–1480
Hammarström S (1999) The carcinoembryonic antigen (CEA) family: structures, suggested functions and expression in normal and malignant tissues. Semin Cancer Biol 9(2):67–81
Liu H, Cui Y, Shen W, Fan X, Cui L, Zhang C et al (2016) Pretreatment magnetic resonance imaging of regional lymph nodes with carcinoembryonic antigen in prediction of synchronous distant metastasis in patients with rectal cancer. Oncotarget 7(19):27199–27207
Campos-da-Paz M, Dórea JG, Galdino AS, Lacava ZGM, de Fatima Menezes Almeida Santos M (2018) Carcinoembryonic antigen (CEA) and hepatic metastasis in colorectal cancer: update on biomarker for clinical and biotechnological approaches. Recent Patents Biotechnol 12(4):269–79
Funding
This research was funded by the National Natural Science Foundation of China, grant number 82060161.
Author information
Contributions
XN: Writing—original draft, conceptualization, methodology, visualization, software; JC: Writing—review and editing, validation, supervision, data curation, investigation.
Ethics declarations
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Ethical approval
Since the data from SEER are publicly available and deidentified, this study was exempt from local institutional review board review.
Informed consent
For this type of study, formal consent is not required.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Niu, X., Cao, J. Predicting lymph node metastasis in colorectal cancer patients: development and validation of a column chart model. Updates Surg (2024). https://doi.org/10.1007/s13304-024-01884-6
DOI: https://doi.org/10.1007/s13304-024-01884-6