1 Introduction

With the evolution of modern lifestyles, the incidence of colon cancer is rising annually, presenting an escalating challenge to public health. According to the Global Tumor Epidemiology Survey, approximately 1.8 million new cases of colon cancer are diagnosed each year. Currently, colorectal cancer ranks among the three most prevalent malignant tumors globally [1].

Early diagnosis and treatment are pivotal in the management of tumors. Recent advances in surgical techniques, together with the continued development of comprehensive therapies such as chemotherapy and radiotherapy, have markedly improved the 5-year survival rate for patients with advanced colon cancer. Clinicians are also increasingly exploring novel treatment modalities to extend survival and enhance quality of life, including immunotherapy, which harnesses the patient’s own immune system to combat cancer cells, and molecularly targeted drugs, which act on specific molecular targets within cancer cells. Despite these advances, distant metastasis remains a major challenge in tumor treatment. Screening has shown that 20% of colon cancer patients have developed distant metastases, with the liver being the most common site, a predilection attributed to the distinctive circulatory network of the gastrointestinal (GI) tract [2]. Highly malignant tumors can also metastasize to bone by altering the bone microenvironment or via hematogenous spread. Although bone metastasis from colon cancer is relatively infrequent, research indicates that the survival rate for colon cancer patients with bone metastases is merely 20%, and current therapeutic strategies aim primarily at symptomatic relief rather than cure [2, 3]. Furthermore, bone metastases are linked to skeletal-related events such as spinal cord compression, pathological fractures, and malignant hypercalcemia [4, 5]. Patients may experience loss of mobility and persistent bone pain, which not only restricts daily activities but also heightens the risk of psychological problems, including depression [6].

Currently, the diagnosis of bone metastasis in colon cancer relies predominantly on imaging modalities such as bone scans and computed tomography (CT). While these techniques are crucial for the early detection and assessment of bone metastases, they present several challenges. First, they increase the workload of medical professionals, who must analyze and interpret extensive imaging data and carry out frequent follow-up and monitoring. Second, they place a significant financial burden on patients and their families because the tests are frequent and costly. This burden includes not only direct medical expenses but also indirect costs such as transportation, accommodation, and income lost to frequent medical visits [7]. In addition, some examinations are invasive or involve radiation exposure, which can harm patients. Some clinical researchers have therefore used traditional regression models to predict bone metastasis in colon cancer. However, these conventional models show relatively poor discrimination and calibration and fail to reflect the incidence of bone metastasis accurately. Traditional regression approaches often struggle with complex variable relationships and multidimensional data, which limits their predictive accuracy. In contrast, artificial intelligence (AI), particularly machine learning, shows significant promise in medicine. As a crucial branch of AI, machine learning algorithms excel at uncovering intricate relationships and underlying patterns among variables by learning from extensive datasets. They not only identify subtle patterns in the data but also integrate diverse information to generate precise predictions, improving the ability to forecast disease occurrence. Their proficiency in handling high-dimensional and non-linear data offers a distinct advantage in predicting bone metastasis in colon cancer, supporting earlier warning and more personalized treatment strategies.

In this study, we used machine learning algorithms to develop a predictive model for bone metastasis following complete mesocolic excision (CME) by analyzing the clinical data of patients with right colon cancer. The model is intended to help clinicians devise precise, personalized treatment strategies in a timely manner and thereby improve patients' quality of postoperative survival.

2 Materials and methods

2.1 Study objectives and subjects

The primary aim of this study was to develop a model predicting the incidence of bone metastasis (BM) after complete mesocolic excision (CME) in patients with colon cancer. The secondary aim was to identify high-risk factors associated with BM in colon cancer.

This study used clinical data from the databases of two medical institutions: Wuxi People's Hospital affiliated with Nanjing Medical University and Wuxi Second People's Hospital. Patients treated at either center between 2010 and 2020 who met the inclusion criteria were enrolled. The inclusion criteria were as follows: (A) patients who had undergone open complete mesocolic excision (CME) or laparoscopic-assisted CME; (B) the surgical team consisted of senior surgeons able to perform CME independently; (C) patients who had been diagnosed with colon cancer through imaging and postoperative pathological biopsy; and (D) patients who had been diagnosed with BM from colon cancer through surgical exploration and pathological biopsy. The exclusion criteria were as follows: (a) patients with a concurrent diagnosis of other malignancies; (b) patients diagnosed with severe cardiovascular or cerebrovascular disease; (c) patients with severe organic diseases such as liver and kidney diseases; and (d) patients with incomplete medical records, missing clinical data, or lost to follow-up. Patients were followed from the start of the postoperative period until January 2023, with a minimum postoperative follow-up of 3 years for all patients. Positive cases were patients who developed BM during follow-up, and negative cases were patients who did not develop BM during follow-up. This study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of Wuxi People's Hospital and Wuxi Second People's Hospital (Jiangnan University Medical Center), with approval number KY22085.

2.2 Surgical program and key steps

The surgical protocol comprises the following key steps. First, the surgeon exposes the ileocolic vein and then isolates the veins within the mesentery. Next, through abdominopelvic exploration, the surgeon determines the precise location and dimensions of the tumor and whether distant metastasis is present. Once the surgical strategy is established, the surgeon incises the mesentery to ensure thorough clearance of the roots, including the lymph nodes and adipose tissue. During this process, particular attention is paid to the mesenteric vasculature, including the ileocolic and right colonic arteries.

The surgeon then enters the intramesenteric space, commonly referred to as Toldt's hiatus, situated medial to the mesentery of the right transverse colon, and progressively frees it toward the hepatic region. During this step, the ureter, reproductive vessels, and other associated anatomical structures are exposed and carefully protected. The gastrocolic ligament, the hepatic colonic ligament, and the right septal colonic ligament are then divided so that the colon is completely freed. Finally, the right half of the colon is resected, followed by anastomosis of the terminal ileum and the transverse colon.

This procedure ensures complete removal of the tumor and its adjacent lymph nodes, thereby improving the surgical outcome and prognosis [8]. The surgical technique and protocols were consistent across both medical centers and did not change over the study period.

2.3 Study design and data collection

The data included 38 preoperative variables (within 24 h before the day of surgery), intraoperative variables, and postoperative variables (occurring 48 h after the initial surgery). Preoperative variables collected included patient demographic characteristics (gender, age, history of smoking, alcohol abuse, and body mass index), basic clinical characteristics (American Society of Anesthesiologists score, nutrition risk screening 2002 score, history of surgery, adjuvant chemotherapy history, and adjuvant radiotherapy history), basic medical history (anemia, diabetes, hypertension, hyperlipidemia, and coronary artery disease), laboratory tests (albumin, carbohydrate antigen 125, carbohydrate antigen 72–4, alkaline phosphatase, and carbohydrate antigen 19–9), and tumor characteristics (T-stage, N-stage, peripheral nerve invasion, tumor size, tumor number, tumor type, and distant metastasis of tumor). Intraoperative variables collected included surgical approach, operative time, intraoperative bleeding, number of lymph nodes dissected, and whether it was an emergency procedure. Postoperative variables collected included laboratory test indices (procalcitonin, C-reactive protein, neutrophil-to-lymphocyte ratio, and serum amyloid A). The outcome indicator for this study was BM from colon cancer.

2.4 Statistical analysis

Statistical analysis was carried out using SPSS software and R software. The clinical prediction models were constructed and evaluated as follows. Data preprocessing: colon cancer patients treated at Wuxi People's Hospital between January 2010 and January 2020 formed the internal validation set, and colon cancer patients treated at Wuxi Second People's Hospital during the same period formed the external validation set. The internal validation set was further divided at random into a training set (70%) and a test set (30%). Categorical variables were compared using the chi-square test, continuous variables with a normal distribution were analyzed using the t test, and continuous variables that did not follow a normal distribution were analyzed using the rank sum test. A p value of less than 0.05 was considered statistically significant.
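The random 70%/30% division can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SPSS/R workflow: the function name `split_train_test` and the placeholder patient IDs are assumptions, while the cohort size (823) and the random seed (42) are the values reported in Section 3.1.

```python
import random

def split_train_test(patient_ids, train_fraction=0.7, seed=42):
    """Randomly split patient IDs into training and test subsets."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 823 is the internal cohort size from the Results; the IDs are placeholders.
train_ids, test_ids = split_train_test(range(823))
print(len(train_ids), len(test_ids))
```

A purely random split does not guarantee the same BM proportion in both subsets; with a rare outcome, a stratified split is a common alternative, although the text does not state that one was used.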

2.5 Development and evaluation of predictive models for machine learning algorithms

(A) Variable screening. The data from the internal validation set were subjected to univariate and multivariate regression analyses. Variables that were significant in the univariate analysis were entered into a logistic regression analysis to identify their independent influence on BM from colon cancer. Four machine learning models, namely extreme gradient boosting (XGBoost), random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN), were used to assess the importance of each factor and rank the factors by their weight of influence. Variables that ranked among the top ten in all four models and were significant in both the univariate and multivariate analyses were selected. (B) Model building and evaluation. The selected clinical variables were incorporated into the four machine learning algorithms (SVM, RF, XGBoost, and KNN). The four models were evaluated on three criteria, discrimination, calibration, and clinical utility, and the best-performing model was selected for further analysis. A receiver operating characteristic (ROC) curve was used to calculate the area under the curve (AUC) and gauge each model's predictive ability. A calibration curve was plotted to assess the agreement between predicted and actual results, and decision curve analysis (DCA) was performed to determine the benefit to patients from interventional treatment. Internal validation was conducted using k-fold cross-validation. (C) External validation of the best model. The generalizability and predictive efficiency of the model were assessed by applying it to the external validation set and plotting ROC curves. (D) Model interpretation. The contribution of each feature to the prediction for a given sample was obtained using Shapley additive explanations (SHAP), which are based on Shapley values. The ranking of risk factors by importance was depicted in the SHAP summary plot, and the prediction results for individual samples were analyzed and interpreted using the SHAP force plot.
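Two of the evaluation quantities named above, the AUC and the net benefit plotted in a decision curve, can be written out directly. The sketch below uses only the Python standard library and toy labels and scores; the function names and the data are assumptions for illustration and do not reproduce the study's R analysis or its results.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

def net_benefit(y_true, y_score, threshold):
    """Net benefit of intervening on every patient with score >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'treat everyone' reference strategy."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy labels (1 = BM) and predicted probabilities, for illustration only.
y = [0, 0, 0, 0, 1, 0, 1, 1]
p = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]
print(roc_auc(y, p), net_benefit(y, p, 0.5), net_benefit_treat_all(y, 0.5))
```

The "treat none" strategy has a net benefit of zero at every threshold, so a model is clinically useful over the range of thresholds where its curve lies above both reference strategies.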

3 Results

3.1 Patient selection

The study population was screened according to the predefined inclusion and exclusion criteria. A total of 1,151 patients diagnosed with colon cancer were eligible for inclusion, of whom 73 (6.34%) were diagnosed with BM associated with colon cancer (Table 1 and Fig. 1). The internal validation set consisted of 823 colon cancer patients, of whom 57 (6.93%) had BM, while the external validation set had 328 colon cancer patients, of whom 16 (4.88%) had BM. The raw data provided in the study are shown in Table S1. The internal dataset was divided into training and validation sets in a 7:3 ratio, with the random seed set to 42, for model construction. We compared the baseline characteristics between the two cohorts (Table S2). Additionally, we compared the baseline characteristics of the internal and external datasets to assess the model's generalizability (Table S3).
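As an illustration of the chi-square comparison used for categorical variables, the sketch below applies a Pearson chi-square test to the BM counts quoted above (57 of 823 internal versus 16 of 328 external). It is a hand-rolled standard-library version without continuity correction, and the function name is an assumption; the figure it prints is an arithmetic illustration rather than a result reported by the study, whose cohort comparisons are given in Table S3.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Rows: internal and external cohorts; columns: BM and no BM.
stat, p_value = chi_square_2x2(57, 823 - 57, 16, 328 - 16)
print(f"chi2 = {stat:.3f}, p = {p_value:.3f}")
```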

Table 1 Preoperative and intraoperative information
Fig. 1 Flow diagram of patients included in the study

3.2 Screening for risk factors for postoperative BM from colon cancer

The results of univariate and multivariate analyses showed that alkaline phosphatase (ALP) level, history of smoking, history of anemia, tumor size, depth of tumor invasion, tumor lymph node invasion, tumor liver metastasis, tumor lung metastasis, and postoperative neutrophil-to-lymphocyte ratio (NLR) were independent influencing factors for postoperative BM from colon cancer (P < 0.05) (Table 2). XGBoost, RF, SVM and KNN models screened risk factors for postoperative BM from colon cancer, including ALP ≥ 360 U/L, tumor > 5 cm, T3 and T4 stage tumors, tumor lymph node invasion, tumor lung metastasis, and postoperative NLR ≥ 3 (Fig. 2A–D). After comprehensive analysis, the variables included in the prediction model were ALP level, tumor size, depth of tumor invasion, tumor lymph node metastasis, tumor lung metastasis, and postoperative NLR level.
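The selection rule described in the Methods (significant in the regression analyses and retained by all four machine learning models) amounts to a set intersection. The sketch below applies it to the factors named in this paragraph; the short labels are assumptions chosen for readability, with "T stage" standing for depth of tumor invasion and "lymph node metastasis" for tumor lymph node invasion.

```python
# Labels are shorthand chosen for this sketch, not variable names from the study.
significant_in_regression = {
    "ALP", "smoking history", "anemia history", "tumor size", "T stage",
    "lymph node metastasis", "liver metastasis", "lung metastasis",
    "postoperative NLR",
}
flagged_by_all_four_models = {
    "ALP", "tumor size", "T stage", "lymph node metastasis",
    "lung metastasis", "postoperative NLR",
}

# Keep only the factors supported by both screening routes.
selected = sorted(significant_in_regression & flagged_by_all_four_models)
print(selected)
```

Smoking history, anemia history, and liver metastasis are significant in the regression analysis but are not among the factors screened by the four models, which is why they do not appear in the final six variables.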

Table 2 Univariate and multivariate analyses of variables related to BM
Fig. 2 The variable ranking plots of the four models. A Variable importance ranking diagram of the XGBoost model. B Variable importance ranking diagram of the RF model. C Variable importance ranking diagram of the SVM model. D Variable importance ranking diagram of the KNN model

3.3 Model building and evaluation

The results of the ROC curve analysis revealed that the XGBoost model had a high AUC value of 0.973 in the training set, while the AUC value in the validation set was 0.922, which demonstrated the best performance compared to the other three models (Table 3). Furthermore, the calibration curves of all four models resembled the ideal curves, indicating that there was high agreement between the predicted and actual results. Additionally, the decision curve analysis (DCA) curves showed that all four models had a net clinical benefit when compared to both the full treatment and no treatment plan (Fig. 3A–D). The study evaluated the generalization ability of the four models using k-fold cross-validation. A sample of 123 cases (15.00%) from the internal validation set was taken as the validation set, and the remaining samples were used as the training set and subjected to tenfold cross-validation. The XGBoost algorithm had an AUC value of 0.9454 ± 0.0309 in the validation set and an AUC value of 0.9386 in the test set, with an accuracy of 0.9435 (Fig. 4A–C). The RF algorithm showed an AUC value of 0.8027 ± 0.1190 in the validation set and an AUC value of 0.7856 in the test set, with an accuracy of 0.8629. The SVM algorithm exhibited an AUC value of 0.8347 ± 0.0687 in the validation set and an AUC value of 0.8405 in the test set, with an accuracy of 0.9355. Finally, the KNN algorithm had an AUC value of 0.8367 ± 0.0812 in the validation set and an AUC value of 0.8335 in the test set, with an accuracy of 0.9274. After comprehensive comparison, the XGBoost algorithm was selected to construct the model in this study.
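The tenfold procedure can be sketched as follows. With 123 of the 823 internal cases held out, 700 cases remain, so each fold validates on 70 cases and trains on 630. The function names, the shuffling seed, and the use of the sample standard deviation for the "mean ± SD" summary are assumptions; the text does not state how the folds were drawn or which SD was reported.

```python
import random
import statistics

def k_fold_indices(n_samples, k=10, seed=42):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of near-equal size
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx

def summarize(scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)

# 823 internal patients minus the 123 held-out cases leaves 700 for cross-validation.
splits = list(k_fold_indices(823 - 123, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

In use, a model would be fitted on each `train_idx`, scored on the matching `val_idx`, and the ten per-fold AUC values passed to `summarize`.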

Table 3 Evaluation of the performance of the four models
Fig. 3 Evaluation of the four models for predicting BM. A ROC curves for the training set of the four models. B ROC curves for the validation set of the four models. C Calibration plots of the four models. The 45° dotted line on each graph represents the perfect match between the observed (y-axis) and predicted (x-axis) complication probabilities. A closer distance between two curves indicates greater accuracy. D DCA curves of the four models. The intersection of the red curve and the All curve is the starting point, and the intersection of the red curve and the None curve is the end point of the range within which the corresponding patients can benefit

Fig. 4 Internal validation of the XGBoost model. A ROC curve of the XGBoost model for the training set. B ROC curve of the XGBoost model for the validation set. C ROC curve of the XGBoost model for the test set. D External validation of the XGBoost model

3.4 Model external validation

The disease prediction model achieved a high degree of accuracy, as evidenced by the external validation set's AUC value of 0.83 (Fig. 4D).

3.5 Model explanation

The SHAP summary plot revealed that risk factors for BM from colon cancer following CME were ranked as tumor lymph node metastasis, ALP levels greater than or equal to 360 U/L, tumor lung metastasis, T3 and T4 stage tumors, postoperative NLR greater than or equal to 3, and tumors larger than 5 cm (Fig. 5). Additionally, the SHAP force plot demonstrated the predictive analysis of the study model for four colon cancer patients with BM. Patient one had a predicted probability of 0.005 for bone metastasis, with ALP levels greater than or equal to 360 U/L being the factor that increased the likelihood. Patient two had a predicted probability of 0.316 for BM, with the factors that increased likelihood being ALP levels greater than or equal to 360 U/L, tumor lymph node metastasis, and postoperative NLR greater than or equal to 3. Patient three had a predicted probability of 0.225 for BM, with the factors that increased likelihood being tumor lymph node metastasis, postoperative NLR greater than or equal to 3, and ALP levels greater than or equal to 360 U/L. Finally, patient four had a predicted probability of 0.128 for BM, with the factors that increased likelihood being postoperative NLR greater than or equal to 3 and tumor lymph node metastasis (Fig. 6A–D).
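A force plot rests on the additive property of SHAP values: the model output for one patient equals a baseline value plus the sum of that patient's per-feature contributions. The sketch below assumes the contributions are expressed in log-odds, which is a common setting for tree-based classifiers but is not stated in the text, and converts the total to a probability with the logistic function. All numbers are hypothetical and are not read from Fig. 6.

```python
import math

def probability_from_shap(base_value, shap_values):
    """Add the baseline and per-feature contributions in log-odds, then apply the sigmoid."""
    log_odds = base_value + sum(shap_values.values())
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical numbers for illustration only; they are not taken from Fig. 6.
base_value = -3.0
contributions = {
    "lymph node metastasis": 1.2,    # pushes the prediction up
    "postoperative NLR >= 3": 0.8,   # pushes the prediction up
    "ALP >= 360 U/L": 0.5,           # pushes the prediction up
    "tumor size <= 5 cm": -0.4,      # pushes the prediction down
}
prob = probability_from_shap(base_value, contributions)
print(f"{prob:.3f}")
```

Positive contributions correspond to the factors described above as increasing the likelihood of BM, and negative contributions to factors that lower it.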

Fig. 5 SHAP summary plot. Risk factors are arranged along the y-axis based on their importance, which is given by the mean of their absolute Shapley values. The higher the risk factor is positioned in the plot, the more important it is for the model
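The ordering rule in this caption, importance as the mean of the absolute Shapley values, can be computed as in the sketch below. The function name and the small matrix of SHAP values are hypothetical and serve only to show the calculation.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP| across samples, largest first."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical SHAP values: one row per patient, one column per feature.
features = ["lymph node metastasis", "ALP", "tumor size"]
rows = [
    [0.9, -0.4, 0.1],
    [-0.7, 0.5, -0.2],
    [0.8, 0.3, 0.0],
]
ranking = rank_by_mean_abs_shap(rows, features)
for name, value in ranking:
    print(f"{name}: {value:.2f}")
```

Taking absolute values before averaging means a feature that strongly raises risk in some patients and strongly lowers it in others still ranks as important.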

Fig. 6 SHAP force plot. The contributing variables are arranged in the horizontal line, sorted by the absolute value of their impact. Blue represents features that have a negative effect on disease prediction, with a decrease in SHAP values; red represents features that have a positive effect on disease prediction, with an increase in SHAP values. A Predictive Analysis of Patient I. B Predictive Analysis of Patient II. C Predictive Analysis of Patient III. D Predictive Analysis of Patient IV

4 Discussion

In this study, we used four machine learning algorithms to build a risk prediction model for BM from colon cancer after CME. The XGBoost algorithm not only showed high accuracy but is also robust and scalable [9]. The RF algorithm, a decision tree-based approach, builds multiple decision trees through random selection of training samples and features and derives predictions by voting or averaging. Although RF can mitigate overfitting, it lacks the explicit regularization of XGBoost, which may leave its models more prone to overfitting. The SVM algorithm uses maximal margin classification, separating samples of different categories with the hyperplane that maximizes the margin between them. The KNN algorithm, a distance-based method, predicts the category of a new data point from its nearest neighbors in the training dataset. Although both algorithms showed substantial predictive accuracy in this study, they have high computational complexity and can be unstable on large datasets with many features. XGBoost is better suited to multidimensional analyses and requires substantially less computation and training time than SVM and KNN [10]. Therefore, after comparing the four machine learning algorithms, we selected XGBoost to construct the predictive model for BM from colon cancer.

Several studies have confirmed the efficacy of machine learning algorithms in clinical diagnosis and prognosis [9, 10]. In addition, machine learning techniques have demonstrated superior accuracy in predicting adverse outcomes in disease progression when compared to conventional diagnostic methods. The present study also leveraged machine learning algorithms to develop a predictive model, which has several benefits for patients, including avoiding unnecessary testing, reducing financial burdens on families, and minimizing the side effects of diagnostic procedures. Furthermore, this model can aid clinical decision-making by accurately identifying high-risk patients and facilitating timely intervention, ultimately leading to improved patient outcomes.

BM generally manifests in the advanced stages of cancer, after tumor cells have disseminated to other regions of the body. The SHAP analysis in this study indicates that patients with lung metastasis from colon cancer have a higher risk of developing BM. Anatomically, the lungs have an abundant blood supply and loose tissue. Colon cancer with lung metastasis may therefore spread to the skeletal system via both the pulmonary and systemic circulation [3]. Such tumors may also metastasize to the pleura via lung capillaries and subsequently disseminate to bones of the chest such as the ribs or scapula. Additionally, metastatic cancer cells in the lung generate bone-resorbing factors and other chemicals that break down bone tissue, promoting the dissemination of cancer cells [11]. The current study also revealed that colon cancer with larger tumor size, deeper invasion, and lymph node metastasis was more likely to develop BM. These tumor cells are more malignant, have a greater capacity to proliferate and spread, and are more prone to hematogenous dissemination. Additionally, the unique blood circulation pathway between the digestive system and the skeletal system is implicated in colon cancer BM. Specifically, most of the blood from the colorectum returns to the inferior vena cava, while a small portion enters the spinal venous system, particularly the Batson plexus. The Batson plexus is an important structure that connects the digestive system with the bone and lies close to the rectum, lower lumbar vertebrae, and sacral flanks. It lacks venous valves and has narrow vessels with sluggish blood flow [3]. Therefore, tumor cells that have disseminated through the blood circulation can use their adhesive properties to form metastases in the spine. Moreover, each spinal segment has a thoracoabdominal vena cava that provides nourishment and support. Owing to the low venous pressure in this region, highly invasive tumors can directly enter the vertebral system through the bloodstream [12]. In addition, the colonic mesentery contains an extensive network of lymph nodes, and malignant tumors can spread continuously after invasion, making it difficult for the surgeon to determine the extent of tumor spread accurately by naked-eye observation alone. Even if the surgeon adheres to the principle of radical surgery, residual cancer cells may give rise to BM. Furthermore, during intraoperative lymph node dissection, colon cancer cells may enter the venous system and metastasize to distant bone tissue through the systemic circulation [13]. It is worth noting that clinicians frequently use adjuvant treatments, including radiotherapy and chemotherapy, in the management of patients with malignant tumors. However, these treatments can disturb the delicate balance of the skeletal system, rendering it more vulnerable to the proliferation and metastasis of cancer cells. Some prospective studies have corroborated the findings of the current study by identifying tumor malignancy as a significant risk factor for the development of BM in individuals with colorectal cancer [14].

Previous research has discovered that certain inflammatory factors have the ability to suppress the human immune response, increase patients' immune tolerance to chemotherapeutic agents, and promote tumor metastasis and recurrence [15, 16]. Some medical professionals have even begun to use biomarkers such as interleukin and tumor necrosis factor-alpha (TNF-α) to predict poor prognoses, including BM, in cases of colon cancer [17]. However, these screening methods require expensive and complicated techniques, which can place additional financial burdens on patients and their families. Consequently, the present study employed easily obtainable and cost-effective inflammation-related parameters and determined that patients with higher NLR were at an increased risk of developing BM from colon cancer after undergoing surgery. Neutrophils have the capacity to eliminate cancer cells directly and indirectly by recruiting other immune cells to the tumor microenvironment. However, elevated numbers of neutrophils infiltrating the tumor microenvironment may undergo transformation into tumor-associated neutrophils-2 (TAN-2) in response to a variety of factors [18]. These TAN-2 cells produce hepatocyte growth factor, vascular endothelial growth factor, and matrix metalloproteinase-2 (MMP-2), which contribute to the breakdown of basement membrane glycoproteins and extracellular matrix proteins by tumor cells. Additionally, TAN-2 promotes the proliferation of tumor vascular endothelial cells. Therefore, an increase in neutrophil count may also suggest a negative prognosis in colon cancer patients [19, 20]. On the other hand, lymphocytes play a significant role in cancer immunosurveillance. Reduced lymphocyte counts may indicate poor biological behavior of the tumor. A retrospective study conducted by Corrado on 603 patients with colon cancer revealed that elevated NLR levels were highly correlated with reduced survival rates in colorectal cancer patients, which further supports the findings of the present study [21,22,23].

This study used four samples to illustrate the rationale behind the model's predictions of BM from colon cancer. In sample 2, the prediction analysis identified an elevated ALP level as an important risk factor. ALP is a hydrolytic enzyme that plays a role in the replication process of various genetic materials. It is mainly distributed in vital organs of the digestive, urinary, and skeletal systems, and clinicians often use ALP levels to predict liver and biliary tract diseases [24]. Additionally, we propose that ALP can reflect BM. The progression of bone metastasis in colorectal cancer involves tumor cell-mediated bone resorption and osteoblast impairment, during which osteoblasts synthesize substantial amounts of alkaline phosphatase to initiate the repair of damaged bone [25]. In a study by Hung et al. of 10,800 colorectal cancer patients, patients with elevated ALP levels had reduced survival rates after surgery [26].

This study evaluated the model comprehensively in terms of discrimination, calibration, and clinical applicability. We suggest that it may serve as an example of a predictive model that can be transferred to predicting postoperative complications across a range of moderate to challenging abdominal procedures. However, the study has some limitations. First, it covered a diverse array of risk factors associated with BM but did not specifically incorporate patient imaging data. Second, owing to constraints in human resources, the model did not fully incorporate several prevalent biomarkers associated with colorectal cancer and bone metastasis; future research will explore these biomarkers in depth to develop more precise predictive models. Third, the study was limited by its relatively small number of cases and by the inclusion of patients with a single tumor type. In future research, we plan to include a larger patient cohort covering a broader spectrum of tumor types from multiple medical centers to support predictive analyses of BM. Fourth, although machine learning algorithms achieved higher accuracy, they produced models with greater complexity and reduced interpretability. The computational and decision-making process of these models operates within a figurative "black box" and may lack the intuitiveness and transparency of conventional logistic regression models. Finally, the study was conducted retrospectively, which carries inherent limitations, including the potential for selection bias, distributional bias, and retrospective bias.

5 Conclusion

The current study successfully developed a prediction model for the risk of bone metastasis in colon cancer patients after CME using the XGBoost machine learning algorithm. The model exhibits high accuracy in predicting risk and offers significant clinical utility, enabling prompt diagnosis by physicians. The results highlight that BM remains a significant issue among colon cancer patients and is strongly associated with several factors, including ALP levels, tumor size, invasion depth, lymph node metastasis, lung metastasis, and postoperative NLR levels.