Introduction

Currently, breast cancer (BC) is the most common malignant tumor threatening women's health. According to the 2022 cancer statistics, BC accounts for the largest share of malignancies diagnosed in American women (31% of new cases) and is the second leading cause of cancer-related death1. As BC survival continues to improve, the number of patients with metastatic BC is also increasing2,3. Numerous studies have shown that BC exhibits metastatic heterogeneity, with distinct organ-specific patterns of metastasis that lead to marked differences in prognosis and therapeutic response4,5,6. Bone is the most common site of distant BC metastasis: bone metastasis (BM) accounts for nearly 75% of distant metastases7. Given the complexity of treating metastatic BC, therapy for BC with bone metastasis (BCBM) is limited to cytotoxic chemotherapy, endocrine therapy, and targeted therapy8. Furthermore, although the 5-year overall survival rate of BC patients without metastasis exceeds 80%9, distant metastasis reduces this rate to only about 25%10, and the 5-year overall survival rate of patients with BM is even lower, at only 22.8%11. Studies have also shown that skeletal-related events caused by BM, such as fractures, hypercalcemia, and spinal cord compression, markedly worsen the prognosis of BCBM patients12,13,14. It is therefore crucial to identify patients who are likely to develop bone metastasis and to predict their survival, because this information can guide subsequent examination, treatment, and management.

Over the years, the tumor-node-metastasis (TNM) staging system proposed by the American Joint Committee on Cancer (AJCC) and the pathological classification proposed by the World Health Organization (WHO) have served as prognostic evaluation systems for BCBM15,16. However, these systems incorporate only predictors such as tumor infiltration depth, invasion site, proliferation markers, gene expression assays, and response to neoadjuvant therapy. In recent years, many prediction models for BCBM have been developed using predictors such as age, sex, race, treatment, and grade17,18. However, the practicality and accuracy of these models leave room for improvement. This study therefore aims to establish a more accurate clinical model that incorporates as many valid variables as possible.

Regarding model development, although the nomogram is currently the most commonly used prediction tool, machine learning is increasingly favored by medical researchers for its practicality, novelty, and accuracy. In this study, we developed a reliable BCBM prediction model by comparing multiple machine learning algorithms across several performance metrics, using demographic characteristics, pathological information, and survival data retrieved from the Surveillance, Epidemiology, and End Results (SEER) database. As an extension of this work, we also developed two web pages that allow clinicians to obtain quantitative estimates of the likelihood of bone metastasis and of the 5-year survival rate of BC patients by entering simple clinical data. Ultimately, our aim is to stratify BC patients by their risk of bone metastasis and poor prognosis, helping clinicians make decisions, reducing the unnecessary medical burden on patients, and improving patients' quality of life.

Materials and methods

Study population

We collected data on patients diagnosed with BC between 2010 and 2016 from the SEER database using SEER*Stat software version 8.3.8.1. The SEER database, supported by the National Cancer Institute (NCI), covers about 30% of the United States population, with data collected by approximately 18 large cancer registries across the country19. Because the data are publicly available and contain no identifiable personal information, this study did not require ethics committee approval or patient consent.

Variables including age, race, sex, laterality, marital status, grade, AJCC TNM stage, surgery, radiotherapy, chemotherapy, tumor size, bone metastases, liver metastases, lung metastases, brain metastases, breast subtype, ER, PR, and HER2 were extracted from the SEER database. Patients were included according to the following criteria: (1) breast cancer confirmed by biopsy or pathology; (2) age at diagnosis ≥ 20 years; and (3) diagnosis between 2010 and 2016. Patients were excluded if (1) the diagnosis was made only at autopsy or from a death certificate; (2) breast cancer was not the first primary malignant tumor; or (3) any variable was unknown. Because our goal was to predict the probability of bone metastasis and the survival outcome of breast cancer patients after bone metastasis, the machine learning tasks focused on these two endpoints, with prognosis represented by 5-year survival. Because some patients were followed for only a short period and their 5-year survival status could not be determined, we excluded patients who were alive with less than 5 years of follow-up. This was done to maintain the rigor of the study and reduce possible selection bias.
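The selection rules and the 5-year follow-up exclusion can be sketched in pandas as follows. This is a minimal illustration, not the authors' code: the column names (`age`, `year_of_diagnosis`, `first_primary`, `autopsy_or_death_cert_only`, `survival_months`, `vital_status`) are hypothetical stand-ins for the actual SEER*Stat export fields, and the toy rows are invented.

```python
import pandas as pd

def build_cohort(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the stated inclusion/exclusion rules and derive a 5-year survival label.

    All column names are hypothetical; real SEER*Stat exports use different fields.
    """
    keep = (
        df["age"].ge(20)                               # age at diagnosis >= 20 years
        & df["year_of_diagnosis"].between(2010, 2016)  # diagnosed 2010-2016 inclusive
        & df["first_primary"]                          # BC was the first primary malignancy
        & ~df["autopsy_or_death_cert_only"]            # exclude autopsy/death-certificate-only cases
    )
    out = df.loc[keep].dropna()                        # exclude cases with any unknown variable
    # Patients still alive with < 60 months of follow-up have an unknown 5-year status.
    unknown = (out["survival_months"] < 60) & (out["vital_status"] == "alive")
    out = out.loc[~unknown].copy()
    # 1 = survived at least 5 years, 0 = died within 5 years.
    out["survived_5y"] = (out["survival_months"] >= 60).astype(int)
    return out

# Invented toy records for illustration only.
example = pd.DataFrame({
    "age": [45, 19, 60, 70, 55, 50],
    "year_of_diagnosis": [2012, 2012, 2015, 2009, 2016, 2014],
    "first_primary": [True, True, True, True, True, True],
    "autopsy_or_death_cert_only": [False, False, False, False, False, False],
    "grade": [2, 1, 3, 2, None, 1],
    "survival_months": [80, 90, 30, 70, 20, 40],
    "vital_status": ["alive", "alive", "dead", "alive", "alive", "alive"],
})
cohort = build_cohort(example)
print(cohort[["age", "survival_months", "survived_5y"]])
```

In the toy data, the 19-year-old, the 2009 diagnosis, the record with a missing grade, and the patient alive at 40 months are all removed, leaving one 5-year survivor and one death within 5 years.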

Feature selection and validation strategy

To minimize overfitting, we performed feature selection to remove irrelevant or redundant features, following a commonly used two-step approach: variables with statistically significant differences (P < 0.05) in univariate analysis were entered into multivariate analysis, and variables that remained statistically significant were selected as risk factors. For the diagnostic model, univariable and multivariable logistic regression were performed to screen for risk factors; for the prognostic model, Cox proportional hazards regression was applied to screen for prognostic factors. To further optimize the models, we used fivefold cross-validation repeated five times on the training data and determined the best hyperparameters for each model in the training cohort by grid search. Finally, the importance of each feature was ranked using Shapley additive explanations (SHAP). To guard against a biased model and to help decision makers use it correctly, the influence of each feature on the final prediction must be understood, and SHAP was used to quantify this influence.
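A grid search with fivefold cross-validation of the kind described above might look like this in scikit-learn. It is a sketch under stated assumptions: synthetic data stand in for the training cohort, and the random forest and its `param_grid` values are hypothetical examples, because the grids actually searched are not listed here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for the training cohort.
X_train, y_train = make_classification(
    n_samples=600, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Hypothetical grid; each combination is scored by 5-fold cross-validated AUC.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 6, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

# With refit=True (the default), best_estimator_ is retrained on the full training set.
best_model = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```

Stratified folds keep the minority-class proportion roughly constant across folds, which matters when positive events are rare.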

The overall dataset collected from the SEER database was randomly divided into a training cohort and a validation cohort at a ratio of 7:3. Metrics including the area under the curve (AUC), accuracy, precision, recall, and F1-score were then used to evaluate the performance of six machine learning models. Next, calibration curves were constructed to compare the calibration of the different models, that is, the agreement between predicted probabilities and observed outcomes. Decision curve analysis (DCA) is a method commonly used to estimate the net benefit of a model across a range of threshold probabilities; compared with the metrics above, DCA better reflects the clinical utility of a predictive model. After a comprehensive comparison of the machine learning models, the model with the best predictive ability was chosen as the final model, and its applicability was further confirmed in the validation cohort.
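The quantity plotted in a decision curve is usually the net benefit at threshold probability p_t, NB(p_t) = TP/n − (FP/n) · p_t / (1 − p_t), compared against "treat all" and "treat none" (NB = 0). The sketch below computes it with NumPy; the function names and toy inputs are illustrative and not taken from the study.

```python
import numpy as np

def net_benefit(y_true, y_prob, thresholds):
    """Net benefit of treating every patient whose predicted risk is >= each threshold."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    n = len(y_true)
    out = []
    for pt in thresholds:
        pred_pos = y_prob >= pt
        tp = np.sum(pred_pos & (y_true == 1))
        fp = np.sum(pred_pos & (y_true == 0))
        # False positives are penalized by the odds of the threshold probability.
        out.append(tp / n - fp / n * pt / (1 - pt))
    return np.array(out)

def net_benefit_treat_all(y_true, thresholds):
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = np.mean(y_true)
    t = np.asarray(thresholds, dtype=float)
    return prevalence - (1 - prevalence) * t / (1 - t)

y_demo = np.array([1, 1, 0, 0])
p_demo = np.array([0.9, 0.6, 0.4, 0.1])
thresholds = np.array([0.3, 0.5])
nb_model = net_benefit(y_demo, p_demo, thresholds)
nb_all = net_benefit_treat_all(y_demo, thresholds)
print(nb_model, nb_all)   # "treat none" is 0 at every threshold
```

A model is clinically useful at a given threshold when its curve lies above both reference strategies.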

Machine learning algorithms

Python was used to build the machine learning predictive models. The scikit-learn package (version 0.24.1) is a widely used machine learning library in Python that supports four major task types: classification, regression, dimensionality reduction, and clustering. It also provides modules for feature extraction, data preprocessing, and model evaluation. This retrospective study used six common machine learning algorithms: five from scikit-learn, plus XGBoost, which is distributed as a separate package with a scikit-learn-compatible interface.
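The 7:3 split and the five scikit-learn models can be sketched as follows. This is illustrative only: synthetic data replace the SEER cohort, the hyperparameters are defaults or hypothetical, and XGBoost is left out to keep the example dependency-free (`xgboost.XGBClassifier` follows the same `fit`/`predict_proba` interface). The averaging used for precision, recall, and F1 is not stated in the text; weighted averaging is assumed here. With `average="weighted"`, recall always equals accuracy, which is consistent with (though does not prove) the identical accuracy and recall values reported in the Results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=12, weights=[0.9, 0.1], random_state=1)
# 7:3 split; stratify keeps the imbalanced outcome rate equal in both cohorts.
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, stratify=y, random_state=1)
print(f"outcome rate: train {np.mean(y_tr):.3f}, validation {np.mean(y_va):.3f}")

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(max_depth=5, random_state=1),
    "Random forest": RandomForestClassifier(random_state=1),
    "Extra trees": ExtraTreesClassifier(random_state=1),
    "Gaussian NB": GaussianNB(),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    prob = model.predict_proba(X_va)[:, 1]   # probability of the positive class
    pred = model.predict(X_va)
    results[name] = {
        "AUC": roc_auc_score(y_va, prob),
        "accuracy": accuracy_score(y_va, pred),
        "precision": precision_score(y_va, pred, average="weighted", zero_division=0),
        "recall": recall_score(y_va, pred, average="weighted"),
        "F1": f1_score(y_va, pred, average="weighted"),
    }

for name, m in results.items():
    print(f"{name:20s} " + "  ".join(f"{k}={v:.3f}" for k, v in m.items()))
```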

Logistic regression is a generalized linear model. Although its dependent variable can be binary or multi-class, binary outcomes are more common and easier to interpret. In epidemiology, logistic regression is mainly used to explore the risk factors for a disease and to predict the probability of disease occurrence from those risk factors.
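For a binary outcome such as BM (1 = present, 0 = absent) and predictors x_1, …, x_k, the standard binary logistic model can be written as below; each odds ratio reported in the Results corresponds to e raised to the fitted coefficient of that predictor.

```latex
p(\mathrm{BM}=1 \mid \mathbf{x}) = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)\bigr)},
\qquad
\log\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j,
\qquad
\mathrm{OR}_j = e^{\beta_j}.
```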

The decision tree classification algorithm is an inductive learning method that derives a tree-structured classification model from a set of training samples. The cost of classifying a new sample grows only linearly with the depth of the tree, so prediction is very efficient and suitable for real-time classification. In machine learning, a decision tree is a predictive model that represents a mapping between features and labels: each internal node represents a test on a feature, each branch represents a possible outcome of that test, and each leaf node holds the prediction for samples that follow the path from the root node to that leaf.
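How a tree chooses the test at a node can be illustrated with Gini impurity, the default split criterion of scikit-learn's `DecisionTreeClassifier`. This is a simplified sketch for a single feature: it tries observed values as thresholds, whereas scikit-learn uses midpoints between sorted values and searches over many features. The function names are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels (0 for a pure node)."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(x, y):
    """Exhaustively search one feature for the split 'x <= t' with the lowest weighted Gini."""
    best_t, best_score = None, np.inf
    for t in np.unique(x)[:-1]:          # splitting at the maximum would leave the right side empty
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

x_demo = np.array([1, 2, 3, 10, 11, 12])
y_demo = np.array([0, 0, 0, 1, 1, 1])
t, score = best_threshold(x_demo, y_demo)
print(t, score)   # the split x <= 3 separates the classes perfectly
```

Growing a full tree repeats this search recursively on each child node until a stopping rule (such as maximum depth) is reached.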

Random forest, as the name suggests, builds a forest of decision trees in a randomized way. Each tree in the forest is grown independently of the others. Random forest uses bootstrap resampling: for each of the B trees, N samples are drawn at random with replacement from the original training set of N samples to form that tree's training set, and the samples that were not drawn (the out-of-bag samples) can serve as its test set.
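The bootstrap step can be simulated directly to show why roughly one third of the samples end up out of bag for each tree: the chance that a given sample is never drawn in N draws is (1 − 1/N)^N, which approaches e^(−1) ≈ 0.368. The values of `N` and `B` below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000   # training-set size (arbitrary)
B = 50       # number of trees (arbitrary)

oob_fractions = []
for _ in range(B):
    # Draw N indices with replacement: the bootstrap sample for one tree.
    idx = rng.integers(0, N, size=N)
    in_bag = np.zeros(N, dtype=bool)
    in_bag[idx] = True
    # Samples never drawn are "out of bag" and can serve as that tree's test set.
    oob_fractions.append(1.0 - in_bag.mean())

print(np.mean(oob_fractions), np.exp(-1))
```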

Extra trees (extremely randomized trees), an algorithm similar to random forest, uses an ensemble of decision trees to predict the class to which a data point belongs. Unlike random forest, however, extra trees by default trains each tree on the entire original sample rather than on a bootstrap resample. The two algorithms also differ in how nodes are split: random forest chooses the best split among the candidate features, whereas extra trees draws split thresholds at random and keeps the best of these random candidates. Both approaches aim to reduce variance and improve the final predictions.
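The difference in default resampling can be checked directly in scikit-learn: `RandomForestClassifier` has `bootstrap=True` by default, while `ExtraTreesClassifier` has `bootstrap=False`. The synthetic data below only serve to fit both ensembles.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
et = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)

# Random forest trains each tree on a bootstrap resample;
# extra trees trains each tree on the full training set and randomizes the split thresholds instead.
print(rf.bootstrap, et.bootstrap)   # True False
```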

Bayesian classifiers are a family of classification algorithms based on Bayes' theorem. A Bayesian classifier combines the prior probability of each class with the likelihood of the observed features to compute the posterior probability of each class, and then assigns the class with the highest posterior probability. This study used the Gaussian naive Bayes (Gaussian NB) classifier.
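The posterior computation of Gaussian naive Bayes can be written out in a few lines. This is a from-scratch sketch, not scikit-learn's `GaussianNB` implementation (which uses a slightly different variance-smoothing term); it assumes features are conditionally independent and normally distributed within each class.

```python
import numpy as np

def gaussian_nb_posterior(X_train, y_train, x):
    """Posterior P(class | x) under Gaussian naive Bayes."""
    classes = np.unique(y_train)
    log_post = []
    for c in classes:
        Xc = X_train[y_train == c]
        prior = len(Xc) / len(X_train)            # P(class)
        mu = Xc.mean(axis=0)
        var = Xc.var(axis=0) + 1e-9               # small constant avoids division by zero
        # Naive assumption: the joint likelihood is a product of per-feature Gaussians.
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        log_post.append(np.log(prior) + log_lik)
    log_post = np.array(log_post)
    # Bayes' theorem: normalize by the evidence P(x), using the log-sum-exp trick for stability.
    post = np.exp(log_post - log_post.max())
    return classes, post / post.sum()

X_demo = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
classes, post = gaussian_nb_posterior(X_demo, y_demo, np.array([0.1, 0.0]))
print(classes[np.argmax(post)], post)   # the point lies in the class-0 cluster
```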

The extreme gradient boosting (XGBoost) model was introduced by Chen and Guestrin in 2016. Owing to its speed and accuracy, it quickly became popular in machine learning competitions and is now widely used in industry. XGBoost is an improvement on the gradient boosting decision tree (GBDT): it combines many weak predictive models into an accurate one and handles missing data efficiently and flexibly. Notably, XGBoost often outperforms traditional machine learning algorithms.
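The sketch below makes "combining weak predictive models" concrete. It implements plain gradient boosting for a binary outcome with shrinkage (`learning_rate`) and per-tree column subsampling (`colsample`), two of the strategies XGBoost also uses. It is a simplified stand-in, not XGBoost: each tree is fitted to first-order residuals, and XGBoost's second-order objective, leaf-weight regularization, and sparsity-aware handling of missing values are omitted. Function names and parameter values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_boosting(X, y, n_rounds=50, learning_rate=0.1, max_depth=3, colsample=0.8, seed=0):
    """Gradient boosting on the logistic loss with shrinkage and column subsampling."""
    rng = np.random.default_rng(seed)
    prevalence = y.mean()
    base = np.log(prevalence / (1 - prevalence))   # start from the log-odds of the prevalence
    f = np.full(len(y), base)
    n_cols = max(1, int(colsample * X.shape[1]))
    trees = []
    for _ in range(n_rounds):
        residual = y - sigmoid(f)                  # negative gradient of the log loss w.r.t. f
        cols = rng.choice(X.shape[1], size=n_cols, replace=False)
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X[:, cols], residual)             # weak learner fitted to the residuals
        f = f + learning_rate * tree.predict(X[:, cols])   # shrinkage damps each tree's step
        trees.append((cols, tree))
    return {"base": base, "learning_rate": learning_rate, "trees": trees}

def boosted_proba(model, X):
    f = np.full(X.shape[0], model["base"])
    for cols, tree in model["trees"]:
        f = f + model["learning_rate"] * tree.predict(X[:, cols])
    return sigmoid(f)

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
model = fit_boosting(X, y)
loss_before = log_loss(y, np.full(len(y), y.mean()))
loss_after = log_loss(y, boosted_proba(model, X))
print(f"training log loss: {loss_before:.3f} -> {loss_after:.3f}")
```

A smaller learning rate usually needs more rounds but generalizes better; column subsampling decorrelates the trees, much as in random forest.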

Ethical approval and consent to participate

We confirm that all methods were carried out in accordance with relevant guidelines and regulations. The data for this study were obtained from the SEER database. The sample collection and research design were approved by the Ethics Committee of Zhejiang Provincial People's Hospital. We confirm that informed consent was obtained from all subjects and/or their legal guardian(s).

Statistical analysis

SPSS 25.0 and R 4.0.5 were used for data description and statistical analysis. Categorical variables are expressed as percentages, and continuous variables as means or medians. Normally distributed continuous variables were compared using Student's t-test and are presented as mean ± standard deviation, whereas non-normally distributed continuous variables were compared using the Mann–Whitney U test and are presented as median (interquartile range). Categorical data were compared using the chi-square test or Fisher's exact test. Univariable and multivariable logistic regression analyses were performed to identify risk factors for BM, and variables with P < 0.05 in the multivariable analysis were included in the diagnostic model. Similarly, univariate and multivariate Cox proportional hazards regression analyses were performed to identify prognostic factors for BCBM, and variables with P < 0.05 in the multivariate analysis were included in the prognostic model. P < 0.05 was considered statistically significant.
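The test-selection logic above can be sketched with SciPy, although the study itself used SPSS and R. Two rules here are assumptions, not stated in the text: normality is judged with the Shapiro–Wilk test at α = 0.05, and Fisher's exact test replaces the chi-square test when a 2×2 table has any expected count below 5. With very large samples, Shapiro–Wilk flags trivial departures from normality, so visual checks are often used instead.

```python
import numpy as np
from scipy import stats

def compare_continuous(a, b, alpha=0.05):
    """Student's t-test if both groups pass Shapiro-Wilk normality, else Mann-Whitney U."""
    _, p_a = stats.shapiro(a)
    _, p_b = stats.shapiro(b)
    if p_a > alpha and p_b > alpha:
        _, p = stats.ttest_ind(a, b)
        return "t-test", p
    _, p = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", p

def compare_categorical(table):
    """Chi-square test, or Fisher's exact test for a 2x2 table with any expected count < 5."""
    table = np.asarray(table)
    _, p, _, expected = stats.chi2_contingency(table)
    if table.shape == (2, 2) and (expected < 5).any():
        _, p_fisher = stats.fisher_exact(table)
        return "Fisher", p_fisher
    return "chi-square", p

rng = np.random.default_rng(0)
skewed_a = rng.exponential(scale=1.0, size=200)   # skewed, e.g. like tumor size
skewed_b = rng.exponential(scale=2.0, size=200)
cont = compare_continuous(skewed_a, skewed_b)
cat_small = compare_categorical([[1, 9], [8, 2]])     # an expected count of 4.5 triggers Fisher
cat_large = compare_categorical([[50, 50], [40, 60]])
print(cont, cat_small, cat_large)
```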

Results

Population features

A total of 283,373 BC patients were extracted from the SEER database according to the inclusion and exclusion criteria (198,364 in the training cohort and 85,009 in the validation cohort), of whom 3492 had BCBM (2448 in the training cohort and 1044 in the validation cohort) (Fig. 1). Table 1 shows the demographic and pathological characteristics of BC patients; most baseline features did not differ significantly between the training and validation cohorts. As shown in Table 1, the rate of BCBM was about 2.3%, most patients were aged 40–79 years, and White patients accounted for a much larger proportion than patients of other races. Table 2 shows the baseline characteristics of BCBM patients. The median tumor size in BCBM patients was markedly larger (44 mm vs 18 mm), the rates of metastases to other distant sites were significantly higher, and Luminal A tumors accounted for 65.7% of cases (Table 2). All features in both the diagnostic and prognostic models were examined with Pearson correlation analysis, and the correlation heat map showed no strong correlations between the variables (Supplementary Fig. 1).
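A correlation screen like the one behind the heat map can be sketched in pandas. The cut-off of |r| > 0.7 for a "strong" correlation is a hypothetical choice, not one reported in the text, and the toy features are synthetic.

```python
import numpy as np
import pandas as pd

def strongly_correlated_pairs(df, threshold=0.7):
    """List feature pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = df.corr(method="pearson").abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):   # upper triangle only, so each pair appears once
            r = float(corr.iloc[i, j])
            if r > threshold:
                pairs.append((cols[i], cols[j], round(r, 3)))
    return pairs

rng = np.random.default_rng(0)
a = rng.normal(size=500)
features = pd.DataFrame({
    "a": a,
    "b": a + 0.1 * rng.normal(size=500),   # nearly collinear with "a"
    "c": rng.normal(size=500),             # independent noise
})
pairs = strongly_correlated_pairs(features)
print(pairs)
```

Note that a low Pearson correlation rules out strong linear association only; it does not establish statistical independence.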

Figure 1

Flow chart of patient screening.

Table 1 Demographic and clinicopathological characteristics in breast cancer patients.
Table 2 Demographic and clinicopathological features in breast cancer patients with bone metastasis.

Risk and prognostic factors for BCBM

Univariable logistic regression analysis showed that BC patients who underwent surgery had a significantly lower risk of developing BM, making surgery the most prominent protective factor (OR = 0.023, 95% CI 0.021–0.024) (Table 3). Six main risk factors for BM were identified: grade, T stage, N stage, brain metastasis, lung metastasis, and liver metastasis. The strongest risk factor was brain metastasis (OR = 86.763, 95% CI 70.737–106.884). In the multivariable logistic regression analysis, surgery remained a strong protective factor (OR = 0.048, 95% CI 0.044–0.053), and the strongest risk factors were T stage (OR = 6.137, 95% CI 5.294–7.119), N stage (OR = 6.648, 95% CI 5.810–7.603), and liver metastasis (OR = 9.341, 95% CI 8.141–10.723).
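The odds ratios and 95% CIs above relate to the fitted coefficient β and its standard error through OR = exp(β) and CI = exp(β ± 1.96·SE), assuming Wald intervals, which the text does not state explicitly. The same arithmetic applies to hazard ratios from Cox regression. The helper names below are illustrative.

```python
import numpy as np

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def odds_ratio_ci(beta, se):
    """Odds ratio exp(beta) and Wald 95% CI exp(beta -/+ 1.96*SE)."""
    return np.exp(beta), np.exp(beta - Z_975 * se), np.exp(beta + Z_975 * se)

def se_from_ci(lower, upper):
    """Recover the coefficient's SE from a reported 95% CI, which is symmetric on the log scale."""
    return (np.log(upper) - np.log(lower)) / (2 * Z_975)

# Multivariable T-stage estimate reported above: OR = 6.137 (95% CI 5.294-7.119).
se_t = se_from_ci(5.294, 7.119)
or_t, lo_t, hi_t = odds_ratio_ci(np.log(6.137), se_t)
print(f"SE(beta) = {se_t:.4f}; OR = {or_t:.3f} ({lo_t:.3f}-{hi_t:.3f})")
```

Small differences between the reconstructed and reported limits come from rounding of the published values.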

Table 3 Univariable and multivariable Logistic regression of risk factors of bone metastasis in breast cancer patients.

Table 4 shows the results of the univariate and multivariate Cox regression analyses. Univariate Cox regression showed that surgery (HR = 0.500, 95% CI 0.457–0.548) was a significant protective factor for BCBM prognosis, whereas age, marital status, grade, T stage, brain metastases, lung metastases, liver metastases, and breast subtype were risk factors. In the multivariate Cox regression analysis, age, race, marital status, grade, surgery, radiotherapy, chemotherapy, brain metastases, lung metastases, liver metastases, breast subtype, ER, and PR were independent predictors of BCBM prognosis. Among them, the most prominent protective factors were surgery (HR = 0.569, 95% CI 0.517–0.628) and ER (HR = 0.482, 95% CI 0.339–0.686). In addition, advanced age, higher tumor grade, and concomitant distant metastases (brain, lung, or liver) were associated with worse survival in BCBM patients.
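To show where a Cox hazard ratio comes from, the sketch below maximizes the Cox partial likelihood for a single covariate on simulated survival data and reports HR = exp(β̂). It is a teaching sketch under simplifying assumptions (one covariate, no tied event times, simulated exponential survival with true β = 0.7). It is not the multivariable model fitted in the study; standard software such as R's `survival` package also handles ties and multiple covariates.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def cox_neg_log_partial_likelihood(beta, time, event, x):
    """Negative Cox log partial likelihood for one covariate (assumes no tied event times)."""
    order = np.argsort(-time)                 # sort subjects by descending follow-up time
    e, eta = event[order], beta * x[order]
    # Running log-sum-exp = log of sum over the risk set {j: t_j >= t_i} of exp(beta * x_j).
    log_risk = np.logaddexp.accumulate(eta)
    return -np.sum((eta - log_risk)[e])       # only observed events contribute terms

rng = np.random.default_rng(0)
n = 2000
x = rng.integers(0, 2, size=n).astype(float)                 # a binary covariate
true_beta = 0.7
t_event = rng.exponential(scale=1 / np.exp(true_beta * x))   # hazard proportional to exp(beta * x)
t_censor = rng.exponential(scale=2.0, size=n)                # independent random censoring
time = np.minimum(t_event, t_censor)
event = t_event <= t_censor

res = minimize_scalar(
    cox_neg_log_partial_likelihood, bounds=(-5, 5), method="bounded", args=(time, event, x)
)
beta_hat = res.x
print(f"beta = {beta_hat:.3f}, HR = {np.exp(beta_hat):.3f}")
```

An HR below 1 (as for surgery above) means a lower instantaneous risk of death at any time point, not a fixed reduction in survival time.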

Table 4 Univariate and multivariate Cox regression of prognostic factors of bone metastasis in breast cancer patients.

Predictive performance of the machine learning models for diagnosis and prognosis

Diagnostic model

Six machine learning models were developed and evaluated using learning curves, ROC curves, PR curves, and calibration curves. As the number of training samples increased, the learning performance of the models stabilized, and XGBoost ultimately outperformed all other models (Fig. 2). Figure 3 shows the evaluation curves of the six models. The AUC values of all models exceeded 0.80, and the AUC of XGBoost reached 0.987 in the training cohort and 0.940 in the validation cohort. However, because positive and negative events were unevenly distributed in the dataset, AUC alone was insufficient to characterize model performance. Therefore, precision–recall (PR) curves were generated to complement the receiver operating characteristic (ROC) curves. As shown in Fig. 3C,D, the average precision of the XGBoost model was higher than that of the other models. Finally, calibration curves were drawn to compare the calibration of each model, and the XGBoost model again performed best. Table 5 summarizes the evaluation metrics of all models. Collectively, these results show that XGBoost had the best overall performance, with the highest AUC (0.987), accuracy (0.947), precision (0.948), recall (0.947), and F1-score (0.947).
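Why the PR curve is informative under class imbalance can be seen in a short scikit-learn sketch: a random classifier has an AUC of about 0.5 regardless of prevalence, but its average precision equals the prevalence, so AP is judged against that baseline. The synthetic data, the random forest, and the 10-bin calibration setting are illustrative assumptions, not the study's configuration.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# About 5% positives, mimicking a rare outcome.
X, y = make_classification(n_samples=3000, n_features=10, weights=[0.95, 0.05], random_state=2)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, stratify=y, random_state=2)
prob = RandomForestClassifier(random_state=2).fit(X_tr, y_tr).predict_proba(X_va)[:, 1]

auc = roc_auc_score(y_va, prob)
ap = average_precision_score(y_va, prob)   # summary of the PR curve
baseline_ap = y_va.mean()                  # expected AP of a random classifier
# Calibration: observed event rate vs mean predicted probability within 10 equal-width bins.
frac_pos, mean_pred = calibration_curve(y_va, prob, n_bins=10)
print(f"AUC={auc:.3f}  AP={ap:.3f}  (random-classifier AP = {baseline_ap:.3f})")
```

A well-calibrated model has `frac_pos` close to `mean_pred` in every bin; calibration is a separate property from discrimination, which AUC measures.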

Figure 2

Learning curves of models with training data. (A) XGBoost; (B) Random Forest; (C) Decision Trees; (D) Extra Trees; (E) Gaussian NB; (F) Logistic regression.

Figure 3

ROC curves of diagnostic models developed by training cohort (A) and validation cohort (B); PR curves of models developed by training cohort (C) and validation cohort (D); calibration curves of models developed by training cohort (E) and validation cohort (F).

Table 5 The performance of diagnostic models on six ML algorithms.

Prognostic model

Fivefold cross-validation repeated five times was used to evaluate the performance of the machine learning prognostic models, with the average of the resulting ROC curves used as the evaluation metric. XGBoost performed best in cross-validation, with an average AUC of 0.79 (Fig. 4), and showed high accuracy in both the training and validation sets. Furthermore, DCA suggested that the XGBoost model had greater clinical utility in both sets and outperformed the traditional AJCC staging system (Fig. 5). In the training cohort, the XGBoost model scored highest, with an AUC of 0.880, accuracy of 0.890, precision of 0.870, recall of 0.890, and F1-score of 0.860. It also scored highest in the validation cohort, with an AUC of 0.800, accuracy of 0.880, precision of 0.840, recall of 0.880, and F1-score of 0.840 (Fig. 6). Finally, a heatmap was generated to show the predictive performance of the different models (Supplementary Fig. 2).
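Fivefold cross-validation repeated five times (25 fits in total) can be sketched with scikit-learn's `RepeatedStratifiedKFold`. In this sketch, synthetic data and logistic regression stand in for the prognostic cohort and models. It averages the 25 per-fold AUCs, which is close to, but not identical to, averaging the ROC curves themselves as described above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1500, n_features=10, weights=[0.7, 0.3], random_state=3)

# 5 stratified folds x 5 repetitions with different shuffles = 25 train/test evaluations.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=3)
aucs = cross_val_score(LogisticRegression(max_iter=1000), X, y, scoring="roc_auc", cv=cv)
print(f"mean AUC = {aucs.mean():.3f} ± {aucs.std():.3f} over {len(aucs)} folds")
```

Repeating the split with different shuffles reduces the dependence of the estimate on one particular partition of the data.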

Figure 4

Ten-fold cross-validation results of the six machine learning models in the training group.

Figure 5

The ROC curves of prognostic models based on machine learning in training set (A) and validation set (B). The decision curves of prognostic models based on machine learning in training cohort (C) and validation cohort (D).

Figure 6

Prediction performance of seven models.

Characteristic importance in the machine learning models

SHAP plots were used to display the importance of each model feature more intuitively. Based on the multivariable logistic regression and multivariate Cox proportional hazards regression results, 15 and 13 features were included in the diagnostic and prognostic models, respectively. The SHAP plots rank these features by their influence on diagnosis and prognosis. Figure 7 shows that the higher a feature's SHAP value, the greater the predicted probability of BM in BC patients. Blue indicates low feature values, purple indicates values close to the mean, and red indicates high feature values. Taking the most prominent feature as an example, the incidence of BM was markedly lower in patients who underwent surgery. Figure 8 shows that surgery remained the most important feature in the prognostic model, with patients who underwent surgery having a markedly higher 5-year survival rate.
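The quantity behind a SHAP plot is the Shapley value: a weighted average of a feature's marginal contribution over all subsets of the other features. The brute-force sketch below computes exact Shapley values for a small function, replacing "absent" features with a single baseline. It is a conceptual illustration only. The `shap` library's `TreeExplainer` computes these values efficiently for tree models such as XGBoost and by default uses a different (path-dependent) notion of feature absence. The example functions are hypothetical.

```python
import itertools
import math

import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values.

    Needs O(2^M) evaluations of f per feature, so it is feasible only for small M.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    M = len(x)
    phi = np.zeros(M)
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):                   # coalition sizes 0..M-1
            weight = math.factorial(size) * math.factorial(M - size - 1) / math.factorial(M)
            for S in itertools.combinations(others, size):
                idx = np.array(S, dtype=int)
                z = baseline.copy()
                z[idx] = x[idx]                 # features in S take the patient's values
                without_i = f(z)
                z[i] = x[i]                     # then add feature i to the coalition
                phi[i] += weight * (f(z) - without_i)
    return phi

def linear(z):
    return 2 * z[0] - 1 * z[1] + 0.5 * z[2]

def interaction(z):
    return z[0] * z[1] + z[2] ** 2

x_patient, ref = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi_lin = exact_shapley(linear, x_patient, ref)       # [2, -2, 1.5]
phi_int = exact_shapley(interaction, x_patient, ref)  # [1, 1, 9]: the product term is split evenly
print(phi_lin, phi_int)
```

The values always sum to f(x) − f(baseline) (the "efficiency" property), which is why SHAP values can be read as each feature's additive contribution to an individual prediction.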

Figure 7

Feature importance ranking by SHAP values in the diagnostic model based on the XGBoost algorithm. (A) Features are sorted by the sum of the SHAP values across all patients, and the SHAP values show the distribution of each feature's influence on the output of the XGBoost model. Red indicates higher feature values, whereas blue indicates lower feature values. The X-axis shows the SHAP value, that is, the effect on the model output; higher values indicate a greater predicted likelihood of bone metastasis. (B) Standard bar chart of features sorted by the mean absolute SHAP value of each feature in the XGBoost model.

Figure 8

Feature importance ranking by SHAP values in the prognostic model based on the XGBoost algorithm. (A) Features are sorted by the sum of the SHAP values across all patients, and the SHAP values show the distribution of each feature's influence on the output of the XGBoost model. Red indicates higher feature values, whereas blue indicates lower feature values. The X-axis shows the SHAP value, that is, the effect on the model output; higher values correspond to a larger model output. (B) Standard bar chart of features sorted by the mean absolute SHAP value of each feature in the XGBoost model.

Discussion

According to the latest cancer statistics1, BC has replaced lung cancer as the most commonly diagnosed cancer in the United States. Consequently, the treatment of BC and its complications imposes a heavy medical burden on society. In particular, the quality of life and survival of BC patients are significantly reduced after they develop BM, mainly because of the skeletal-related events (SREs) that follow BM. Studies have shown that severe vertebral invasion, pathological fractures, bone pain, and other SREs pose a serious threat to the prognosis of patients with BM20,21. A previous survey found that the cumulative incidence of SREs in patients with BM is about 45.1%22. In the present study, the incidence of BM in BC patients was about 2.3%; this low proportion underscores the difficulty of identifying patients with BM, while the severity of SREs underscores the importance of doing so. It is therefore necessary to effectively screen for patients who are prone to develop BM and who have a poor prognosis after BM. Bone scan combined with CT is the gold standard for detecting BM and is the preferred method recommended in current guidelines23,24,25. A recent European prospective study showed that [18F]FDG PET/MRI and MRI were significantly better than CT or bone scintigraphy for detecting BM in newly diagnosed BC patients26. However, these tests have drawbacks, such as radiation exposure and high cost, and not all patients are willing to undergo them. To address these issues, this study aimed to develop two easy-to-use clinical models for the early identification of high-risk BCBM patients and the prediction of BCBM prognosis.

With the rapid development of artificial intelligence, machine learning is increasingly applied in biomedicine and has great potential in future clinical practice27,28. In 2022, Stephen-John Sammut et al. reported in Nature a study that integrated the clinical, pathological, genomic, and transcriptomic data of 168 patients with breast cancer receiving chemotherapy. Using a multi-omic machine learning approach, they successfully predicted pathological complete response to chemotherapy (AUC = 0.87)29. This study demonstrates the significant value that mature machine learning models can offer in clinical practice by providing doctors and patients with more accurate decision support.

Nevertheless, despite significant advances in building and using such models, there is still considerable room for improvement. Li et al. developed a deep learning algorithm that predicts bone metastasis in breast cancer from the MRI radiological features of 96 metastatic and 192 non-metastatic breast tumors, using statistical morphology and grayscale features and evaluating predictive performance with metrics such as AUC, sensitivity, and specificity30. However, because the model depends heavily on MRI images and data, it cannot be widely adopted. Thio et al. integrated survival data from thousands of cancer patients with extensive bone metastasis to develop a survival prediction model31. However, most of the data used to develop and validate that model came from laboratory tests, including biochemistry, routine blood tests, and protease data; although the model predicted survival accurately, it places more stringent demands on the types of data available. By contrast, our machine learning models are designed specifically to predict the occurrence of bone metastasis in breast cancer patients and the prognosis of patients with bone metastasis. First, all required parameters are derived from routine clinical practice, making them more accessible than specialized images or laboratory data, so the models can also be used by hospitals in remote areas or by junior clinicians to guide comprehensive treatment planning and enable early intervention against potential adverse clinical events. Second, the underlying XGBoost algorithm, which has shown high accuracy and ease of use in numerous studies32,33,34, employs strategies such as regularization against overfitting, shrinkage, and column subsampling to improve generalization and learning speed.

Univariable and multivariable logistic regression analyses identified fifteen independent risk factors: age, race, sex, grade, T stage, N stage, surgery, radiotherapy, chemotherapy, tumor size, brain metastasis, lung metastasis, liver metastasis, breast subtype, and PR. According to the SHAP importance ranking, the most prominent contributors were surgery, N stage, and T stage. Univariate and multivariate Cox proportional hazards regression analyses then identified thirteen independent prognostic factors: age, race, marital status, grade, breast subtype, surgery, radiotherapy, chemotherapy, brain metastases, liver metastases, lung metastases, ER, and PR. Ranking these features by importance showed that surgery, liver metastases, and lung metastases were the three factors most strongly associated with prognosis. However, some features that were significant in the multivariable logistic and multivariate Cox regression analyses had a SHAP importance of zero. This may reflect an advantage of machine learning: it can discard features that contribute nothing to prediction, whereas traditional regression models retain every included variable and are more prone to overfitting. Through iterative optimization, machine learning can yield more accurate predictive models.

This study found that BC patients who did not undergo surgery were at high risk of developing BM. Yao et al.17 likewise reported that surgery was an independent factor associated with BCBM. Despite the risk of radiation exposure, we still recommend bone scans to screen for BM in BC patients who have not undergone surgery. We also found that T stage and N stage were strong predictors of BM. Studies have shown that higher T and N stages indicate larger tumor volume and more extensive involvement of adjacent tissues and lymph nodes, reflecting further progression of the malignancy35,36. The AJCC TNM staging system is a widely used prognostic system37; however, previous studies have shown that TNM staging alone predicts metastasis with limited accuracy, and researchers often obtain better predictions by analyzing multiple factors together38,39. Interestingly, surgery was also the most prominent feature for predicting BCBM prognosis. Although metastatic BC remains incurable, surgical removal of the primary tumor is associated with improved survival in patients with distant metastatic BC at diagnosis. One study reported that patients who underwent primary surgery had significantly longer median survival than those who did not, and that primary tumor resection reduced the risk of death in patients with primary BCBM by approximately 40%40. A randomized controlled trial in Turkey found similar 3-year OS in patients with and without primary BC surgery; however, at a median follow-up of 5 years, surgery prolonged median OS by approximately 9 months41. In contrast, a trial in India found that surgery for the primary tumor did not improve OS in patients with de novo metastatic BC42.
European investigators concluded that surgical treatment of the primary tumor does not benefit the majority of patients with de novo metastatic BC43. A retrospective study by Gong et al.44 identified surgery as an independent prognostic factor for BCBM, consistent with our findings. Whether the primary tumor should be resected in BCBM therefore remains controversial and requires verification in multicenter prospective studies. Liver and lung metastases also play an important role in predicting BCBM prognosis: in this study, BCBM patients with liver or lung metastasis had a poor prognosis, with lower 5-year survival than other BCBM patients. By considering all meaningful features together, our model achieved good predictive performance for BCBM prognosis.

The ultimate purpose of building these models is to facilitate clinical application and support clinical decision-making. We therefore built two publicly accessible websites based on the XGBoost algorithm (https://share.streamlit.io/lry4000/bone_metastasis/main and https://share.streamlit.io/lry4000/sc5_new/main). Their streamlined page structure allows users to enter data efficiently: the clinical parameters described in this article are displayed on the right side of the page, and users enter the corresponding values according to the patient's actual condition. The system then immediately returns the predicted probability of bone metastasis for that patient. The results can be presented in various formats and shared with a broader range of clinical staff. The second web page, which predicts the survival rate, is used in the same way.

Our study has some limitations. First, it is a multicenter retrospective study involving only patients from the United States, so it inevitably suffers from selection bias, and external data from other countries are needed to validate the reproducibility of our results. Second, although our models performed well on the SEER database, their reliability should be further confirmed in prospective studies. Third, the SEER database does not include routine blood tests, biochemical indicators, or the Charlson Comorbidity Index (CCI), so the models may be missing some important features.

Conclusion

This study introduced, for the first time, XGBoost-based machine learning models to construct a diagnostic system and a survival prediction system for BCBM patients. We ranked the importance of features using demographic characteristics and pathological indicators screened from the SEER database. ROC curves, learning curves, precision–recall curves, calibration plots, and decision curves were used to evaluate model performance, and a validation cohort held out from the SEER data was used to further verify the models. Finally, we developed two simple and convenient web applications to help clinicians with clinical decision-making.