Background

Breast cancer (BC) is the most common malignancy in women, accounting for 30% of all cancers in females [1]. Although its mortality rate ranks fourth, it shows the most marked increase in the number of new deaths [2]. This trend may be attributable to continually declining fertility rates and rising body weight [3]. In China, there are over 410,000 new cases of breast cancer and over 110,000 related deaths annually [4]. In recent years, the incidence of distant metastatic recurrence in BC patients has remained high, and such recurrence is an adverse prognostic indicator.

Bone is the most common site of distant metastasis, with over 60% of patients with metastatic BC developing bone metastasis [5]. In Western countries, approximately 3.5–10% of all newly diagnosed breast cancer patients already have distant metastasis at diagnosis [6]. For these patients with de novo metastatic breast cancer, this means fewer treatment options and shorter survival. Stage IV breast cancer is a significant public health concern in many developed countries, and the burden is greater in resource-poor areas, where the lack of screening programs and early detection methods leads many patients to present with metastases at diagnosis [7]. Furthermore, while the 5-year overall survival rate exceeds 80% for BC patients without metastasis [8], it falls to only around 25% in those with distant metastasis [9]. The 5-year overall survival rate for patients with bone metastasis (BM) is even lower, at about 22.8% [10]. Early identification of breast cancer bone metastasis has therefore become a crucial issue for clinicians.

Currently, bone metastasis is identified and diagnosed primarily with imaging techniques such as X-ray, bone scintigraphy, computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography-computed tomography (PET-CT). Among these, X-ray examination is the most widely used and cost-effective method in China. However, despite its high specificity, X-ray imaging has low sensitivity, making early metastatic lesions difficult to detect [11]. Access to the other imaging modalities is limited by the unequal distribution of medical resources, equipment constraints, and high costs. Even in some developed regions, imaging ordered without prior risk assessment can result in over-testing, prolonging average hospital stays and increasing hospital costs.

Effective treatment requires individualization, which is the premise of precision medicine. Precision medicine currently rests on four concepts: predictiveness, personalization, prevention, and participation [12]. Big data analysis techniques are becoming essential in clinical practice [13], which calls for advanced technology to analyze large volumes of medical data and support individualized treatment decisions. Many studies have used machine learning to investigate clinical risk factors associated with cancer metastasis in order to achieve early detection [14,15,16]. In recent years, several breast cancer bone metastasis (BCBM) prediction models have been developed using predictors such as age, gender, race, treatment, and grade [17, 18]. However, these models still leave room for improvement in practicality and accuracy. This study aims to establish a more accurate clinical model that incorporates as many informative variables as possible.

Regarding model development, although nomograms are currently the most commonly used predictive models, machine learning is increasingly favored by medical professionals for its practicality and accuracy. This study is based on laboratory indicators routinely collected from inpatients in real-world practice and requires no additional pathological examination or imaging assessment, which lowers the barrier to building and applying the model. By comparing multiple laboratory indicators side by side, we developed a reliable BCBM prediction model.

Ultimately, our goal is to stratify breast cancer patients by their risk of bone metastasis and to help clinicians, especially breast specialists at primary-level hospitals, make decisions that spare patients unnecessary medical burden and improve their quality of life.

Materials and methods

Patient population

This retrospective study included data from two medical centers and was approved by the institutional review boards of both centers. Inclusion criteria were as follows: (1) a clear diagnosis of primary breast cancer with de novo bone metastasis; (2) clinical blood biomarker testing completed before treatment (radiotherapy or chemotherapy) or surgical resection; (3) no history of hypertension, diabetes, or hyperlipidemia; (4) no history of abnormal blood indicators of liver, kidney, or cardiovascular function; and (5) no history of other diseases. Exclusion criteria were as follows: (1) distant metastasis occurring after treatment (surgical resection or chemotherapy); (2) incomplete clinical blood biomarker data, including tumor markers (alpha-fetoprotein (AFP), carcinoembryonic antigen (CEA), cancer antigen 125 (CA125), cancer antigen 15-3 (CA153), and carbohydrate antigen 19-9 (CA199)), liver function tests, kidney function tests, lipid profile, or cardiovascular function tests; (3) age under 18 years; and (4) metastasis to sites other than bone.

The study involved breast cancer cases from two research centers. The 176 cases from one center were randomly divided at an approximately 7:3 ratio into training (n = 123) and test (n = 53) cohorts, and the 63 cases from the other center formed the external validation (Test1) cohort. The internal validation (test) cohort came from the same medical center as the training cohort and therefore shared its clinical treatment processes and data collection standards, which allowed the model's robustness and performance to be evaluated in a similar clinical environment. The external validation cohort came from a different, geographically proximate medical center and was used to assess the model's generalizability across institutions and patient populations. Together, the two validation cohorts were chosen to assess the model's reliability and applicability under different conditions. The distribution details of the study are provided in Table 1. The workflow of the model in this study is illustrated in Fig. 1.
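The 123/53 split of the 176 development-center cases can be reproduced with scikit-learn by passing the test-set size as an absolute count. The sketch below uses random placeholder data; stratifying on the outcome and the seed value are assumptions, since the study does not report how the randomization was performed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the development center:
# 176 patients x 40 blood-biomarker columns, label 1 = bone metastasis.
rng = np.random.default_rng(0)
X_center = rng.normal(size=(176, 40))
y_center = rng.integers(0, 2, size=176)

# An integer test_size is an absolute number of samples (53 held out, 123 remain).
# stratify=y_center (an assumption) keeps the metastasis rate similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_center, y_center, test_size=53, stratify=y_center, random_state=42
)
print(X_train.shape, X_test.shape)  # (123, 40) (53, 40)
```

The external (Test1) cohort is never passed to this split; it is kept aside entirely until the final evaluation.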

Table 1 Clinical blood markers in the training, test, and Test1 cohorts
Fig. 1

Workflow of the LightGBM model in this study

Feature extraction and selection

The clinical blood biomarker features comprised tumor markers (AFP, CEA, CA125, CA153, and CA199); liver function indicators (total bilirubin, direct bilirubin, indirect bilirubin, total protein, albumin, globulin, albumin-globulin ratio, gamma-glutamyl transferase, prealbumin, aspartate transaminase (AST), alanine transaminase (ALT), AST/ALT ratio, alkaline phosphatase, cholinesterase, and total bile acid); kidney function indicators (urea, creatinine, uric acid, blood bicarbonate, cystatin C, and potassium, sodium, chloride, and calcium ions, and inorganic phosphorus); lipid profile (total cholesterol, triglycerides, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, apolipoprotein A1, apolipoprotein B, A1/B ratio, and lipoprotein (a)); and cardiovascular function indicators (creatine kinase, creatine kinase isoenzyme MB (CK-MB), lactate dehydrogenase (LDH), and α-hydroxybutyrate dehydrogenase (α-HBDH)).

All extracted features were processed as follows. First, each feature was Z-score standardized (the mean subtracted and the result divided by the standard deviation), so that every feature had a mean of 0 and a standard deviation of 1 and features measured in different units were placed on a common scale. Then, the Spearman rank correlation coefficient (ρ), a non-parametric measure of the strength of a monotonic relationship between two variables, was calculated for each pair of features. Values of ρ close to 1 or -1 indicate a strong correlation, and we chose ρ > 0.9 as the threshold for high correlation. Highly correlated features follow nearly identical trends, which can cause multicollinearity, introduce redundant information, increase model complexity, and reduce model stability and interpretability. Therefore, when the Spearman correlation coefficient between two features exceeded 0.9, only one of them was retained, reducing redundancy and improving the model's generalizability.
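A minimal sketch of these two preprocessing steps, assuming pandas data frames. The greedy rule (keep a feature unless it is too correlated with one already kept) is an assumption, because the study does not say which member of a correlated pair was retained; the sketch also compares |ρ| so that strong negative correlations are treated the same way, which is likewise an assumption. Standardizing with training-set statistics only is a further assumption. Because Spearman's ρ is computed on ranks, Z-scoring before or after the correlation step does not change ρ. The column names are hypothetical.

```python
from typing import List, Tuple

import numpy as np
import pandas as pd


def zscore(train: pd.DataFrame, other: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Z-score both frames using the training-set mean and (population) SD."""
    mu = train.mean()
    sd = train.std(ddof=0).replace(0, 1.0)  # avoid division by zero for constant columns
    return (train - mu) / sd, (other - mu) / sd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> List[str]:
    """Keep a column only if its |Spearman rho| with every already-kept column is <= threshold."""
    rho = df.corr(method="spearman").abs()
    kept: List[str] = []
    for col in df.columns:
        if all(rho.loc[col, k] <= threshold for k in kept):
            kept.append(col)
    return kept


# Hypothetical example: indirect bilirubin is built as a near-copy of total bilirubin.
rng = np.random.default_rng(1)
tbil = rng.normal(size=100)
train = pd.DataFrame({
    "total_bilirubin": tbil,
    "indirect_bilirubin": 0.8 * tbil + rng.normal(scale=0.01, size=100),
    "albumin": rng.normal(size=100),
})
train_z, _ = zscore(train, train)
print(drop_correlated(train_z))  # ['total_bilirubin', 'albumin']
```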

Finally, dimensionality was reduced with least absolute shrinkage and selection operator (LASSO) regression, which applies L1 regularization. LASSO penalizes the absolute values of the regression coefficients and shrinks some of them to exactly zero, which performs feature selection and yields a sparse model. The choice of the penalty parameter lambda (λ) is critical because it controls the strength of this penalty. A larger λ shrinks more coefficients to zero, simplifying the model at the risk of underfitting; a smaller λ retains more features at the risk of overfitting the training data. We selected λ by 10-fold cross-validation on the training set, choosing the value that minimized the mean squared error. This approach balances model complexity against predictive performance, favors a λ that generalizes to unseen data, and helps prevent overfitting.
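The λ selection described above can be sketched with scikit-learn's `LassoCV`, which fits the L1 path over a grid of penalties (scikit-learn names λ `alpha`) and keeps the value with the lowest mean squared error averaged across folds. The data below are synthetic, and treating the 0/1 metastasis label as a continuous response is an assumption consistent with the mean-squared-error criterion; an L1-penalized logistic regression would be an alternative the study does not describe.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Hypothetical standardized training data: 123 patients x 40 features,
# where only features 0, 1 and 2 carry signal about the 0/1 outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(123, 40))
score = 1.5 * X[:, 0] - 1.2 * X[:, 1] + 1.0 * X[:, 2]
y = (score + rng.normal(scale=0.5, size=123) > 0).astype(float)

# 10-fold CV over an automatically generated penalty grid; alpha_ is the
# penalty with the lowest mean squared error averaged over the folds.
lasso = LassoCV(cv=10).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # indices of features with nonzero coefficients
print(f"lambda = {lasso.alpha_:.4f}; kept features: {selected.tolist()}")
```

`lasso.mse_path_` holds the per-fold errors for every candidate penalty, which is the information typically plotted against λ in a cross-validation curve.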

Development and validation of models

In this study, the LightGBM machine learning algorithm was used to construct models with bone metastasis status (present vs. absent) as the binary outcome, using the features retained after dimensionality reduction. Models were built with 5-fold cross-validation in the training set and then validated in the internal and external validation cohorts. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. Decision curve analysis (DCA) was then performed in the training, internal validation, and external validation cohorts to estimate the net benefit across threshold probabilities and thereby evaluate the model's clinical utility.

Statistical analysis

Clinical baseline features were compared using t-tests, chi-square tests, or Fisher's exact tests in SPSS (version 25.0, IBM). The t-test was used for continuous variables with homogeneous variances, reported as mean ± standard deviation (x̄ ± s), and the chi-square test or Fisher's exact test was used for categorical variables, reported as proportions. A two-tailed p-value < 0.05 was considered statistically significant. Spearman rank correlation analysis, Z-score normalization, univariate and multivariate regression analyses, LightGBM feature importance output, and LASSO regression were performed in Python (version 3.7.17; http://www.python.org), which was also used to plot the ROC curves and decision curves.
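The baseline comparisons were run in SPSS; as an illustration only, the sketch below shows SciPy equivalents of the same tests on hypothetical values. Choosing between Student's and Welch's t-test with Levene's test, and switching to Fisher's exact test when any expected cell count is below 5, are common conventions assumed here rather than rules stated in the study.

```python
import numpy as np
from scipy import stats

# Hypothetical CEA values (ng/mL) for patients with and without bone metastasis.
rng = np.random.default_rng(0)
cea_bm = rng.normal(8.0, 3.0, size=40)
cea_no_bm = rng.normal(4.0, 2.0, size=83)
print(f"BM: {cea_bm.mean():.2f} ± {cea_bm.std(ddof=1):.2f}")  # mean ± SD

# Levene's test checks homogeneity of variance; Student's t-test is used when
# it holds, and Welch's t-test otherwise (the fallback is an assumption).
equal_var = bool(stats.levene(cea_bm, cea_no_bm).pvalue >= 0.05)
t_p = stats.ttest_ind(cea_bm, cea_no_bm, equal_var=equal_var).pvalue

# Hypothetical 2x2 table for a categorical variable (rows: BM yes/no).
table = np.array([[12, 28], [9, 74]])
chi2, chi2_p, dof, expected = stats.chi2_contingency(table)  # Yates-corrected for 2x2
if (expected < 5).any():
    _, cat_p = stats.fisher_exact(table)  # small expected counts: exact test
else:
    cat_p = chi2_p
print(f"t-test p = {t_p:.3g}; categorical p = {cat_p:.3g}")
```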

Results

Patient characteristics

This study included a total of 239 female breast cancer patients from two research centers. One center contributed 123 cases to the training cohort and 53 cases to the test cohort, and the other center provided 63 cases for the test1 cohort. In the baseline analysis, statistically significant differences were observed in one or more of the three cohorts for multiple blood biomarkers, including CEA, CA153, total bilirubin, direct bilirubin, indirect bilirubin, albumin, globulin, albumin/globulin ratio, gamma-glutamyl transferase, total bile acid, prealbumin, aspartate transaminase, alanine transaminase, AST/ALT ratio, alkaline phosphatase, magnesium ion, creatine kinase, LDH, α-HBDH, total cholesterol, apolipoprotein A1, apolipoprotein B, and lipoprotein (a). The clinical blood biomarker characteristics of the patients are summarized in Table 1.

Feature selection

Feature data were normalized, and for each pair of features with a Spearman correlation coefficient > 0.9, only one feature was retained. The heatmap of the feature correlation analysis is shown in Supplementary Fig. 1. Dimensionality was then reduced with LASSO regression, which eliminated features whose coefficients shrank to zero. The optimal λ was determined by the minimum mean squared error, and the LASSO regression model was fitted accordingly (Fig. 2a). After dimensionality reduction, 15 features were retained for model construction (Fig. 2b).

Fig. 2

Feature selection using the least absolute shrinkage and selection operator (LASSO) regression model. (a) LASSO coefficient profiles across λ values; the vertical dashed line marks the optimal λ and the corresponding number of features. (b) Features with nonzero coefficients after LASSO selection.

Model construction and validation

The LightGBM machine learning algorithm was used to construct predictive models for breast cancer bone metastasis with the selected features. The ROC curves of the LightGBM model are shown in Fig. 3. The AUCs of the LightGBM model in the training, test, and test1 cohorts were 0.945 (95% CI 0.910–0.981), 0.892 (95% CI 0.813–0.971), and 0.908 (95% CI 0.836–0.980), respectively. The AUCs of the combined model in the training, test, and test1 cohorts were 0.955 (95% CI 0.934–0.976), 0.835 (95% CI 0.739–0.931), and 0.918 (95% CI 0.856–0.981), respectively. Other performance parameters are presented in Table 2.
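The method used to obtain the 95% confidence intervals for the AUCs is not stated here. One common approach is the percentile bootstrap, sketched below with scikit-learn's `roc_auc_score`; DeLong's method is another. The resample count, the seed, and the example data are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc_ci(y_true, y_score, n_boot=2000, level=0.95, seed=0):
    """Point AUC plus a percentile-bootstrap confidence interval."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    rng = np.random.default_rng(seed)
    n = y_true.size
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        if np.unique(y_true[idx]).size < 2:
            continue  # AUC is undefined when a resample contains only one class
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    tail = (1.0 - level) / 2.0 * 100.0
    low, high = np.percentile(aucs, [tail, 100.0 - tail])
    return roc_auc_score(y_true, y_score), low, high


# Hypothetical usage on a 53-patient cohort with noisy scores.
rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=53)
scores = y + rng.normal(scale=0.8, size=53)
auc, low, high = bootstrap_auc_ci(y, scores)
print(f"AUC {auc:.3f} (95% CI {low:.3f}-{high:.3f})")
```

With cohorts of 53 and 63 patients, wide intervals such as those reported for the test and test1 cohorts are expected, because each resample can shift the class balance noticeably.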

Fig. 3

Receiver operating characteristic (ROC) curves of the LightGBM model in the training (a), test (b), and test1 (c) cohorts

Table 2 Performance of the models in discriminating breast cancer with bone metastasis from breast cancer without bone metastasis in the training, test, and test1 cohorts
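The threshold-dependent metrics in Table 2 all derive from the confusion matrix at a chosen probability cutoff. The sketch below shows the standard definitions; the cutoff of 0.5 is an assumption, since the cutoff used for Table 2 is not reported here (a Youden-index cutoff is another common choice). Each ratio assumes a nonzero denominator.

```python
import numpy as np


def threshold_metrics(y_true, y_prob, cutoff=0.5):
    """Confusion-matrix metrics at a probability cutoff (0.5 is an assumed default)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob) >= cutoff
    tp = int(np.sum(y_pred & y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    return {
        "accuracy": (tp + tn) / y_true.size,
        "sensitivity": tp / (tp + fn),  # share of metastatic cases detected
        "specificity": tn / (tn + fp),  # share of non-metastatic cases correctly ruled out
        "ppv": tp / (tp + fp),          # P(metastasis | positive prediction)
        "npv": tn / (tn + fn),          # P(no metastasis | negative prediction)
    }


# Hypothetical labels (1 = bone metastasis) and predicted probabilities.
m = threshold_metrics([1, 1, 1, 0, 0, 0, 0], [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4])
print(m)
```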

The DCA curves of the LightGBM model in the training, test, and test1 cohorts are shown in Fig. 4. The LightGBM model provided a favorable net benefit for identifying breast cancer bone metastasis in all three cohorts.
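In decision curve analysis, the net benefit at a threshold probability p_t is TP/n − (FP/n) × p_t/(1 − p_t), where a patient counts as positive when the predicted risk is at least p_t; the model curve is compared with the "treat all" and "treat none" (net benefit 0) reference curves. The sketch below computes these curves with NumPy; the example labels, probabilities, and threshold grid are hypothetical.

```python
import numpy as np


def net_benefit(y_true, y_prob, thresholds):
    """Net benefit of treating patients whose predicted risk is >= each threshold p_t."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob)
    n = y_true.size
    result = []
    for pt in thresholds:
        positive = y_prob >= pt
        tp = np.sum(positive & y_true)
        fp = np.sum(positive & ~y_true)
        # False positives are weighted by the odds of the threshold, p_t / (1 - p_t).
        result.append(tp / n - fp / n * pt / (1.0 - pt))
    return np.array(result)


def net_benefit_treat_all(y_true, thresholds):
    """Reference curve in which every patient is treated as metastatic."""
    prevalence = np.mean(y_true)
    pt = np.asarray(thresholds, dtype=float)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


# Usage with hypothetical labels and predicted probabilities.
y_demo = [1, 1, 0, 0]
p_demo = [0.9, 0.6, 0.4, 0.1]
grid = np.linspace(0.05, 0.95, 19)
model_curve = net_benefit(y_demo, p_demo, grid)
treat_all_curve = net_benefit_treat_all(y_demo, grid)
```

A model is clinically useful over the range of thresholds where its curve lies above both the treat-all curve and zero.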

Fig. 4

Decision curve analysis (DCA) of the LightGBM model in the training (a), test (b), and test1 (c) cohorts

Feature importance analysis and logistic regression analysis

To identify the features most important for predicting bone metastasis in the LightGBM model, a feature importance analysis was conducted (Fig. 5a). The five most influential features in the LightGBM model were CEA, creatine kinase, albumin/globulin ratio, apolipoprotein B, and CA153. Univariate and multivariate logistic regression analyses were performed on the features included in the model, with odds ratios and p-values shown in Fig. 5b and c. In the univariate analysis, albumin/globulin ratio, total cholesterol, lipoprotein (a), CA153, gamma-glutamyl transferase, α-HBDH, alkaline phosphatase, and creatine kinase had p-values < 0.05, suggesting potential associations with breast cancer bone metastasis. Among these, lipoprotein (a), CA153, gamma-glutamyl transferase, α-HBDH, alkaline phosphatase, and creatine kinase were positively associated, whereas albumin/globulin ratio and total cholesterol were negatively associated.

Fig. 5

Feature importance analysis of LightGBM model (a) and univariate (b) and multivariate (c) logistic regression analysis of variables (features) involved in LightGBM model

In the multivariate analysis, albumin/globulin ratio and total cholesterol had p-values < 0.05 and were negatively associated with bone metastasis, whereas CK-MB, CA153, and alkaline phosphatase were positively associated.

Discussion

In this study, we used the LightGBM algorithm to construct a model for identifying breast cancer patients with bone metastasis based on readily accessible clinical blood biomarkers. The model performed well in both the internal and external validation cohorts and distinguished breast cancer patients with bone metastasis from those without, providing clinicians with additional evidence to support more efficient triage in breast cancer diagnosis and treatment.

Most previous studies on predicting breast cancer distant metastasis have focused on assessing the risk of metastasis occurrence. Delpech et al. developed and validated nomograms for predicting bone metastasis in early-stage breast cancer patients based on clinical and pathological variables, with C-indexes of 0.69 and 0.73 in the training and validation cohorts, respectively [19]. Similarly, Xu et al. constructed nomograms for predicting bone metastasis in breast cancer patients based on clinical and pathological variables, with C-indexes of 0.714 and 0.705 in the training and validation cohorts, respectively [20]. Zhang et al. incorporated MRI and ultrasound features into prognostic nomograms for predicting distant metastasis in breast cancer, achieving C-indexes of 0.882 and 0.812 in the training and validation cohorts, respectively [21]. Additionally, Wang et al. utilized gene expression data from the National Center for Biotechnology Information Gene Expression Omnibus to construct prognostic nomograms for predicting lung metastasis risk in breast cancer, achieving C-indexes of 0.862 and 0.772 in the training and validation cohorts, respectively [22].

However, fewer predictive models have been developed specifically for diagnosing existing distant metastasis in breast cancer. Wen-Cai et al. developed a web-based predictor using an XGBoost model to estimate the risk of bone metastasis in patients with invasive ductal carcinoma of the breast, based on factors such as age at diagnosis, race, gender, grade, T/N stage, breast cancer subtype, and marital status. Among six machine learning algorithms, XGBoost performed best, with an AUC of 0.888, accuracy of 0.803, sensitivity of 0.801, and specificity of 0.837 [23]. Similarly, using the Surveillance, Epidemiology, and End Results database, Xuguang et al. constructed diagnostic and prognostic models for breast cancer bone metastasis, among which the XGBoost algorithm achieved the highest accuracy (diagnostic model AUC = 0.98; prognostic model AUC = 0.88) [24]. However, these models often omit commonly available clinical indicators such as routine blood and biochemical parameters, which may limit their real-world applicability, and they require further validation.

To our knowledge, this study is the first attempt to construct a diagnostic model for breast cancer bone metastasis using readily accessible clinical blood biomarkers reflecting cardiac, hepatic, and renal function. These biomarkers are typically part of routine admission testing, provide real-time physiological information, and are cheaper and simpler to obtain than pathological examinations, imaging studies, or genetic tests. In addition, our model was externally validated at another research center, where it achieved an AUC of 0.908. This external validation supports the credibility of our findings and suggests that the model is robust and generalizable across datasets.

Whereas a previous study achieved relatively high performance with an XGBoost model [25], this study employed the LightGBM algorithm. LightGBM offers flexibility and efficiency in feature processing and model construction and can capture complex nonlinear relationships, which may improve predictive accuracy and generalization. Although our model achieved AUCs of approximately 0.9 in predicting breast cancer bone metastasis, these values cannot be compared directly with those of other models because the variables, datasets, and algorithms differ. This diagnostic model based on clinical blood biomarkers offers a novel and cost-effective approach to the early detection of breast cancer bone metastasis. It may help improve personalized management of breast cancer patients and the accuracy and efficiency of early intervention in clinical practice. Future research should expand the sample size and conduct multicenter validation to verify the model's robustness and broad applicability and to advance its clinical implementation.

CK-MB was identified as one of the most important features in the LightGBM model. CK-MB, an isoenzyme of creatine kinase, is found mainly in the myocardium and skeletal muscle, and its serum levels have been reported to be higher in patients with late-stage cancer than in those with early-stage cancer [26]. Previous studies have shown that serum CK-MB activity is significantly higher in patients with metastatic tumors than in those with primary tumors [27]. However, further research is needed to elucidate why CK-MB is elevated in breast cancer patients with distant metastasis [26] and whether the elevated CK-MB originates from the tumor or from other sources [28]. α-HBDH, another important feature in our model, is an LDH isoenzyme that has been associated with prognosis in various malignant tumors [29,30,31]. In early breast cancer diagnosis, α-HBDH, CEA, and CA125 have shown diagnostic value when used in combination [32]. CA153, a common tumor marker, has predictive value for breast cancer distant metastasis and was also identified as an important feature in our model [33].

Although we constructed a blood biomarker-based model for breast cancer bone metastasis with good predictive performance and external validation results, this study has several limitations. First, we focused only on bone metastasis, the most common type of distant metastasis in breast cancer, and did not consider other sites such as the brain or metastasis occurring after treatment [34]. Second, although our external validation set came from a different medical center, that center was in the same geographical region; similar patient demographics and treatment protocols may restrict the model's generalizability to other populations. Future research should incorporate larger, geographically diverse, multicenter external validation sets to further evaluate the model's performance across populations and enhance its generalizability and reliability. Third, although we selected readily accessible clinical blood biomarkers as input variables, their specificity and sensitivity may not cover all complex presentations of breast cancer bone metastasis; in clinical practice, additional biomarkers or other clinical features may be needed to further optimize the model's predictive ability. Fourth, although LightGBM handles complex nonlinear relationships well, it is sensitive to data quality and feature selection. Data quality, standardization, and feature selection strongly affect model performance, and future research should further optimize these aspects to improve the model's stability and reliability. Lastly, as technology and medical research advance, new biomarkers and technologies continue to emerge, presenting both challenges and opportunities for building and applying existing models. Continuous technological innovation and data updates are therefore crucial for the ongoing optimization and widespread application of the model.

Conclusions

In conclusion, this study developed and validated artificial intelligence clinical models and comprehensive models for predicting breast cancer bone metastasis based on clinical blood biomarkers. In particular, the LightGBM model showed high accuracy and potential clinical utility in identifying breast cancer bone metastasis. In China's healthcare system, patients with advanced cancer are often referred to economically developed regions for treatment, while underdeveloped regions may experience delayed diagnosis owing to a lack of early cancer screening. The model therefore has the potential to reduce missed diagnoses caused by limited access to imaging in underdeveloped regions and to support clinical decision-making by primary care physicians, enabling more timely treatment. In developed regions, the model may likewise reduce the demand for expensive or invasive imaging examinations. This study highlights the prospect of using readily accessible clinical blood biomarkers to develop artificial intelligence predictive tools.