Background

Hip surgery is considered an effective way for patients with hip diseases to relieve pain, improve function, and enhance quality of life. As the geriatric population grows, the prevalence of degenerative arthritis and hip fracture has increased. In the United States, the demand for primary total hip arthroplasty is expected to increase to 572,000 procedures by 2030, and the demand for hip revision surgeries is estimated to double by 2026 [1]. Perioperative blood loss is common in surgical procedures, particularly in orthopedic cases [2]. Previous findings have shown that hip replacement is the procedure most commonly associated with blood product transfusion [3, 4]. Recent studies have linked blood product transfusion to adverse outcomes such as postoperative infection, disease transmission, prolonged hospitalization, and increased morbidity [5,6,7,8,9,10,11].

Transfusion strategies that rely on clinical experience are still widely used in hip surgery and often lead to the overuse of blood products and unnecessary healthcare costs. Optimizing the assessment and management of blood transfusion has become an urgent medical problem. Although new guidelines and strategies for perioperative blood transfusion management continue to develop, this remains a challenge for surgeons and anesthesiologists [12, 13]. Lower preoperative hemoglobin and anemia have been recognized as major predictors, but the requirement for transfusion still cannot be predicted accurately [14,15,16].

With the rapid development of artificial intelligence, machine learning has been increasingly applied in medicine, for example in clinical prediction [17,18,19,20,21]. Machine learning refers to algorithms that learn to perform tasks from data, explore combinations, and predict outcomes [22]. It handles large datasets with complex and nonlinear relationships better than conventional statistical methods [23]. Several recent studies have developed machine learning models to predict blood transfusion in craniofacial surgery, liver transplantation surgery, and orthopedic surgery [24,25,26]. These studies generally agree that machine learning algorithms have advantages in predicting the risk of blood transfusion.

To date, few studies have evaluated the application of machine learning prediction models in hip surgery. Our study aimed to develop machine learning models and identify risk factors for perioperative blood transfusion in hip surgery.

Methods

Patients

This retrospective cohort study was approved by the Institutional Review Board of the Peking Union Medical College Hospital (Ethics Approval Number: S-K1757), and informed consent was waived because of the retrospective design. The study was conducted following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement [27]. The subjects were patients undergoing hip surgery from January 2013 to October 2021 in the Peking Union Medical College Hospital. In total, 2431 hip surgeries were included in this study. The flowchart with inclusion and exclusion details is shown in Fig. 1.

Fig. 1 The flowchart of the study

Data collection

All variables were selected based on previous studies, clinical experience, and data availability, and were obtained by system extraction and manual collection. Data were sourced from the electronic medical record system, the anesthesia information system, the clinical data repository, and the transfusion medicine system, and included demographic characteristics, preoperative laboratory tests, and surgical information (Table 1). Patient demographic characteristics included age, body mass index (BMI), sex, American Society of Anesthesiologists (ASA) Physical Status, hypertension, diabetes, coronary heart disease, anemia, and medications. ASA Physical Status was assessed by an anesthesiologist before surgery. Anemia was defined as preoperative hemoglobin < 120 g/L in men and < 110 g/L in women. Medications referred to a history of anticoagulant or antiplatelet use, defined as receiving heparin, warfarin, a factor Xa inhibitor, aspirin, or a platelet P2Y12 receptor blocker within one week before surgery. Preoperative laboratory tests were the most recent values before surgery and included hemoglobin, platelet count, activated partial thromboplastin time (APTT), prothrombin time (PT), D-dimer, fibrinogen, alanine aminotransferase (ALT), total bilirubin, direct bilirubin, albumin, creatinine, and urea. Surgical information included anesthesia approach, diagnosis, emergency or elective surgery, surgery type, autotransfusion, tranexamic acid use, and operation time. Autotransfusion referred to intraoperative cell salvage. Osteoarthritis included primary and secondary osteoarthritis. The primary outcome was perioperative red blood cell (RBC) transfusion, defined as allogeneic RBC transfusion intraoperatively or within 72 h postoperatively. In accordance with the restrictive strategy recommended in guidelines, the transfusion thresholds in our institution are: (1) hemoglobin concentration < 80 g/L; (2) hemoglobin concentration < 100 g/L for patients with preexisting cardiovascular disease or obvious clinical symptoms [28].

Table 1 Patient characteristics of all the data

Data preprocessing

Patients with missing data were excluded. Missing data were defined as any unknown details for demographic characteristics, preoperative laboratory tests, surgical information, or perioperative blood transfusion. Continuous variables were standardized using the StandardScaler. Categorical variables were converted to 0 and 1 as input for the machine learning models.
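The preprocessing steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the study's code: the field names (`hemoglobin`, `operation_time`, `anemia`) and the toy records are hypothetical, and the z-score is computed by hand with the population standard deviation, which is what scikit-learn's StandardScaler uses.

```python
from statistics import mean, pstdev

# Hypothetical toy records; None marks a missing value.
records = [
    {"hemoglobin": 132.0, "operation_time": 95.0, "anemia": "no"},
    {"hemoglobin": 104.0, "operation_time": 150.0, "anemia": "yes"},
    {"hemoglobin": None, "operation_time": 120.0, "anemia": "no"},
    {"hemoglobin": 118.0, "operation_time": 85.0, "anemia": "no"},
]
CONTINUOUS = ["hemoglobin", "operation_time"]
BINARY = {"anemia": {"no": 0, "yes": 1}}


def preprocess(rows):
    # 1. Complete-case analysis: drop any record with a missing field.
    complete = [r for r in rows if all(v is not None for v in r.values())]
    # 2. Z-score each continuous variable (mean 0, population SD 1).
    stats = {
        c: (mean(r[c] for r in complete), pstdev(r[c] for r in complete))
        for c in CONTINUOUS
    }
    out = []
    for r in complete:
        row = {c: (r[c] - stats[c][0]) / stats[c][1] for c in CONTINUOUS}
        # 3. Encode binary categorical variables as 0/1.
        row.update({c: mapping[r[c]] for c, mapping in BINARY.items()})
        out.append(row)
    return out


processed = preprocess(records)
```

In this toy example the third record is dropped because its hemoglobin value is missing, and the remaining three records are scaled and encoded.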

Model training

Data were split by time of surgery: the earliest 70% of surgeries were used for model training and the latest 30% for model testing. The training set was used to construct the models, and the testing set was used to evaluate model performance. A univariate analysis was performed in the training set, and only variables with a P value < 0.05 were considered in the prediction models. Fourteen machine learning models were developed using preoperative variables to predict perioperative blood transfusion, including logistic regression, Ridge Classifier, Random Forest Classifier, Gradient Boosting Classifier, CatBoost Classifier, Ada Boost Classifier, Naive Bayes, SVM-Linear Kernel, Extra Trees Classifier, Light Gradient Boosting Machine, Linear Discriminant Analysis, K Neighbors Classifier, Extreme Gradient Boosting, and Decision Tree Classifier.
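A time-based split of this kind can be sketched as below. The helper `temporal_split`, the `surgery_date` field, and the synthetic cohort are hypothetical. The rounding rule (flooring 70% of the cohort size) is an assumption, chosen because it reproduces a 1701/730 split for 2431 surgeries.

```python
from datetime import date, timedelta


def temporal_split(rows, train_fraction=0.7, key="surgery_date"):
    """Sort by date; the earliest fraction trains, the remainder tests."""
    ordered = sorted(rows, key=lambda r: r[key])
    cut = int(len(ordered) * train_fraction)  # assumption: round down
    return ordered[:cut], ordered[cut:]


# Hypothetical cohort: one surgery per day, same size as the study cohort.
start = date(2013, 1, 1)
cohort = [
    {"id": i, "surgery_date": start + timedelta(days=i)} for i in range(2431)
]
train, test = temporal_split(cohort)
```

Splitting on time rather than at random means the testing set contains only surgeries performed after those in the training set, which mimics prospective use of the model.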

For the logistic regression analysis, forward stepwise selection was used to identify the most important variables for predicting the outcome. Forward selection starts from an empty model and iteratively adds the variable that most improves the model’s fit. All machine learning models were developed with hyperparameter tuning through tenfold cross-validation on the training set to optimize performance.
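Hyperparameter tuning with tenfold cross-validation can be illustrated with scikit-learn as below. This is a stand-in sketch under stated assumptions, not the study's pipeline: the synthetic data, the `alpha` grid, the stratified shuffled folds, and AUC as the tuning criterion are all choices made for illustration, since the source does not report the search space or the tuning metric.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the training set (5 features, binary outcome).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
signal = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=1.0, size=300)
y = (signal > 0.6).astype(int)

# Ten folds, each preserving the outcome proportion.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Assumed search space: the regularization strength of the ridge penalty.
search = GridSearchCV(
    RidgeClassifier(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    scoring="roc_auc",
    cv=cv,
)
search.fit(X, y)
best_model = search.best_estimator_  # refit on the whole training set
```

Because tuning uses only the training folds, the testing set stays untouched until the final evaluation.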

Model evaluation and explanation

After training on the training set, all models were evaluated on the testing set. Model performance was compared using metrics of discrimination and calibration. Discrimination was assessed by the area under the receiver operating characteristic curve (AUC). The accuracy, recall, precision, and F1 score of the models were also assessed. Calibration was measured by the Brier score [29]. The best-performing machine learning model was selected based on the combination of the highest AUC and the lowest Brier score. Decision curve analysis was performed to evaluate the clinical utility of the best-performing model by calculating the net benefit at different threshold probabilities [30].
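The three headline metrics can be written out directly. The sketch below uses plain Python and hypothetical toy predictions. It assumes that the model outputs a probability for each surgery and uses the standard net benefit definition from decision curve analysis, TP/n - FP/n x pt/(1 - pt), where pt is the threshold probability.

```python
def auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)


def net_benefit(y_true, probs, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for p, y in zip(probs, y_true) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, y_true) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical outcomes (1 = transfused) and predicted probabilities.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.2]

model_auc = auc(y_true, probs)
model_brier = brier_score(y_true, probs)
model_nb = net_benefit(y_true, probs, threshold=0.5)
```

A higher AUC indicates better discrimination, a lower Brier score indicates better calibrated probabilities, and a decision curve plots the net benefit over a range of thresholds against the treat-all and treat-none strategies.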

We used SHapley Additive exPlanations (SHAP) values, which have been widely used for machine learning model interpretation, to perform global variable importance analysis [31]. The interpretation was based on the SHAP value of each variable, which indicates the impact of that variable on the prediction. The SHAP summary plot showed the associations between variables and model predictions, allowing us to visualize the relative contribution of each variable and understand how it affected the model output. The contribution of each variable was quantified by its SHAP values, and variables were displayed in descending order of contribution. A positive SHAP value was associated with a higher risk of transfusion and a negative one with a lower risk of transfusion.
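For a linear model such as a ridge classifier, SHAP values have a closed form if the features are treated as independent: the contribution of variable j for one patient is the model weight times the deviation of that patient's value from the cohort mean. The sketch below illustrates this in plain Python rather than with the shap package. The weights, the feature order, and the data are hypothetical, and the independence assumption is a simplification.

```python
def linear_shap(weights, X):
    """SHAP values of a linear model under feature independence."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(weights))]
    return [
        [w * (x - m) for w, x, m in zip(weights, row, means)] for row in X
    ]


def mean_abs_importance(shap_values):
    """Global importance: mean absolute SHAP value of each variable."""
    n = len(shap_values)
    return [
        sum(abs(row[j]) for row in shap_values) / n
        for j in range(len(shap_values[0]))
    ]


# Hypothetical standardized features: [operation_time, hemoglobin].
weights = [0.8, -0.5]
X = [[1.0, -1.0], [0.0, 0.0], [-1.0, 1.0], [2.0, 0.0], [-2.0, 0.0]]

phi = linear_shap(weights, X)
importance = mean_abs_importance(phi)
```

In this toy example the first patient has a long operation time and a low hemoglobin, so both SHAP values are positive and push the prediction toward transfusion. The per-patient values correspond to the dots of a summary plot, and the mean absolute values correspond to a bar plot of global importance.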

Statistical analysis

All analyses were performed in Python 3.7 with the sklearn, pycaret, statsmodels, numpy, pandas, seaborn, matplotlib, and shap packages. Continuous variables were described as median with interquartile range and compared using the Mann-Whitney U test. Categorical variables were described as frequency with percentage and compared using the chi-square test. A P value < 0.05 was considered significant.
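As an illustration of the categorical comparison, a Pearson chi-square test for a 2 x 2 table can be computed by hand. The sketch below is plain Python with hypothetical counts. It applies no continuity correction, which is an assumption because the source does not state whether one was used. With one degree of freedom, the P value equals erfc(sqrt(statistic / 2)).

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # Survival function of the chi-square distribution with 1 df.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows = anemia yes/no, columns = transfused yes/no.
stat, p = chi_square_2x2(60, 40, 140, 360)
significant = p < 0.05  # the univariate screening rule described above
```

Variables passing this screen in the training set would then be candidates for the prediction models. A continuous variable would be screened in the same way with the Mann-Whitney U test.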

Results

Patient characteristics

Overall, 2431 hip surgeries were included in the analysis (Table 1). Of these, 614 (25.3%) involved perioperative blood transfusion: 303 (12.5%) intraoperatively only, 224 (9.2%) within 72 h after surgery only, and 87 (3.6%) both intraoperatively and within 72 h after surgery. The average intraoperative blood loss was 1044.7 ± 705.1 ml. The data were divided into training (n = 1701) and testing (n = 730) sets. Perioperative blood transfusion was observed in 458 (26.9%) hip surgeries in the training set and 156 (21.4%) hip surgeries in the testing set. The average intraoperative blood loss was 1114.1 ± 737.6 ml in the training set and 882.9 ± 592.5 ml in the testing set.

Univariate analysis

In the univariate analysis of the training set, the transfusion group and non-transfusion group differed in BMI, ASA Physical Status > 2, anemia, femoral head necrosis, developmental dysplasia of the hip, osteoarthritis, rheumatoid arthritis, hemophilic arthritis, ankylosing spondylitis, osteoporosis, hip stiffness, total hip arthroplasty, artificial femoral head replacement, revision surgery, debridement, autotransfusion, operation time, hemoglobin, PT, APTT, D-dimer, fibrinogen, ALT, total bilirubin, albumin and creatinine (Table 2).

Table 2 Univariate analysis of variables in the training set

Multivariate logistic regression analysis

Multivariate logistic regression analysis demonstrated that the following variables were independent risk factors for perioperative blood transfusion: ASA Physical Status > 2 (OR, 1.91; 95% CI, 1.32 to 2.77), autotransfusion (OR, 1.45; 95% CI, 1.02 to 2.06), and longer operation time (OR, 1.02; 95% CI, 1.01 to 1.02) (Table 3). Femoral head necrosis (OR, 0.55; 95% CI, 0.41 to 0.74), osteoarthritis (OR, 0.59; 95% CI, 0.43 to 0.80), and higher preoperative hemoglobin (OR, 0.95; 95% CI, 0.94 to 0.96) were associated with decreased transfusion risk (Table 3).
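For readers less familiar with odds ratios, the relationship between a logistic regression coefficient and its OR can be sketched as follows. The code uses the standard Wald formulas, OR = exp(beta) and 95% CI = exp(beta ± 1.96 x SE), and recovers beta and SE from the values reported above for ASA Physical Status > 2. The last line assumes that the OR for hemoglobin is expressed per one unit of the variable as it entered the model, which the source does not state explicitly.

```python
from math import exp, log


def or_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a coefficient."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


def beta_se_from_or(odds_ratio, lower, upper, z=1.96):
    """Recover the coefficient and its standard error from a reported OR."""
    return log(odds_ratio), (log(upper) - log(lower)) / (2 * z)


# Reported: ASA Physical Status > 2, OR 1.91 (95% CI, 1.32 to 2.77).
beta, se = beta_se_from_or(1.91, 1.32, 2.77)
odds_ratio, ci_low, ci_high = or_with_ci(beta, se)

# Per-unit ORs compound multiplicatively: a 10-unit increase in a
# variable with OR 0.95 per unit corresponds to an OR of 0.95 ** 10.
or_per_10_units = 0.95 ** 10
```

Under that assumption, an OR of 0.95 per unit, although close to 1, corresponds to a roughly 40% reduction in the odds of transfusion for a 10-unit higher preoperative hemoglobin.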

Table 3 Multivariable logistic regression analysis of variables in the training set

Performance of machine learning models

The testing set of 730 hip surgeries was used to evaluate the predictive abilities of the machine learning models. The Ridge Classifier demonstrated the best performance, with the highest AUC of 0.85 (95% CI, 0.81 to 0.88) and the lowest Brier score of 0.21. The receiver operating characteristic curve, the precision-recall curve, and the calibration curve of the Ridge Classifier in the testing set are shown in Fig. 2a and b, and Fig. 3a. The comparison of accuracy, recall, precision, F1 score, and Brier score among all models is shown in Table 4. The decision curve analysis in the testing set suggested that the Ridge Classifier achieved good net benefit for the prediction of perioperative blood transfusion (Fig. 3b).

Fig. 2 a The receiver operating characteristic curve of the Ridge Classifier in the testing set. b The precision-recall curve of the Ridge Classifier in the testing set

Fig. 3 a The calibration curve of the Ridge Classifier in the testing set. b The decision curve analysis of the Ridge Classifier in the testing set

Table 4 Discrimination and calibration metrics of machine learning models in the testing set

Further model interpretation was performed using the SHAP values for the Ridge Classifier. For the global variable importance analysis, the SHAP summary plots showed the top 10 most relevant variables (Figs. 4 and 5). Operation time and preoperative hemoglobin had the greatest average effect on the model prediction. Femoral head necrosis, ASA Physical Status > 2, osteoarthritis, total hip arthroplasty, anemia, autotransfusion, preoperative fibrinogen, and preoperative albumin had smaller average effects. Longer operation time, lower preoperative hemoglobin, ASA Physical Status > 2, total hip arthroplasty, anemia, autotransfusion, lower preoperative fibrinogen, and lower preoperative albumin were associated with increased transfusion risk.

Fig. 4 Summary plot for the importance analysis of the top 10 variables in the Ridge Classifier. Each variable was made up of individual dots, each of which was the SHAP value of a sample. Variables with wide distribution indicated strong contribution to model predictions. For continuous variables, red color represented high values and blue represented low values of variables. For categorical variables, the red color represented the presence of variables and the blue color represented the absence of variables. A positive SHAP value (SHAP value greater than 0) was associated with increased transfusion risk and a negative one (SHAP value less than 0) was related to decreased transfusion risk. Abbreviations: ASA, American Society of Anesthesiologists

Fig. 5 Summary plot for global average impact of the top 10 variables in the Ridge Classifier. It was shown using the average absolute value of all SHAP values of each variable. Abbreviations: ASA, American Society of Anesthesiologists

Discussion

To optimize the utilization of blood products, improve outcomes, and reduce healthcare costs, preoperative evaluation and prediction of perioperative blood transfusion are essential. Recent advances in artificial intelligence are changing perioperative medicine in risk stratification, intraoperative monitoring, and intensive care management [32]. Machine learning is regarded as a useful tool for processing large datasets and accelerating the development of clinical prediction. To our knowledge, this is the first study to develop and test machine learning models to predict perioperative blood transfusion in hip surgery.

Although the calibration of the models was below expectation, most algorithms showed good discrimination (AUC > 0.8) [29]. The Ridge Classifier had the best overall performance and provided good net benefit for predicting perioperative blood transfusion. Unlike most previous studies, we focused on perioperative blood transfusion rather than intraoperative or postoperative transfusion alone, which reduces complexity and inconvenience in routine practice [33, 34]. Moreover, existing prediction tools have typically modeled a limited subset of surgeries or populations and did not incorporate diagnoses [16, 26]. Our goal was to provide prediction models that generalize across a variety of procedures and diagnoses in hip surgery. Therefore, the performance might not be as good as that of models built for a single type of surgery. Additionally, model performance was assessed and compared by discrimination and calibration, together with accuracy, recall, precision, and F1 score, which improved the reliability of the results. Furthermore, although some studies have identified risk factors for blood transfusion, the contribution of each feature to the transfusion risk was not characterized [16, 35]. This study applied interpretable visualization techniques to assist practitioners in the early identification of important factors and potential interventions.

The variables used in the models were comprehensive and accessible in the hospital database. We combined patient demographic characteristics, preoperative laboratory tests, and surgical information to predict the likelihood of perioperative blood transfusion. Compared with previous studies, we included a large number of surgery-related variables to explore their association with the risk of perioperative blood transfusion. Consistent with previous studies, the machine learning algorithms outperformed logistic regression in our study and identified more risk factors [16, 36]. Machine learning models can extract information from large amounts of data and multiple variables, capture complex nonlinear relationships, and provide new hypotheses for clinical diagnosis and treatment. By exploring the relevant risk factors for perioperative blood transfusion in hip surgery, our findings could help surgeons identify high-risk patients and contribute to decision-making, including preoperative optimization, intraoperative monitoring, and postoperative management.

One of the strengths of our study was the visualization of machine learning model interpretation. Many machine learning algorithms produce models without showing the relationship between variables and outcomes. Understanding how machine learning models make predictions helps to overcome the drawback of “black box” models and improves physicians' trust in machine learning [37, 38]. The risk factors identified by the Ridge Classifier were consistent with clinical practice and previous literature. In agreement with previous studies, lower preoperative hemoglobin and higher ASA Physical Status were important risk factors for transfusion in this study [16, 39, 40]. Additionally, anemia was a strong factor that increased the risk of transfusion. The prevalence of anemia in our cohort reached 14.2%. Studies have shown that preoperative anemia is related to an increased risk of blood transfusion [14, 41]. A previous study demonstrated that the treatment of preoperative anemia was associated with decreased perioperative blood transfusion in patients undergoing elective orthopedic and gynecologic surgery [42]. Lower preoperative albumin has also been identified as a risk factor for blood transfusion [43, 44]. The serum albumin level is a widely used marker of malnutrition, which has been associated with postoperative complications and mortality following orthopedic procedures [45, 46]. Patients with lower albumin are more likely to be frail and in poor health, which may increase transfusion requirements. Because anemia and low albumin are modifiable preoperative risk factors, patients may benefit from preoperative interventions to correct them and avoid unnecessary blood transfusion. Preoperative fibrinogen is an important indicator of coagulation function. We found that lower preoperative fibrinogen was associated with increased transfusion risk, a finding previously reported in spine surgery [47].

As for surgery-specific variables, it is well accepted that patients with longer operation time are at significantly increased risk of blood transfusion. Our study was consistent with past findings [16, 39]. Moreover, our data showed that the diagnosis was an important correlate of blood transfusion risk, which has been reported in previous studies. One study suggested that hidden blood loss after total hip arthroplasty differed significantly among patients with different diagnoses [48]. The authors found that blood loss in patients with nonunion of femoral neck fracture was higher than in patients with osteoarthritis, avascular necrosis of the femoral head, and developmental dysplasia of the hip. Another study showed that patients who had ankylosing spondylitis with total bony ankylosis of the hips had more blood loss and blood transfusion than patients with hip osteoarthritis [49]. A further study reported that patients with rheumatoid arthritis had a significantly higher incidence of preoperative anemia and blood transfusion than patients with osteoarthritis [50]. However, the types of ankylosing spondylitis and femoral neck fracture were not distinguished in our study. The low transfusion rate in patients with femoral head necrosis and osteoarthritis may be explained by the low difficulty and short duration of the surgical procedures. Additionally, osteoarthritis and femoral head necrosis were relatively common in our cohort, so surgeons were more experienced in these procedures, which may have led to less intraoperative blood loss. Furthermore, it should be noted that machine learning algorithms can show the correlation between variables and outcomes but cannot prove causality. More investigations in large populations are needed to explore the relationship between diagnosis and blood transfusion risk.

In terms of surgery types, total hip arthroplasty was a significant risk factor for blood transfusion. As a common and effective way for patients with hip disease to relieve pain and improve function, total hip arthroplasty has been in rising demand in recent years [1]. Consistent with previous findings, our data showed that patients undergoing total hip arthroplasty were at high risk of perioperative blood transfusion [3, 4]. These surgery-related risk factors should also be considered when preparing preoperative plans. Although some of them are not modifiable, effective interventions and strategies should be taken to reduce perioperative blood transfusion and avoid unnecessary health costs.

Recently, several approaches have been adopted to reduce blood transfusion in orthopedic surgery, such as the use of tranexamic acid and autotransfusion. Although several studies demonstrated an association between the use of tranexamic acid and decreased blood transfusion in hip arthroplasty, we did not find a significant difference in our cohort of hip surgery [51, 52]. This may be because tranexamic acid has been widely used in our institution, so the population not receiving tranexamic acid was too small to detect a difference in transfusion rates. Previous findings suggested that intraoperative autotransfusion was associated with reduced blood loss and transfusion [53, 54]. However, autotransfusion was identified as a risk factor for blood transfusion in our cohort, which may be attributed to the limited number of autotransfusion cases. Patients selected for autotransfusion were usually those at high risk of blood loss, and they were more likely to receive perioperative blood transfusion. Although autotransfusion was applied in these patients, it did not eliminate the need for allogeneic blood transfusion. For these patients, other preoperative interventions should be taken in addition to autotransfusion to minimize allogeneic blood transfusion.

There were several limitations in our study. First, this study was based on a single institution and lacked external validation. Second, there may be bias due to the retrospective nature of this study. Third, patients with missing data were excluded, which may limit the results of the analysis. Fourth, as for comorbidities, only hypertension, diabetes, and coronary artery disease were included, limiting further exploration of the effect of comorbidities on perioperative blood transfusion. Fifth, the study period was relatively long and there may be some unknown time-related effects. Finally, it is necessary to acknowledge that the odds of transfusion could not be represented based on the present analysis, which is a common limitation in studies of prediction models [16, 33, 34].

Conclusions

We developed and tested machine learning models with good discriminative ability to predict perioperative blood transfusion in hip surgery, which may support surgeons' decision-making, improve clinical outcomes, and reduce healthcare costs. Identification of risk factors provides opportunities for accurate evaluation and prompt intervention. Further studies are still needed to validate the models in large cohorts and expand clinical implementation.