Introduction

According to the United Nations 2022 Revision of World Population Prospects, the proportion of people over 65 years of age is expected to increase from approximately 9.7% in 2022 to 16.4% in 2050 [1]. The burden of lumbar degenerative diseases (LDD) has increased substantially with the unprecedented aging population. From 2004 to 2015, the volume of elective lumbar fusion procedures for LDD among those over age 65 in the USA increased by 138%, and the costs for elective lumbar fusion increased from $3.7 billion dollars in 2004 to $10.2 billion dollars in 2015 [2]. Transforaminal lumbar interbody fusion (TLIF) was first reported in the early 1980s as a modification of posterior LIF and has become a commonly used surgical procedure for nerve decompression and bone stabilization with excellent and reliable outcomes [3, 4]. Postoperative adverse events (AEs) following lumbar fusion surgery include complications, prolonged hospital stay, and readmission, which increase hospitalization-related expenditures and postoperative dissatisfaction [5, 6]. In previous studies, elderly patients (aged 65 years and older) had more extended hospital stays and about twice the complication rate of younger patients [7, 8]. Identifying elderly patients with high risk of AEs and establishing individualized perioperative management is critical to mitigate added costs and optimize cost-effectiveness to the healthcare system.

Many independent variables are associated with postoperative AEs following lumbar fusion surgery. Variables associated with increased length of stay (LOS) include increased age, morbid obesity, diabetes, opioid use, greater number of comorbid conditions, unemployment, drain use, and blood transfusion [9,10,11]. Older and non-married patients, those with obesity, positive smoking history, longer procedure times, and emergent cases were significantly more likely to be readmitted for complications or physical rehabilitation [9, 10, 12]. Variables associated with complications include cerebrovascular disease, electrolyte disorders, hemi/paraplegia, mass blood loss, and postoperative delayed ambulation [13,14,15,16]. Predictive models are more likely to provide individualized expectations for postoperative outcomes during preoperative consultation than risk factor analysis alone. Furthermore, although postoperative AEs are more common in the elderly population, research regarding developing predictive models for AEs in elderly patients is lacking.

Multivariate logistic regression and machine learning algorithms are now applied widely across medical field to develop predictive models. Compared with models based on machine learning algorithms, logistic regression models have better stability and model interpretability, while machine learning algorithms have advantages in data processing and nonlinear and multivariate forecasting [17]. Here, we sought to develop a predictive tool for postoperative AEs in elderly patients, utilizing multivariate logistic regression, single classification and regression tree (hereafter, “classification tree”), and random forest machine learning algorithms.

Materials and methods

Patient population

This study was a retrospective review of a prospective Geriatric Lumbar Disease Database, which includes basic information (including demographic data, medical disease, laboratory test, and medication history), perioperative data, and follow-up results of consecutive patients aged 65 years and older. The Institutional Review Board approved the study (IRB# 2018086). Due to the nature of this retrospective study, the informed consent from patients was waived. We reviewed data of patients who underwent elective fusion surgery for lumbar degenerative disease between August 2018 and October 2022 and were followed up for more than three months postoperatively. Inclusion criteria were as follows: (i) aged 65 years and over; (ii) patients with elective TLIF surgery. Exclusion criteria were as follows: patients with (i) irreversible loss of mobility before admission; (ii) revision surgery; (iii) preexisting spinal fracture, any spinal infection or any malignancy; (iv) incomplete data; and (v) surgery-related complications including incidental durotomy, nerve injury or spinal cord injury.

Predictive variables

The demographic and clinical data included age, gender, weight, body mass index (BMI), payment type, medical disease (Charlson comorbidity index, cardiovascular disease, diabetes, medication history (glucocorticoid and anticoagulant), osteoporosis, current smoker or drinker, etc.), and laboratory tests (red blood cell count [RBC], hemoglobin, and coagulation function tests indicators). Surgery-related variables included the number of fused segments, estimated blood loss (EBL), operative time, and drainage volume on postoperative day 0 (POD0). Given the importance of early ambulation as a protective factor reported in previous studies, we also include delayed ambulation (bed rest for more than 48 h after surgery) as a predictive variable. For external validation, we reviewed a consecutive cohort of 175 elderly patients who underwent lumbar fusion surgery in another hospital.

Outcome measure

Postoperative AEs included postoperative complications, prolonged LOS, readmission, and reoperation within 90 days after surgery. We excluded patients with intraoperative complications that were related to the surgical technique of surgeons. Prolonged LOS was defined as postoperative hospital stay greater than the 75th percentile. The indications for readmission included medical complications, reoperation, physical rehabilitation, and other unplanned readmissions. The LOS was recorded routinely by hospital administrative staff who were unaware of the study and extracted from the hospital electronic patient record by a research nurse. Other AEs were recorded by another research nurse using predefined criteria for the presence or absence of complications and readmission according to the clinical record, medication record, and follow-up data. Data were taken from the clinical records made by usual care teams unaware of the study. Patients were grouped as either having at least one adverse event (AEs group) or not (No-AEs group).

Statistical analysis

All statistical analyses were conducted using R version 4.2.2 (R Foundation for Statistical Computing), and significance level was set at p < 0.05 for all tests. Continuous data were expressed as means ± standard deviation (SD) and were compared using the two-tailed Student’s t test or the Mann–Whitney U test. Median (quartile 1, quartile 3) was displayed for not normally distributed data. Categorical variables were expressed as frequencies with percentages and analyzed using Fisher’s exact and chi-square tests, as appropriate. Data from the Geriatric Lumbar Disease Database were partitioned with an 80/20 split for training and test datasets. Logistic regression models were built using the function glm of the R package stats. Single classification trees were produced with the rpart package and were pruned to a complexity parameter of 0.02. Random forest classifiers were created using the randomForest R package v4.2.3 [18].

Trained models are validated using the remaining 20% of available data. Operating characteristic curve (ROC) was accomplished using the proc and rocr packages. Area under the curve (AUC) was calculated by applying the model to the testing set. After selecting the most appropriate model, we compared the predicted with the observed probabilities and drew a calibration curve to assess model calibration. Then, we performed a decision curve analysis (DCA) to evaluate the clinical benefit of our model. The nomogram function of the RMS package of the R software was used to generate the nomogram. Finally, online tool based on our model was developed to assess its validity in the clinical setting (external validation).

Results

Patient characteristics in the development and validation dataset

The development set included 1025 patients (mean [SD] age, 72.8 [5.6] years; 632 [61.7%] female), and the external validation set included 175 patients (73.2 [5.9] years; 97 [55.4%] female). The perioperative and follow-up data of participants in the development dataset (including the training and testing datasets) and validation dataset are shown in Table 1. There were no significant differences in demographic data among the three groups. More patients with peripheral vascular disease and osteoporosis were in the validation dataset (p < 0.01). There were also significant differences among the groups in number of fused segments and intraoperative EBL (p < 0.001). The incidence of postoperative AEs was similar among the three groups (p = 0.264).

Table 1 Perioperative and follow-up data of training, testing, and validation dataset

Univariate risk analysis for postoperative AEs

Univariate analyses revealed that age (p < 0.001), BMI (p = 0.015), hemoglobin (p = 0.03), osteoporosis (p = 0.003), INR (p = 0.025) number of fused segments (p < 0.001), lumbosacral fusion (p = 0.014), intraoperative EBL (p < 0.001), operative time (p < 0.001), delayed ambulation (p < 0.001), drainage volume on POD0 (p < 0.001), and diabetes (p = 0.031) were significantly associated with postoperative AEs. All models were developed with the training dataset and evaluated with the testing and external validation datasets (Table 2).

Table 2 Univariate analysis of risk factors for AEs

Logistic regression model

Multivariate analyses revealed that older age (odds ratio [OR] 1.84, p < 0.001), higher BMI (OR 1.28, p = 0.019), more intraoperative EBL (OR 1.22, p = 0.036), longer operative time (OR 1.92, p < 0.001), and delayed ambulation (OR 1.88, p < 0.001) were independent risk factors for postoperative AEs in elderly patients undergoing lumbar fusion surgery (Table 3).

Table 3 Multivariate logistic regression for postoperative AEs

Single classification tree model

The single classification tree revealed that the number of fused segments ≥ 3, age ≥ 79 years, intraoperative EBL ≥ 325 ml, delayed ambulation, and weight ≥ 64 kg as particularly influential predictors for AEs (Fig. 1).

Fig. 1
figure 1

Decision tree for risk of AEs following lumbar fusion surgery

Random forest model

Recursive feature elimination removed 12 variables from the original set of 26 candidate predictors. A random forest model was developed using the remaining 14 variables. Intraoperative EBL, operative time, delayed ambulation, age, number of fused segments, BMI, and RBC count were the most significant variables in the final model (Fig. 2).

Fig. 2
figure 2

Random forest model variable importance

After the three models were developed using the training dataset, every model performance was validated on a separate cohort of 205 patients in the testing dataset. The logistic regression model AUC was 0.73 vs. 0.72 for the random forest model and 0.70 for the single classification tree model (Fig. 3). DCA was performed to calculate the clinical net benefit of each model, and it revealed that the logistic regression model was more benefit than random forest and single classification tree model in predicting postoperative AEs (Fig. 4).

Fig. 3
figure 3

Receiver operating characteristic curves of logistic regression (blue), single classification tree (red), and random forest algorithms (green)

Fig. 4
figure 4

Decision curve analysis comparing the clinical utility of the three models

Finally, we therefore selected the simpler logistic regression model to build our prognostic classifier. The logistic regression model was well-calibrated (observed to expected ratios) in the training and validation cohorts (Fig. 5). The accuracy of the predictive model was 70% (AUC = 0.69) in the sample from an institution that was different and independent from those used for model creation and internal validation (Fig. 6). Then, the logistic regression model data were used to construct a nomogram (Fig. 7).

Fig. 5
figure 5

Calibration curve of the training (A) and testing (B) cohort

Fig. 6
figure 6

Receiver operating characteristic curve showing sensitivity and specificity of the logistic model for predicting occurrence of any adverse events among the external validation cohort

Fig. 7
figure 7

The nomogram based on the multivariate logistic regression model

Discussion

The hospitalized patient experience has become an area of increased focus for hospitals given the recent coupling of patient satisfaction to reimbursement rates for inpatients [19]. Although previous studies focused on identifying risk factors for complications, reducing LOS and readmission rates are equally important to improve the patient experience and reduce costs [6, 20]. Therefore, postoperative AEs should include complications, readmission, and prolonged LOS after spine surgery. In this study, we developed and validated three models (logistic regression, single classification tree and random forest algorithms) for predicting postoperative AEs following lumbar fusion surgery in elderly patients using a prospective Geriatric Lumbar Disease Database. To our knowledge, this is the first study to develop and externally validate a prediction model for postoperative AEs in elderly patients with high predictive accuracy.

The predictive ability of our three models was comparable, with no significant differences in AUC. Then, we performed a decision curve analysis to determine the clinical usefulness of the three risk stratification models. The logistic regression model had a higher net benefit for clinical intervention than the other models. Our final models (logistic regression model) had good calibration and predictive performance in the internal validation cohort (C-statistics, 0.69–0.76), demonstrating that it can accurately predict potential outcomes in new populations with similar characteristics. Finally, we performed an external validation of the most appropriate model and provided an online user-friendly risk prediction tool at https://xuanwumodel.shinyapps.io/Model_for_AEs/.

Multivariable regression analysis revealed that age, BMI, operative time, intraoperative blood loss, and delayed ambulation were independently associated with postoperative AEs. This finding was in line with other spine surgery literature, which suggested that older age, obesity, and surgical trauma were risk factors for postoperative complications [8, 14, 21]. Intraoperative blood loss was also associated with readmission and prolonged length of hospital stay in previous studies [12, 22]. Our findings suggest that operative time and blood loss were more critical risk factors than the number of surgical segments for AEs in lumbar fusion for degenerative disorders. Preoperative risk of AEs may be modified by intraoperative events, most notably blood loss, which was the most relevant single parameter determining the risk of AEs in elderly patients. For older patients with higher BMI, reducing intraoperative blood loss and operative time can decrease the occurrence rate of adverse events. As an important intervention of enhanced recovery after surgery pathway (ERAS), early ambulation had been demonstrated to be associated with better clinical outcomes [5, 23]. Our study also revealed that delayed ambulation (> 48 h) after surgery was a predictor for postoperative AEs when patients with intraoperative complications were excluded. Therefore, how to improve ERAS compliance of elderly patients is an urgent problem to be solved.

Tree-based machine learning algorithms (including the single classification tree and the random forest approach) were chosen based on many desirable properties for the given binary outcome of having an adverse event or not, including the ability to handle hundreds of variables (both categorical and continuous) and ease of construction [14]. The single classification tree algorithm showed that the number of fused segments was the first discriminator for predicting postoperative AEs. Our findings supported that > 3 fusion segments in lumbar surgery was considered to be long-segment fusion that can cause more extensive surgical trauma. Other discriminators of the present classification tree included age, intraoperative blood loss, delayed ambulation, and weight, similar to our regression analysis and some prior studies [13, 18, 24, 25]. In a retrospective multicenter database analysis, Arora et al. [11] performed a decision tree analysis and found that advanced age, obesity, and greater surgical invasiveness were significant variables increasing the likelihood of readmission and prolonged LOS. To reduce the incidence of postoperative AEs, in older patients (aged 79 years or older) who undergo long-segment fusion, hemostatic agents and minimally invasive surgery should be used to reduce intraoperative bleeding.

Random forest is an algorithm that, in many situations, improves on single classification trees. However, the random forest model exhibited a nonsignificant trend to superior discrimination compared to the classification tree model (classification tree AUC = 0.70 vs. random forest AUC = 0.72). This lack of significant difference is likely due to the small size of our testing dataset, as the method utilized to calculate AUC confidence intervals dependent on sample size. Similar to the other two models, the most important variables in the random forest model were operative time, intraoperative blood loss, BMI, age, and delayed ambulation. Preoperative RBC count and drainage volume on POD0 were relatively important predictors for postoperative AEs within 90 days of surgery. These two variables correlate highly with postoperative RBC count and hemoglobin, which could affect discharge planning. This conjecture, however, will require further study.

Previous spinal surgery studies focused solely on comparing different machine learning and regression approaches for postoperative AEs in patients with spine deformity and involved only internal validation of the developed prediction models [13, 25]. Yagi et al. and Passias et al. demonstrated the efficacy of classification trees in preoperative screening and risk stratification of patients likely to have major complications following corrective spine surgery and cervical deformity surgery, respectively [13, 26]. In a retrospective study of 37,852 patients, Jain et al. developed nine models to assess risks of discharge-to-facility, 90-day readmissions, and major medical complications after long-segment lumbar spine fusion with moderate sensitivity and specificity. The authors found that logistic regression models modestly outperformed random forest and elastic net models [25]. Although many predictive models for spinal disorders had been developed, that as yet we have to pay careful attention in deciding which tools to use depending on the outcomes and the setting of interest. Despite the considerable rate of AEs in elderly patients undergoing lumbar fusion surgery, there are few quantitative tools to predict AEs in elderly patients. The present study offers a novel contribution to the field by assessing and comparing the performance of logistic regression models versus tree-based algorithms and validating them using external data.

This study had several limitations. First, our models were created using retrospective data and are relevant to context of the current standard of care. Prospective multicenter studies can guarantee the sustained effectiveness of the model while ensuring a large sample size. However, our retrospective and uncontrolled data theoretically lends to greater generalizability across institutions and surgical teams. Second, there might be significant predictors of outcomes that are not available in our data set, such as insurance status, dependency status, income levels, and details of spinal pathology. In the present study, we found many non-modifiable preoperative characteristics and intraoperative variables were significantly risk factors for postoperative AEs. Of note, our primary goal was to predict the absolute risk of AEs following surgery and not to identify modifiable factors. Third, given that postoperative complications mainly occurred within the first three months after surgery, we included only patients who were followed up for three months or more [27, 28]. Finally, as we move toward value-based care and shared decision-making, there is an increasing need to collect and use patients-reported outcomes not just in research settings, but also in routine clinical care or quality improvement activities. Thus, our future research will develop models for postoperative minimal clinically important difference and satisfaction in elderly patients undergoing lumbar fusion surgery with a long-term follow-up.

Conclusions

This investigation produced three predictive models for postoperative adverse events in elderly patients undergoing lumbar fusion surgery. The predictive ability of our three models was comparable. Logistic regression model had a higher net benefit for clinical intervention than the other models. An online dynamic nomogram calculator was established based on the final logistic regression model. Our predictive tool could inform physicians about elderly patients with a high risk of AEs within the 90 days after surgery.