We aimed to develop an ML algorithm that can predict discharge to an RF/SNF after elective surgery for lumbar stenosis. Our algorithm included age, sex, BMI, functional status, ASA class, number of levels, fusion, diabetes, preoperative hematocrit, and preoperative serum creatinine. The neural network was selected as the best algorithm based on discrimination (c-statistic = 0.752), calibration (intercept = −1.27 × 10⁻⁵; slope = 0.996), and overall performance (Brier score = 0.1257) in the training set and subsequent performance on internal validation.
Our study has several limitations. First, studies using a large clinical database are always affected by miscoding and other inaccuracies. Although the NSQIP database is widely used, few studies have assessed its actual accuracy. Rolston et al. [24] found many internal inconsistencies between procedure CPT codes and postoperative ICD-9 codes in neurosurgery. However, the codes for lumbar stenosis and lumbar surgery are more straightforward, so we expect that potential miscoding will not severely affect our algorithm. Second, certain variables of interest are not always available in the ACS-NSQIP. Because preoperative patient-reported outcomes are known predictors of discharge placement after spine surgery, we consider their absence a major limitation of our work [25]. While the current AUC of 0.751 is fair, the algorithm could potentially be improved by adding these and other relevant variables. Third, although the ACS-NSQIP database consists of data from 680 US hospitals, the results may not apply to all patients for whom the algorithm is intended, owing to differences in demographic or clinical characteristics. Fourth, the differences between the algorithms are small, which makes the choice of a neural network somewhat arbitrary. However, settling on an algorithm based on numerical and graphical assessment is the most reproducible method. Finally, it must be emphasized that this study focuses on accurate prediction of a rather simple, prespecified outcome (here ‘non-home discharge’), in contrast to the explanation of this outcome, which is the focus of the vast majority of medical research. The variables in our model cannot simply be interpreted as independent explanatory variables.
Age, sex, diabetes, functional status, fusion, and preoperative hematocrit have previously been identified in other (explanatory) studies on discharge placement after spine surgery [26,27,28]. The inclusion of most variables in our model can likely be attributed to their being independent risk factors for major complications after surgery for lumbar stenosis. Age, diabetes, BMI, functional status, ASA class, preoperative hematocrit, and preoperative creatinine have all been shown to be associated with major complications [29,30,31,32]. Number of levels and fusion are likely surrogates for longer procedural time, which is also implicated in postoperative morbidity [30, 31].
The importance of eliminating delayed discharges for patients lies in averting the risks associated with longer hospitalization and in the advantages of starting rehabilitation earlier. Umarji et al. [11] found that 58% of patients with a hip fracture acquired nosocomial infections when discharge was delayed beyond 8 days. Hauck et al. [33] found that each additional night in hospital increases the risk of adverse drug events by 0.5% and of infections by 1.6%. With regard to rehabilitation, other studies have found worse post-rehabilitation scores for patients with delays in discharge [34, 35]. While those studies did not necessarily focus on elective spine patients, other spine centers have acknowledged the problem and aimed to construct risk scores for predicting discharge placement. McGirt et al. [15] created the Carolina-Semmes grading score for all degenerative lumbar spine surgery based on logistic regression. They included the variables age, ASA class, fusion, Oswestry disability index score, ambulation, and non-private insurance and achieved an area under the curve (AUC) of 0.731. Kanaan et al. [14] used age, prior level of function, and gait distance to create a model for discharge placement after lumbar laminectomy and achieved an AUC of 0.80. Slover et al. [13] stratified spine patients into low-, medium-, and high-risk groups based on points for age, sex, walking distance, gait aid, community support, and availability of a caregiver at home. They did not report an AUC. None of the above-mentioned studies assessed calibration.
Although often overlooked, assessment of calibration is an essential feature of studies creating prediction models. In our study, the neural network and the Bayes point machine had highly similar performance metrics. However, on graphical assessment the calibration of the Bayes point machine was slightly inaccurate between the predicted probabilities of 0.15 and 0.50, a range that covers a substantial part of the study population (Fig. 3). This deviation means the algorithm slightly underestimates the chance of discharge to an RF/SNF, which for some patients would mean no placement has been arranged before surgery, which is the situation as it is right now. Assessing calibration over the full range of predictions is crucial in ensuring the model is useful [23]. Future studies aiming to create models should always feature a numerical and graphical assessment of calibration. As depicted in the calibration subplot in Fig. 3, the vast majority of patients have a 10–40% chance of discharge to an RF/SNF, as can be expected for an elective spine procedure. The algorithm is meant to identify and flag higher-risk patients so their potential discharge delay might be avoided.
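To make the numerical side of this assessment concrete, the following is a minimal Python sketch of the three kinds of metric discussed above: the c-statistic (discrimination), the Brier score (overall performance), and the calibration intercept and slope. It is an illustration only and not the code used in this study. The function names and the toy data (`y_demo`, `p_demo`) are hypothetical, and the sketch assumes one common convention in which intercept and slope are estimated jointly by regressing the observed outcome on the logit of the predicted probability; other conventions estimate the intercept with the slope fixed at 1.

```python
import math

def c_statistic(y, p):
    """Chance that a random case with the outcome gets a higher predicted
    probability than a random case without it (ties count as 0.5)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def calibration_intercept_slope(y, p, iters=25):
    """Fit logit(P(y = 1)) = a + b * logit(p) by Newton-Raphson.
    Perfect calibration corresponds to a = 0 and b = 1.
    Assumes every p lies strictly between 0 and 1."""
    x = [math.log(pi / (1 - pi)) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g0 += yi - mu          # gradient w.r.t. intercept
            g1 += (yi - mu) * xi   # gradient w.r.t. slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical toy data: three risk groups of 10 patients each, in which the
# observed event rate equals the predicted probability (2/10, 5/10, 8/10).
p_demo = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
y_demo = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2

auc = c_statistic(y_demo, p_demo)
brier = brier_score(y_demo, p_demo)
intercept, slope = calibration_intercept_slope(y_demo, p_demo)
print(round(auc, 3), round(brier, 3), round(intercept, 3), round(slope, 3))
```

Because the toy predictions match the observed event rates in every group, the fitted intercept and slope come out at essentially 0 and 1. A graphical assessment would plot observed against predicted event rates per risk group, which is what reveals local deviations such as the one described for the Bayes point machine.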
The threshold at which hospitals arrange an RF/SNF placement in advance would differ per health system. There are major differences in the availability of RF/SNF beds, insurance regulations, and discharge practices between countries [36,37,38]. Length of stay for deforming dorsopathies ranges from 4.6 to 27 days within Europe. American patients with a hip fracture are three times more likely to be discharged to an RF/SNF than Canadian patients [39]. While these complex differences do exist, delayed discharges are a problem for patients and hospitals around the world [9]. Mirroring the differences between countries, a wide variety of policies have been implemented internationally to try to reduce the number and duration of these delayed discharges [40, 41]. In Great Britain, imposing fines has reduced the number of delayed discharges, but simultaneously rising readmission rates brought up questions about the quality of discharges [42]. Sweden tried making local municipalities financially responsible for the care of the elderly [43]. Others focused on developing allocation decision tools or on the effect of increasing nursing home supply [12, 44].
At the very core of all these suggested policies, regardless of health system, is the inability to make an accurate assessment of who will need an RF/SNF placement with enough time to set things in motion. An ML algorithm can give an individualized prediction. Thorough external validation needs to be performed, along with an assessment of where to place the threshold, before these algorithms can be implemented, especially if the algorithm were to be used outside the USA.
Nevertheless, considering the risks for patients and the unnecessary costs involved with longer hospitalization due to delayed discharges, the use of predictive algorithms could be worth the initial effort.
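The threshold question raised above can be illustrated with a short Python sketch. It is a hypothetical example, not part of this study: the function `threshold_summary` and the toy data are assumptions made for illustration. For a given cut-off on the predicted probability, it reports how many patients would be flagged for advance placement, what share of actual RF/SNF discharges would be caught (sensitivity), and what share of flagged patients actually went to an RF/SNF (positive predictive value).

```python
def threshold_summary(y, p, threshold):
    """Summarize the consequences of flagging every patient whose
    predicted probability is at or above `threshold`."""
    flagged = [yi for yi, pi in zip(y, p) if pi >= threshold]
    events = sum(y)            # all actual non-home discharges
    true_pos = sum(flagged)    # flagged patients who had the outcome
    return {
        "flagged": len(flagged),
        "sensitivity": true_pos / events if events else float("nan"),
        "ppv": true_pos / len(flagged) if flagged else float("nan"),
    }

# Hypothetical toy cohort of 10 patients (1 = discharged to an RF/SNF).
y_toy = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
p_toy = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.45, 0.50, 0.60, 0.80]

low = threshold_summary(y_toy, p_toy, 0.30)
high = threshold_summary(y_toy, p_toy, 0.50)
print(low)
print(high)
```

In this toy cohort, lowering the cut-off from 0.50 to 0.30 flags twice as many patients and catches more of the actual RF/SNF discharges, at the cost of arranging more placements that turn out not to be needed. Where that trade-off should sit depends on local bed availability and costs, which is why it would need to be assessed per health system.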