Abstract
Purpose
Decompression of the lumbar spine is a common spine surgery procedure. The impact of the surgeon's learning curve on relevant clinical outcomes has not been well examined in the literature. In this study, a variety of machine learning algorithms were investigated to determine how the surgeon's learning curve and other clinical parameters influence prolonged length of stay (LOS), extended operating time (OT), and complications, and whether these outcomes can be reliably predicted.
Methods
A retrospective monocentric cohort study of patients with lumbar spinal stenosis treated with microsurgical (MSD) or full-endoscopic (FED) decompression was conducted. The study included 206 patients who underwent FED (63; 30.6%) or MSD (143; 69.4%). Prolonged LOS and OT were defined as values exceeding the 75th percentile of the cohort. Complications were assessed as an additional dependent variable. Unsupervised learning was used to identify clusters in the data that distinguished the early learning curve (ELC) from the late learning curve (LLC) phase. Of 15 algorithms, the five that best fit the data were selected for each prediction task. Prediction accuracy (Acc) and the area under the receiver operating characteristic curve (AUC) were calculated, and the most important predictors were determined using feature importance analysis.
Results
For the FED group, the median number of surgeries of the same type that the surgeon had performed at the time of surgery was 72 in the ELC group and 274 in the LLC group. FED patients did not differ significantly in the outcome variables (LOS, OT, complication rate) between the ELC and LLC groups. The random forest model achieved the highest mean accuracy and AUC across all folds for each classification task: for OT, an accuracy of 76.08% and an AUC of 0.89; for LOS, an accuracy of 83.83% and an AUC of 0.91; and for complications, an accuracy of 89.90% and an AUC of 0.94. Feature importance analysis indicated that LOS, OT, and complications were influenced more by patient characteristics than by the surgical technique (FED versus MSD) or the surgeon's learning curve.
Conclusions
In the early learning curve phase, a median of 72 FED cases was associated with clinical outcomes comparable to those of experienced surgeons. These outcomes appear to be influenced more by patient characteristics than by the learning curve or the surgical technique. Several study variables, including the learning curve, can be used to predict whether lumbar decompression surgery will result in prolonged LOS, extended OT, or complications. Before the provided prediction tools can be introduced into clinical practice, the algorithms need to be implemented in open-source software and externally validated in large-scale randomized controlled trials.
Introduction
Decompression of the lumbar spine is among the most common spinal surgeries performed on patients over the age of 65 who suffer from lumbar spinal stenosis (LSS) [1]. Decompression surgery is intended to relieve discomfort and improve function by enlarging the cross-section of the spinal canal, which relieves pressure on the nerves affected by the stenosis [2]. Approximately 30 percent of the general population is estimated to suffer from low back pain. According to the National Center for Health Statistics, the cost of surgical intervention alone amounted to 1.65 billion dollars [3]. In the USA, lumbar spine surgeries for patients 65 and older cost an estimated 306 million dollars over the past few years [3,4,5]. Healthcare expenditures related to degenerative spine diseases are anticipated to increase significantly by 2050 due to growth of the elderly population, which is expected to reach 83.7 million by that year. Studies report that 8–10% of patients who undergo spinal decompression require a repeat procedure, which increases hospital expenses [6, 7]. There is also a 3.1% likelihood of cardiac problems or stroke and a 0.4% likelihood of death within one month following these procedures [3].
Surgical outcomes can be improved through innovation, which aims to increase surgical effectiveness and reduce the risk of postoperative complications. More and more patients expect to undergo surgery using the most advanced techniques, which are typically complex and difficult to learn. However, implementing a surgical innovation involves a learning curve, which may negatively impact patient outcomes [8]. Outcome data acquired during an implementation period must therefore be analyzed with surgical learning curves in mind. Because current surgical procedures are complex and interventions are performed with increasing frequency, learning curve analysis is becoming more important, not only for interpreting outcome data during the implementation period but also for identifying differences in learning curve length and learning-related morbidity. Additional morbidity that a patient experiences during the learning phase could have been avoided if the surgical team had been fully proficient, and the results of technical procedures may be significantly impaired during this phase. Nowadays, the differences in effectiveness among newly implemented, innovative surgical procedures are generally small, so the impact of the learning phase on patient outcomes may become more relevant. This may be especially true for the various types of minimally invasive spinal surgery (MISS).
Working in a narrow surgical corridor with limited visibility of anatomic landmarks is one of the major challenges of MISS. MISS techniques for the spine involve a considerable learning curve for spine surgeons trained in traditional open techniques [9, 10]. Several techniques have been developed to shorten the MISS learning curve, including specialized retraction systems, computer-aided navigation technology, and cadaveric training [11]. However, surgeons must be prepared for increased complication rates and prolonged operative times when first employing MISS procedures [12].
When evaluating the socioeconomic costs and benefits of novel surgical techniques, it is important to consider operation time, length of stay (LOS), and the postoperative complication rate. Predictive models and machine learning have become increasingly valuable in recent years for predicting patient outcomes from pertinent characteristic variables [13]. Clinically significant outcome prediction models may help society make better use of its healthcare resources [14, 15], enabling policymakers and clinicians to determine the most effective allocation of budget resources between treatment options. Machine learning and deep learning algorithms can provide a better understanding of the acquired data and predict, from a variety of clinicopathological characteristics, whether a patient is likely to experience a prolonged hospital stay, an extended operation time, or complications. These algorithms can be incorporated into a hospital's software environment, enabling continuous monitoring of at-risk patients and supporting precision medicine objectives. To date, however, the learning curve has not been sufficiently considered in statistical or advanced machine learning algorithms. Although numerous AI-based predictive models have been published in the field of spine surgery over the past few years, none has considered the learning curve as a confounding factor [13].
Since unsuccessful spine surgery has a socioeconomic impact and learning curves play a critical role in modern spine surgery, the present study explored the relationship between surgeon learning curve, operation time, length of stay, surgical complications, and several other clinical variables using artificial intelligence-based algorithms.
Methods
Study design
We conducted a retrospective cohort study of consecutive patients treated for lumbar spinal stenosis with microsurgical or endoscopic interlaminar decompression between 2016 and 2021.
We included patients with lumbar spinal stenosis treated with either microsurgical or full-endoscopic decompression during the study period. The iLESSYS® system (Joimax GmbH, Karlsruhe, Germany) was used in the endoscopic group. After the initial dataset had been collected, all patients who fulfilled the inclusion criteria were screened against the exclusion criteria. Patients were excluded if they were under 18 years of age, had spinal tumors, had spinal fusions, or had refused to allow their data to be used for research purposes.
Data handling
Patient data were extracted from the patient information system into a predefined datasheet. The data were pseudonymized using the “encode” command in Stata Statistical Software Release 15 (StataCorp. 2011, College Station, TX, USA). The extraction form contained variables that had been identified as significant determinants of hospital length of stay and operation time in our previous studies [16, 17] and in a literature search. Clinical and surgical variables included surgical technique (microsurgical vs. endoscopic decompression), the number of targeted levels, the length of hospital stay, the American Society of Anesthesiologists (ASA) physical status classification, and surgery-related complications. The following complications were assessed: residual sensorimotor deficits, new sensorimotor deficits, postoperative instability, persistent stenosis requiring revision, and hematomas requiring revision. Demographic data (sex, age, body mass index), alcohol and nicotine use, and type of German health insurance (public or private) were also recorded. Preoperative C-reactive protein (CRP) levels were included in the laboratory variable group. Additionally, we extracted the names of the surgeons who performed the surgeries at the hospital during the study period. Most surgeries were performed by five surgeons; all other surgeons, each of whom performed fewer than ten cases, were grouped as “others.” Each surgeon was then given an Excel spreadsheet containing the hospital case numbers and the variables “Years_experience_with_case_surgery_type” and “Number_of_surgeries_with_case_surgery_type_at_time_of_surgery,” i.e., the number of cases the surgeon had already performed with the respective technique (microsurgical or full-endoscopic) at the time of each surgery. This gave the surgeons time to look up the required information in the hospital's information system before returning the spreadsheet for data analysis.
For the binary classification tasks, operation time and length of hospital stay were classified as prolonged if they exceeded the 75th percentile of the cohort and as normal otherwise [18, 19].
Unsupervised learning was conducted using three models (two-step clustering, K-means clustering, and Kohonen networks), and the model with the highest silhouette value was selected for the final clustering. In the final two-step clustering model, the Bayesian information criterion (BIC) was used as the clustering criterion. Where applicable, the Mann–Whitney U test or the χ² test was used to compare variables between clusters.
Data preprocessing included imputation of missing values, partitioning into training and test sets, and upsampling of the minority classes to balance the classes. Upsampling was performed because class imbalance has been shown to significantly affect the performance of prediction models [20].
OT and LOS classes, as well as complications, were predicted using supervised machine learning and deep learning techniques. Initially, 15 models were applied (support vector machine (SVM), k-nearest neighbor (KNN), discriminant analysis, Bayesian network, decision tree, logistic regression, CHAID algorithm, QUEST algorithm, classification and regression tree (C&R), C5.0 node, linear support vector machine (LSVM), Random Trees, Tree-AS, XGBoost Tree, XGBoost), most of which were discarded because they did not fit the data adequately. For the final prediction modeling, the following algorithms were applied: C5.0 node, random forest, CHAID, Tree-AS, KNN, C&R Tree, and SVM. All analyses and data preprocessing steps were performed in SPSS Modeler v18.3 (IBM, Armonk, NY, USA), Python, and SPSS v26 (IBM, Armonk, NY, USA) on a Windows 10 computer with a Ryzen 9 5950X 16-core processor, 64 GB RAM, and an NVIDIA GeForce RTX 3090 GPU.
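The outcome definition and class balancing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SPSS Modeler workflow used in the study. The 75th percentile is computed with Python's `statistics.quantiles` using inclusive interpolation, and SPSS may define percentiles slightly differently. Upsampling is plain random duplication of minority-class rows. The function names and LOS values are hypothetical.

```python
import random
import statistics


def label_prolonged(values):
    """Label each value 1 (prolonged) if it exceeds the cohort's 75th percentile, else 0.

    Assumption: the 75th percentile uses Python's "inclusive" interpolation;
    SPSS may use a slightly different percentile definition.
    """
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    cutoff = statistics.quantiles(values, n=4, method="inclusive")[2]
    return [1 if v > cutoff else 0 for v in values], cutoff


def upsample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of the smaller classes until all classes are equally large.

    Apply this to the training partition only, so that duplicated cases
    never leak into the test partition.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        # Sample with replacement from the existing members of this class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([y] * target)
    return out_rows, out_labels


# Hypothetical LOS values in days (not study data).
los_days = [2, 3, 3, 4, 4, 5, 5, 6, 8, 12]
los_class, q3 = label_prolonged(los_days)
print(q3, los_class)  # 5.75 [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
train_rows, train_labels = upsample_minority(list(range(len(los_days))), los_class)
print(train_labels.count(0), train_labels.count(1))  # 7 7
```

A strict "greater than" comparison is used so that a value exactly at the cut-off counts as normal, matching "exceeding the 75th percentile."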
Results
A total of 206 patients were included in the cohort. Of these, 63 (male: 36, female: 27) underwent full-endoscopic decompression (FED) and 143 (male: 69, female: 74) underwent microsurgical decompression (MSD). The mean age of the patients was 59.96 ± 16.49 years (range: 27–92 years). The baseline characteristics of both groups are summarized and compared in Table 1. In general, the MSD and FED groups did not differ with regard to most study variables. In the MSD group, however, the number of levels accessed was slightly higher. Furthermore, at the time of surgery, surgeons were more experienced (9.84 years vs. 3.56 years) and had performed more surgeries (596.19 vs. 115.75) with the MSD technique than with the FED technique. As expected, the variables “years of experience with case surgery type” and “number of surgeries with case surgery type at the time of surgery” were significantly correlated (Spearman's rho: 0.731; p < 0.001). Operation time and LOS were also positively correlated (Spearman's rho: 0.264; p < 0.001). Interestingly, there was also a significant positive correlation between years of experience with the type of surgery and LOS (Spearman's rho: 0.18; p = 0.009). Analyzing this relationship separately for each surgical technique showed that years of experience with case surgery and LOS were positively correlated only in the MSD group (Spearman's rho: 0.237; p = 0.004), whereas the FED group showed an inverse but non-significant relationship (Spearman's rho: −0.161; p = 0.207). Furthermore, years of experience with case surgery and operation time were significantly and inversely related in the FED group (Spearman's rho: −0.249; p = 0.049), whereas in the MSD group this relationship was positive (Spearman's rho: 0.190; p = 0.023). In both groups, OT and LOS showed significant positive correlations.
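Spearman's rho, used for the correlations above, is the Pearson correlation of the ranks, with tied values receiving their average rank. The sketch below illustrates that definition with hypothetical function names and data, using only the standard library. It does not compute p-values, which would require an additional test.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical operation times (min) and LOS (days) for six patients (not study data).
ot = [55, 70, 70, 90, 120, 150]
los = [2, 3, 2, 4, 4, 7]
print(round(spearman_rho(ot, los), 3))  # 0.94
```

Because only ranks enter the calculation, rho captures any monotonic relationship and is robust to LOS outliers, which suits skewed variables such as LOS.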
After the initial exploratory and descriptive analyses, we performed unsupervised learning using several clustering techniques (two-step clustering, k-means clustering, and Kohonen networks) to identify clusters representing surgical learning curves (silhouette value ≥ 0.60) for each surgical technique. For the FED group, K-means clustering using the variables “Years of experience with case surgery type” and “Number of surgeries with case surgery type at time of surgery” provided the highest clustering performance (silhouette value: 0.698) and was selected as the clustering method. For the MSD group, two-step clustering with the same two variables achieved the highest clustering performance (silhouette value: 0.819) and was selected as the clustering algorithm. In the MSD group, two clusters represented the early learning curve phase (ELC; n = 137; 66.8%) and the late learning curve phase (LLC; n = 68; 33.2%). The median number of surgeries with case surgery type was 136 in the ELC group and 1610 in the LLC group. Table 2 compares the study variables between the two clusters. Procedures in the LLC group involved patients with a higher number of levels, higher age, and higher preoperative CRP levels; these patients also had longer OT and LOS. There were no significant differences between the groups in BMI, sex, or complication rates. Patients with private insurance were more likely to be operated on in the ELC phase.
The FED group likewise consisted of two clusters representing the early learning curve phase (ELC; n = 66; 68.0%) and the late learning curve phase (LLC; n = 31; 32.0%). The median number of surgeries with case surgery type at the time of surgery was 72 in the ELC group and 274 in the LLC group. Table 3 compares study variables between the two clusters. In contrast to the MSD group, FED patients did not differ significantly between the ELC and LLC groups on most variables. Preoperative CRP was significantly higher in the LLC group, and patients with private insurance were more likely to undergo surgery in the LLC phase. Other variables, including the complication rate, did not differ significantly between the ELC and LLC groups.
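The learning-curve clustering can be illustrated with a minimal k-means sketch on the two experience variables, scored with the mean silhouette width. The sketch rests on several assumptions:

- Both variables are z-standardized before clustering.
- Initialization is a deterministic farthest-first scheme rather than SPSS Modeler's own procedure.
- The SPSS-specific two-step algorithm used for the MSD group is not reproduced.
- All function names and case data are hypothetical.

```python
import math
import statistics


def standardize(points):
    """z-score each column so years and case counts contribute on a common scale."""
    stats = [(statistics.fmean(c), statistics.pstdev(c)) for c in zip(*points)]
    return [tuple((v - m) / s for v, (m, s) in zip(p, stats)) for p in points]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(
                tuple(statistics.fmean(d) for d in zip(*members)) if members else centers[c]
            )
        if new_centers == centers:  # converged: assignments no longer move the centers
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette width over all points (needs at least two clusters)."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = statistics.fmean(same)
        b = min(
            statistics.fmean([math.dist(p, q) for j, q in enumerate(points) if labels[j] == c])
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return statistics.fmean(scores)


# Hypothetical (years of experience, prior cases of this technique) per surgery.
cases = [(1, 20), (1.5, 40), (2, 60), (2.5, 80), (8, 250), (9, 280), (10, 300)]
z = standardize(cases)
labels = kmeans(z, k=2)
print(labels, round(silhouette(z, labels), 2))
for c in sorted(set(labels)):
    print(c, statistics.median(n for (_, n), lab in zip(cases, labels) if lab == c))
```

The last loop mirrors how the clusters are described in the text, reporting the median number of prior cases in each cluster.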
Next, we applied various machine learning and deep learning algorithms to predict the OT and LOS classes as well as the occurrence of complications from the study variables, including the learning curve as a confounding factor. Table 4 presents performance measures of the best algorithms for each outcome (OT, LOS, and complications). The most important predictors of OT were preoperative CRP (0.201; 95% CI 0.182–0.219), age (0.142; 95% CI 0.129–0.155), and BMI (0.137; 95% CI 0.126–0.149). The learning curve variable ranked fifth (number of surgeries with case surgery type at time of surgery [0.114; 95% CI 0.105–0.124]), and the study group (MSD vs. FED) ranked ninth (0.051; 95% CI 0.043–0.058). For complications, the random forest model identified LOS (0.160; 95% CI 0.146–0.175), OT (0.142; 95% CI 0.130–0.155), BMI (0.127; 95% CI 0.116–0.138), and age (0.125; 95% CI 0.115–0.136) as the most important predictors. Neither the learning curve (sixth most important predictor [0.092; 95% CI 0.083–0.100]) nor the surgical technique (FED vs. MSD; eleventh most important predictor [0.020; 95% CI 0.020–0.031]) had a major impact on complications. The most important predictors of LOS were age (0.239; 95% CI 0.220–0.258), preoperative CRP (0.154; 95% CI 0.139–0.168), and BMI (0.126; 95% CI 0.113–0.138). The surgical technique had no relevant impact on LOS (15th most important predictor [0.013; 95% CI 0.009–0.016]), whereas the learning curve variable ranked fifth (number of surgeries with the type of surgery at the time of surgery [0.102; 95% CI 0.092–0.112]). Overall, the results suggest that patient characteristics have a greater impact on LOS, OT, and complications than the surgical technique (FED versus MSD) or the surgeon's learning curve. Figures 1 and 2 illustrate the variability across folds for the machine learning algorithms.
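The sketch below shows one way to condense per-fold importances into a mean, a 95% CI, and a rank. The study does not state how its CIs were derived, so this sketch assumes a normal approximation (mean ± 1.96·SD/√k over k folds). The feature names and values are hypothetical.

```python
import math
import statistics


def summarize_importance(per_fold, z=1.96):
    """Rank features by mean importance with a normal-approximation 95% CI across folds.

    per_fold maps a feature name to its importance in each cross-validation fold.
    """
    rows = []
    for name, vals in per_fold.items():
        mean = statistics.fmean(vals)
        half_width = z * statistics.stdev(vals) / math.sqrt(len(vals))
        rows.append((name, mean, mean - half_width, mean + half_width))
    # Highest mean importance first; list position gives the predictor's rank.
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Hypothetical per-fold importances for an LOS model (not study data).
folds = {
    "age": [0.24, 0.23, 0.25, 0.24, 0.23],
    "preop_crp": [0.15, 0.16, 0.15, 0.14, 0.16],
    "surgical_technique": [0.01, 0.02, 0.01, 0.01, 0.02],
}
for rank, (name, mean, lo, hi) in enumerate(summarize_importance(folds), start=1):
    print(f"{rank}. {name}: {mean:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Reporting the rank alongside the interval, as in the text ("ranked fifth"), makes clear that a predictor's position is itself subject to fold-to-fold uncertainty when intervals overlap.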
In addition to the 95% CIs shown in Table 4, the standard deviation (SD) of each performance metric across the cross-validation folds was calculated for each algorithm and prediction task. Lower values indicate more consistent performance across folds. For predicting complications, the C&R Tree algorithm showed the most consistent accuracy (SD = 2.60), while the KNN algorithm showed the greatest variability (SD = 5.05). In terms of AUC, the C&R Tree was again the most consistent (SD = 0.028), and the CHAID algorithm had the highest variability (SD = 0.063). For LOS, the random forest algorithm was the most consistent in accuracy (SD = 2.45), with Tree-AS showing the greatest variability (SD = 4.90); for AUC, Tree-AS was the most consistent (SD = 0.029) and the random forest the most variable (SD = 0.079). For OT, the C&R Tree algorithm exhibited the most consistent accuracy (SD = 2.96), while Tree-AS had the highest variability (SD = 5.74); for AUC, the CHAID algorithm was the most consistent (SD = 0.027) and the KNN algorithm the most variable (SD = 0.070). These results suggest that, for this dataset, the C&R Tree algorithm performs most consistently across folds in predicting complications, in terms of both accuracy and AUC. The random forest algorithm shows similar consistency in accuracy when predicting LOS, although its AUC varied the most for this task. The variability of certain algorithms, such as Tree-AS and KNN, also indicates that they may be more sensitive to the specific data in each fold, which could be due to factors such as algorithm complexity, parameter settings, or the nature of the data itself.
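The fold-level consistency comparison amounts to computing each algorithm's mean and SD over its fold metrics and picking the smallest and largest SD. The sketch below uses hypothetical fold accuracies and a hypothetical function name. It assumes the sample SD (n − 1 denominator), since the study does not specify which was used.

```python
import statistics


def fold_consistency(fold_metrics):
    """Return per-algorithm (mean, SD) plus the most and least consistent algorithm.

    fold_metrics maps an algorithm name to one metric value per CV fold.
    Uses the sample SD (n - 1 denominator), which is an assumption here.
    """
    summary = {
        alg: (statistics.fmean(vals), statistics.stdev(vals))
        for alg, vals in fold_metrics.items()
    }
    most_consistent = min(summary, key=lambda alg: summary[alg][1])
    most_variable = max(summary, key=lambda alg: summary[alg][1])
    return summary, most_consistent, most_variable


# Hypothetical per-fold accuracies (%) for one prediction task (not study data).
accuracy = {
    "C&R Tree": [88, 90, 89, 91, 87],
    "Random forest": [89, 93, 87, 92, 90],
    "KNN": [80, 90, 85, 95, 78],
}
summary, steady, volatile = fold_consistency(accuracy)
for alg, (mean, sd) in summary.items():
    print(f"{alg}: mean {mean:.2f}, SD {sd:.2f}")
print("most consistent:", steady, "| most variable:", volatile)
```

The same function applies unchanged to per-fold AUC values. Accuracy and AUC should be ranked separately, because, as the LOS results show, one algorithm can be the most consistent in one metric and the most variable in the other.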
Discussion
This study is the first to present evidence from AI-based algorithms that specifically focuses on the impact of the surgeons themselves on LOS, OT, and complications. These results are relevant to the implementation phases of MISS, since the associations between these important parameters remain unclear. Data from patients undergoing lumbar decompression surgery suggest that OT, complication rates, and LOS can be reliably predicted. Furthermore, our institution's results indicate that, in the early learning curve phase, a median of 72 FED cases led to clinical outcomes similar to those of more experienced surgeons. In the MSD group, OT and LOS were higher because more experienced surgeons performed more complex procedures (more levels, older patients). According to the AI-based analyses, the longer OT and LOS in the MSD group are likely related to patient characteristics. Although we used only a small number of cases from one institution, the algorithms demonstrated satisfactory performance metrics. Our findings can therefore serve as a basis for developing more accurate models in larger, multicenter prospective studies.
Clinical outcomes such as the complication rate, LOS, and OT are considered among the most relevant parameters for assessing a surgeon's progress through the procedural learning curve. MIS approaches to the spine are limited by the absence of clear anatomic landmarks. Jhala and Mistry [21] noted that, during the learning curve period of endoscopic discectomy, complications were associated with a lack of familiarity with endoscopic image orientation and a suboptimal approach to the surgical target. In their initial series of patients, multiple durotomies were performed on the wrong level, and facet joint structures were unintentionally removed. Several authors have argued that an ideal entry point and trajectory during the surgical approach are the key to overcoming the MISS learning curve [22,23,24]. A systematic review found that, when these parameters were reported as a function of the chronologic case number, all complications, adverse events, and conversions to open procedures occurred during the first 30 procedures [10]. In a previous study, we found that complication rates and operation times decreased significantly after 20 FED surgeries [16]. According to the learning curve assessment by Zelenkov et al. [25], the plateau of the learning curve is reached within the first 20 patients of full-endoscopic interlaminar and transforaminal surgery. Lee et al. [26] noted higher complication rates and longer operation times in the first cohort of patients treated with FED, with the learning curve plateauing after the 100th case; complication rates in this first cohort were twice as high as in the more experienced phase of the learning curve. In our institution's analysis, a median of 72 surgeries resulted in similar clinical outcomes for the FED technique, which is in line with the 20 to 100 cases reported in the literature to reach the plateau.
Although several risk factors for a longer LOS after lumbar decompression surgery have been identified, no effective framework has yet been developed to predict whether a patient will have a long or short OT and/or LOS or suffer complications. Only one study has used artificial intelligence techniques to predict prolonged LOS following lumbar spinal stenosis surgery [27]. The authors report an AUC of 0.54, which is insufficient: an AUC of 0.5 means that the model has no discriminative ability (equivalent to tossing a coin). The same study also predicted prolonged OT [27], reporting an accuracy of 0.81, which is comparable to the results of our algorithms (0.72 to 0.99). Predictive models for surgical complications are difficult to compare because the term “complication” is defined differently across studies (e.g., readmission, re-operation, wound infections) and because some studies pool various spine surgery procedures for predictive modeling [20]. Siccoli et al. [27] focused on lumbar spinal stenosis and reported an accuracy of 0.81 in predicting re-operations, whereas most available studies address lumbar spine fusion procedures [13]. In the present study, machine learning and deep learning techniques were effective in predicting complications, prolonged OT, and prolonged LOS, which supports the hypothesis that these outcomes are associated with other clinical features. These findings can also facilitate resource allocation and discharge planning for patients with specific risk profiles. An open-source tool for the provided class predictions could be developed using the variables of the present approach.
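The interpretation of AUC used above can be made concrete. AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half, so uninformative scores give about 0.5. The sketch below uses a hypothetical function name and simulated scores. The O(n²) pairwise loop is chosen for clarity, not speed.

```python
import random


def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(1000)]  # simulated outcome classes

print(auc(y, [float(v) for v in y]))                # scores equal to the truth: 1.0
print(auc(y, [0.5] * len(y)))                       # one constant score for everyone: 0.5
print(round(auc(y, [rng.random() for _ in y]), 2))  # coin-toss scores: close to 0.5
```

This pairwise definition is the Mann–Whitney U statistic divided by the number of positive-negative pairs, which is why an AUC of 0.54 indicates scores that are barely better than random ordering.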
Making the models available as an open-source tool would also allow them to be externally validated. Risk assessment tools have previously been used in other orthopedic surgical specialties to assess patient risk [28], and the present results could be incorporated into such tools. A quantitative preoperative assessment of surgical risk would benefit both patients and physicians. Furthermore, by setting patient expectations appropriately, the patient care team can put together a plan that improves patient safety and satisfaction and reduces operation time, length of stay, and complications. Notably, there is ongoing work to integrate various data modalities into predictive modeling for spine surgery, including hybrid models and radiomics techniques that incorporate imaging data [13, 29]. The insights from our study could help enhance these advanced modeling approaches.
In this study, AI-based algorithms predicted and discriminated between classes with satisfactory accuracy and AUC. AI-based techniques have been applied to several orthopedic subspecialties with similarly promising results: machine learning methods and neural networks have previously been used to predict intraoperative blood loss, prolonged hospital stays, patient-reported outcomes, and discharge disposition, with similar or inferior results [13, 30,31,32,33,34]. Building on this past work, the present study adds to the growing body of evidence supporting the use of machine learning in orthopedic surgery. A previous study reported that prolonged LOS was associated with longer operating time and a higher ASA classification [35]; we found, however, that operating time contributed substantially more to a longer LOS than the ASA class. The correlation between prolonged LOS and operating time observed across surgical practice styles and preferences is in line with our recent findings that operating time is associated with an increase in LOS [36]. Both the literature [37] and our results indicate that patient comorbidities contribute to longer operating times. Surgical prediction models should also take into account the surgeon's learning curve. Multiple parameters may affect the duration of an operation, which may in turn affect LOS. The invasiveness of a spinal procedure is generally correlated with its outcome (blood loss, duration of the operation, and risk of complications). In this context, it may be difficult to reduce feature importance to a few parameters, especially if not all potentially important features are incorporated into the study design, which is not possible in most retrospective studies.
Our results suggest that CRP levels at the time of surgery may be associated with surgical outcome. CRP is a well-recognized acute-phase marker of systemic inflammation [38]. It can be used to monitor the effectiveness of treatment and to screen for inflammation early in the course of a disease process. CRP rises in response to surgical trauma, peaking 48 h after surgery [39, 40], although in some patients the CRP response is incomplete or absent [40]. In apparently healthy individuals, validated CRP risk categories accurately predicted cardiovascular events [41]. Preoperative CRP levels have been linked to longer-term morbidity and mortality after cardiac surgery [42,43,44], and the postoperative peak CRP response has been associated with the degree of surgical trauma [45]. Minimally invasive surgical procedures tend to produce lower CRP levels than open procedures, although not all studies agree [46,47,48]. Overall, research on the effects of preoperative CRP in spinal surgery patients is limited. Although it seems intuitive that increased inflammation before surgery might adversely affect clinical outcomes, this will need to be confirmed in future studies.
Although the algorithms presented in this study performed well and incorporated the learning curve in a novel way, the study has some limitations. In the early stages of learning a new MISS technique, surgeons may preferentially select straightforward cases, which may bias the clinical outcome data. A properly designed randomized trial would reduce this selection bias. Furthermore, some patients had a long LOS (outliers) due to previous conservative treatments and multimorbidity, which increased the mean LOS of the respective groups. A larger dataset would have allowed more specific subgroup analyses excluding patients who had previously undergone conservative treatment. To be clinically useful, algorithms must be developed on a representative sample of patients undergoing spinal decompression surgery. Because our data come from a single institution with a limited sample size, they may not generalize to other institutions; external validation of the provided models is therefore imperative. A further limitation is that the data were collected retrospectively, which may lower the evidence grade, since retrospective data are not necessarily as reliable as prospectively collected data. Moreover, because variables were selected retrospectively, we could not include additional variables that might have further improved the models. Nevertheless, our study attempted to incorporate important variables based on our previous research and a literature review [16].
Conclusions
Decompression of the lumbar spine is among the most common spine surgeries. Operating time, hospital length of stay, and complication rates significantly affect the associated healthcare costs. Prediction models that account for the surgeon's learning curve when predicting these parameters have been lacking, yet they are highly warranted for the efficient allocation of healthcare resources. Such models would allow institutions to compare therapeutic strategies across disciplines and to set relative priorities when allocating resources across interventions. According to our results, FED outcomes in the early learning curve phase (median case number of 72 at the time of surgery) were comparable to those in the late learning curve phase. Patient characteristics appear to influence clinical outcomes more than the learning curve or the surgical technique. The results suggest that machine learning and deep learning algorithms may be useful for predicting whether patients will experience prolonged LOS, extended OT, or complications after lumbar decompression surgery, based on several variables including the learning curve. Before these prediction tools can be used in clinical practice, the algorithms should be incorporated into open-source software and externally validated in large-scale randomized controlled trials.
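The performance measures used for these prediction tasks (accuracy and AUC, averaged across cross-validation folds) can be computed as in the following minimal Python sketch. It is not the authors' pipeline. The 0.5 decision threshold, the function names, and the fold data are hypothetical. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive case (e.g. prolonged LOS) receives a higher predicted risk than a randomly chosen negative case, with ties counted as one half.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases whose thresholded prediction matches the observed label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def roc_auc(y_true, y_prob):
    """ROC AUC as P(score of positive > score of negative); ties count 0.5.

    This pairwise form is equivalent to the normalized Mann-Whitney U statistic.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y]
    neg = [p for y, p in zip(y_true, y_prob) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_over_folds(folds):
    """Average accuracy and AUC over folds given as (y_true, y_prob) pairs."""
    accs = [accuracy(y, p) for y, p in folds]
    aucs = [roc_auc(y, p) for y, p in folds]
    return sum(accs) / len(accs), sum(aucs) / len(aucs)


# Hypothetical held-out folds: observed labels (1 = event) and predicted risks.
folds = [
    ([1, 0, 1, 0, 0], [0.9, 0.2, 0.4, 0.6, 0.1]),
    ([0, 1, 1, 0], [0.3, 0.8, 0.7, 0.3]),
]
mean_acc, mean_auc = mean_over_folds(folds)
print(f"mean accuracy={mean_acc:.3f}, mean AUC={mean_auc:.3f}")
# mean accuracy=0.800, mean AUC=0.917
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. This is why both are commonly reported together for imbalanced outcomes such as complications.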
References
Lurie J, Tomkins-Lane C (2016) Management of lumbar spinal stenosis. BMJ 352:h6234. https://doi.org/10.1136/bmj.h6234
Deer T, Sayed D, Michels J et al (2019) A review of lumbar spinal stenosis with intermittent neurogenic claudication: disease and diagnosis. Pain Med 20:S32–S44. https://doi.org/10.1093/pm/pnz161
Deyo RA, Mirza SK, Martin BI et al (2010) Trends, major medical complications, and charges associated with surgery for lumbar spinal stenosis in older adults. JAMA 303:1259–1265
Luo X, Pietrobon R, Sun SX et al (2004) Estimates and patterns of direct health care expenditures among individuals with back pain in the United States. Spine (Phila Pa 1976) 29:79–86. https://doi.org/10.1097/01.BRS.0000105527.13866.0F
Emanuel EJ, Fuchs VR (2008) The perfect storm of overutilization. JAMA 299(23):2789–2791
Modhia U, Takemoto S, Braid-Forbes MJ et al (2013) Readmission rates after decompression surgery in patients with lumbar spinal stenosis among medicare beneficiaries. Spine (Phila Pa 1976) 38:591–596. https://doi.org/10.1097/BRS.0b013e31828628f5
Martin BI, Mirza SK, Flum DR et al (2012) Repeat surgery after lumbar decompression for herniated disc: the quality implications of hospital and surgeon variation. Spine J 12:89–97. https://doi.org/10.1016/j.spinee.2011.11.010
McCulloch P, Altman DG, Campbell WB et al (2009) No surgical innovation without evaluation: the IDEAL recommendations. Lancet 374:1105–1112. https://doi.org/10.1016/S0140-6736(09)61116-8
Epstein N (2017) Learning curves for minimally invasive spine surgeries: are they worth it? Surg Neurol Int 8:61. https://doi.org/10.4103/sni.sni_39_17
Sclafani JA, Kim CW (2014) Complications associated with the initial learning curve of minimally invasive spine surgery: a systematic review. Clin Orthop Relat Res 472:1711–1717. https://doi.org/10.1007/s11999-014-3495-z
Voyadzis J-M (2011) The learning curve in minimally invasive spine surgery. Semin Spine Surg 23:9–13. https://doi.org/10.1053/j.semss.2010.11.003
Perez-Cruet MJ, Fessler RG, Perin NI (2002) Review: complications of minimally invasive spinal surgery. Neurosurgery 51:S26–S36
Saravi B, Hassel F, Ülkümen S et al (2022) Artificial intelligence-driven prediction modeling and decision making in spine surgery using hybrid machine learning models. J Pers Med 12:509. https://doi.org/10.3390/jpm12040509
Gellman DD (1974) Cost-benefit in health care: we need to know much more. Can Med Assoc J 111:988–989
Dagenais S, Roffey DM, Wai EK et al (2009) Can cost utility evaluations inform decision making about interventions for low back pain? Spine J 9(11):944–957
Saravi B, Ülkümen S, Lang G, Couillard-Després S, Hassel F (2023) Case-matched radiological and clinical outcome evaluation of interlaminar versus microsurgical decompression of lumbar spinal stenosis. Eur Spine J 32:2863–2874
Saravi B, Zink A, Ülkümen S et al (2022) Performance of artificial intelligence-based algorithms to predict prolonged length of stay after lumbar decompression surgery. J Clin Med 11:4050. https://doi.org/10.3390/jcm11144050
Krell RW, Girotti M, Dimick JB (2013) Extended hospital stay after surgery: a marker of hospital quality or efficiency? J Surg Res 179:219. https://doi.org/10.1016/j.jss.2012.10.395
Bottle A, Middleton S, Kalkman CJ et al (2013) Global comparators project: international comparison of hospital outcomes using administrative data. Health Serv Res 48:2081–2100. https://doi.org/10.1111/1475-6773.12074
Hoda M, El Saddik A, Wai E, Phan P (2019) Predicting spine surgery complications using machine learning. In: 2019 IEEE international conference on multimedia & expo workshops (ICMEW). IEEE, Shanghai, China, pp 49–53
McAfee PC, Phillips FM, Andersson G et al (2010) Minimally invasive spine surgery. Spine (Phila Pa 1976) 35:S271–S273. https://doi.org/10.1097/BRS.0b013e31820250a2
Dhall SS, Wang MY, Mummaneni PV (2008) Clinical and radiographic comparison of mini-open transforaminal lumbar interbody fusion with open transforaminal lumbar interbody fusion in 42 patients with long-term follow-up. J Neurosurg Spine 9:560–565. https://doi.org/10.3171/SPI.2008.9.08142
Park Y, Ha JW (2007) Comparison of one-level posterior lumbar interbody fusion performed with a minimally invasive approach or a traditional open approach. Spine (Phila Pa 1976) 32:537–543. https://doi.org/10.1097/01.brs.0000256473.49791.f4
Rong L-M, Xie P-G, Shi D-H et al (2008) Spinal surgeons’ learning curve for lumbar microendoscopic discectomy: a prospective study of our first 50 and latest 10 cases. Chin Med J (Engl) 121:2148–2151
Zelenkov P, Nazarov VV, Kisaryev S et al (2020) Learning curve and early results of interlaminar and transforaminal full-endoscopic resection of lumbar disc herniations. Cureus 12:e7157. https://doi.org/10.7759/cureus.7157
Lee CW, Yoon KJ, Kim SW (2019) Percutaneous endoscopic decompression in lumbar canal and lateral recess stenosis–the surgical learning curve. Neurospine 16(1):63–71. https://doi.org/10.14245/ns.1938048.024
Siccoli A, de Wispelaere MP, Schröder ML (2019) Machine learning–based preoperative predictive analytics for lumbar spinal stenosis. Neurosurg Focus 46:5
Dibra FF, Silverberg AJ, Vasilopoulos T et al (2019) Arthroplasty care redesign impacts the predictive accuracy of the risk assessment and prediction tool. J Arthroplasty 34:2549–2554
Saravi B, Zink A, Ülkümen S et al (2023) Clinical and radiomics feature-based outcome analysis in lumbar disc herniation surgery. BMC Musculoskelet Disord 24:791. https://doi.org/10.1186/s12891-023-06911-y
Biron DR, Sinha I, Kleiner JE (2019) A novel machine learning model developed to assist in patient selection for outpatient total shoulder arthroplasty. J Am Acad Orthop Surg 28:580–585
Navarro SM, Wang EY, Haeberle HS (2018) Machine learning and primary total knee arthroplasty: patient forecasting for a patient-specific payment model. J Arthroplasty 33:3617–3623
Durand WM, DePasse JM, Daniels AH (2018) Predictive modeling for blood transfusion after adult spinal deformity surgery: a tree-based machine learning approach. Spine (Phila Pa 1976) 43:1058–1066. https://doi.org/10.1097/BRS.0000000000002515
Fontana MA, Lyman S, Sarker GK et al (2019) Can machine learning algorithms predict which patients will achieve minimally clinically important differences from total joint arthroplasty? Clin Orthop Relat Res 477:1267–1279
Malik AT, Khan SN (2019) Predictive modeling in spine surgery. Ann Transl Med 7:173
Kobayashi K, Ando K, Kato F et al (2019) Predictors of prolonged length of stay after lumbar interbody fusion: a multicenter study. Glob Spine J 9:466–472
Adogwa O, Lilly DT, Khalid S et al (2019) Extended length of stay after lumbar spine surgery: sick patients, postoperative complications, or practice style differences among hospitals and physicians? World Neurosurg 123:734–739
Kim BD, Hsu WK, De Oliveira GS et al (2014) Operative duration as an independent risk factor for postoperative complications in single-level lumbar fusion: an analysis of 4588 surgical cases. Spine (Phila Pa 1976) 39:510–520
Vigushin DM, Pepys MB, Hawkins PN (1993) Metabolic and scintigraphic studies of radioiodinated human C-reactive protein in health and disease. J Clin Invest 91:1351–1357. https://doi.org/10.1172/JCI116336
Colley CM, Fleck A, Goode AW et al (1983) Early time course of the acute phase protein response in man. J Clin Pathol 36:203–207. https://doi.org/10.1136/jcp.36.2.203
White J, Kelly M, Dunsmuir R (1998) C-reactive protein level after total hip and total knee replacement. J Bone Joint Surg Br 80:909–911. https://doi.org/10.1302/0301-620x.80b5.8708
Perry TE, Muehlschlegel JD, Liu K-Y et al (2010) Preoperative C-reactive protein predicts long-term mortality and hospital length of stay after primary, nonemergent coronary artery bypass grafting. Anesthesiology 112:607–613. https://doi.org/10.1097/ALN.0b013e3181cea3b5
Nielsen HJ, Christensen IJ, Sørensen S et al (2000) Preoperative plasma plasminogen activator inhibitor type-1 and serum C-reactive protein levels in patients with colorectal cancer. The RANX05 colorectal cancer study group. Ann Surg Oncol 7:617–623. https://doi.org/10.1007/BF02725342
Nozoe T, Matsumata T, Kitamura M, Sugimachi K (1998) Significance of preoperative elevation of serum C-reactive protein as an indicator for prognosis in colorectal cancer. Am J Surg 176:335–338. https://doi.org/10.1016/s0002-9610(98)00204-9
Fransen EJ, Maessen JG, Elenbaas TW et al (1999) Enhanced preoperative C-reactive protein plasma levels as a risk factor for postoperative infections after cardiac surgery. Ann Thorac Surg 67:134–138. https://doi.org/10.1016/s0003-4975(98)00973-4
Brewster N, Guthrie C, McBirnie J (1994) CRP levels as a measure of surgical trauma: a comparison of different general surgical procedures. J R Coll Surg Edinb 39:86–88
Grande M, Tucci GF, Adorisio O et al (2002) Systemic acute-phase response after laparoscopic and open cholecystectomy. Surg Endosc 16:313–316. https://doi.org/10.1007/s00464-001-9042-5
Hildebrandt U, Kessler K, Plusczyk T et al (2003) Comparison of surgical stress between laparoscopic and open colonic resections. Surg Endosc 17:242–246. https://doi.org/10.1007/s00464-001-9148-9
Saravi B, Ülkümen S, Couillard-Despres S et al (2022) One-year clinical outcomes of minimal-invasive dorsal percutaneous fixation of thoracolumbar spine fractures. Medicina 58:606. https://doi.org/10.3390/medicina58050606
Acknowledgements
The article processing charge was funded by the Baden-Wuerttemberg Ministry of Science, Research and Art and the University of Freiburg in the funding program Open Access Publishing.
Funding
Open Access funding enabled and organized by Projekt DEAL. Joimax GmbH provided fellowship support for B.S.
Ethics declarations
Conflict of interest
The authors of the present manuscript declare no conflict of interest.
Ethical approval
This retrospective observational study was approved by the local Ethics Committee Freiburg, Germany [Number: 116/200]. Written informed consent to participate in observational studies was obtained from each patient.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Saravi, B., Zink, A., Ülkümen, S. et al. Artificial intelligence-based analysis of associations between learning curve and clinical outcomes in endoscopic and microsurgical lumbar decompression surgery. Eur Spine J (2023). https://doi.org/10.1007/s00586-023-08084-7
DOI: https://doi.org/10.1007/s00586-023-08084-7