Introduction

Decompression of the lumbar spine is among the most common spinal surgeries performed on patients over the age of 65 who suffer from lumbar spinal stenosis (LSS) [1]. Decompression surgery is intended to relieve discomfort and improve function. As a result of spinal surgery, spinal stenosis-affected nerves are relieved of pressure by enlarging the cross section of the spinal canal [2]. It is estimated that approximately 30 percent of the general population suffers from lower back pain. In accordance with the National Center for Health Statistics, the cost of surgical intervention alone amounted to 1.65 billion dollars [3]. In the USA, lumbar spine surgeries for patients 65 and older cost an estimated 306 million dollars over the past few years [3,4,5]. It is anticipated that healthcare expenditures related to degenerative spine diseases will increase significantly by 2050 due to an increase in the elderly population, which is expected to reach 83.7 million by 2050. According to studies, 8–10% of patients who undergo spinal decompression have to repeat the procedure, which results in greater hospital expenses [6, 7]. There is also a 3.1% likelihood of cardiac problems or stroke and a 0.4% likelihood of death within one month following these procedures [3].

Surgical outcomes can be improved through innovation. Innovations in surgery are intended to increase surgical effectiveness and reduce the risks associated with postoperative complications. More and more patients expect to undergo surgery using the most advanced techniques, which are typically complex and difficult to learn. However, there is an associated learning curve with the implementation of surgical innovations, and this learning curve may negatively impact patient outcomes [8]. Analyzing outcome data acquired during an implementation period requires considering surgical learning curves. Due to the complexity of current surgical procedures and the increasing frequency of interventions, surgical learning curves are becoming increasingly significant. A learning curve analysis is becoming increasingly important not only to assist with the interpretation of outcome data during the implementation period but also to identify differences in learning curve length and learning-related morbidity. A patient's additional morbidity during the learning phase could have been avoided if the surgical team had been truly proficient. During the learning phase of technical procedures, patients may experience significant impairment of their results. Nowadays, the differences in effectiveness between newly implemented, innovative surgical procedures are relatively small in general. The impact of the learning phase on patient outcomes may therefore become more relevant. Specifically, this may be true for various types of minimally invasive spinal surgery (MISS).

Working in a narrow surgical corridor with limited visibility of anatomic landmarks is one of the major challenges of MISS. MISS techniques for the spine represent a considerable learning curve for a spine surgeon trained in traditional open techniques [9, 10]. Several techniques for decreasing the MISS learning curve have been developed, including specialized retraction systems, computer-aided navigation technology, and cadaveric training [11]. The surgeon must, however, be prepared for an increased complication rate and prolonged operative times when initially employing MISS procedures [12].

In evaluating the socioeconomic costs and benefits of novel surgical techniques, it is important to consider the operation time, the length of stay (LOS), and the postoperative complication rate. The use of predictive models and machine learning has become increasingly valuable in recent years for predicting patient outcomes based on pertinent characteristic variables [13]. As a result of the development of clinically significant outcome prediction models, society may be able to increase its utilization of healthcare resources [14, 15]. Thus, policymakers and clinicians will be able to determine the most effective allocation of budget resources between different treatment options. Through the application of machine learning and deep learning algorithms, it is possible to gain a better understanding of the acquired data and to predict whether a patient is more likely to experience a prolonged hospital stay, an extended operation time, or complications based upon a variety of clinicopathological characteristics. These algorithms can be incorporated into the hospital's software environment, enabling continuous monitoring of at-risk patients and accomplishment of precision medicine objectives. The learning curve has not been sufficiently considered in statistical algorithms and advanced machine learning algorithms to date. Despite the fact that there have been numerous AI-based predictive models published in the field of spine surgery over the past few years, no study has considered the learning curve as a confounding factor in their algorithms [13].

Since unsuccessful spine surgery has a socioeconomic impact and learning curves play a critical role in modern spine surgery, the present study explored the relationship between surgeon learning curve, operation time, length of stay, surgical complications, and several other clinical variables using artificial intelligence-based algorithms.

Methods

Study design

A retrospective cohort study was conducted between 2016 and 2021 to examine consecutive patients who had been treated for lumbar spinal stenosis using microsurgical or endoscopic interlaminar decompression.

We included patients who had lumbar spinal stenosis treated with either microsurgical or full-endoscopic decompression during the study period. An iLESSYS® system (Joimax GmbH, Karlsruhe, Germany) was used for the endoscopic group. After the initial dataset had been collected, all patients who fulfilled our inclusion criteria were screened against our exclusion criteria. The following patients were excluded from the study: (1) those who were under 18 years of age, (2) those who had spinal tumors, (3) those who had spinal fusions, and (4) those who had refused to allow their data to be used for research purposes.

Data handling

The patient information system was used to collect and extract patient data into a predefined datasheet. Pseudonymization of the data was performed using the “encode” command in Stata Statistical Software Release 15 (StataCorp. 2011, College Station, TX, USA). The study extraction form contained variables that had been identified as significant for determining hospital length of stay and operation time in our previous studies [16, 17] and in a literature search. Clinical and surgical variables included the surgical technique (microsurgical vs. endoscopic decompression), the number of targeted levels, the length of hospital stay, the American Society of Anesthesiologists (ASA) physical status classification, and complications associated with the surgery. The following complications were assessed: residual sensorimotor deficits, new sensorimotor deficits, postoperative instability, persistent stenosis requiring revision, and hematomas that required revision. We additionally collected demographic data (sex, age, body mass index (BMI)), data regarding alcohol and nicotine use, and information regarding the type of German health insurance (public or private). As a laboratory variable, C-reactive protein (CRP) levels were measured prior to surgery. We also extracted the names of the surgeons who performed the surgeries at the hospital during the study period. The majority of surgeries were performed by five surgeons. All other surgeons, who had performed fewer than ten cases, were grouped under the heading “others.” The surgeons were then provided with an Excel spreadsheet that contained the hospital case number and the variables “Years_experience_with_case_surgery_type” and “Number_of_surgeries_with_case_surgery_type_at_time_of_surgery,” the latter recording the number of cases a surgeon had already performed with the respective technique (microsurgical or full-endoscopic) at the time of the surgery. The surgeons thus had time to review the required information in the hospital's information system and return the Excel sheet for data analysis.

For the binary classification task, both the operation time (OT) and the hospital length of stay (LOS) were classified as prolonged if they exceeded the 75th percentile and as normal otherwise [18, 19]. Unsupervised learning was conducted using three models (two-step clustering, K-means clustering, and Kohonen networks), and the model with the highest silhouette value was selected for the final clustering. In the final two-step clustering model, the Bayesian information criterion (BIC) was used as the clustering criterion. Where applicable, the Mann–Whitney U test or the chi-squared test was used to compare variables between clusters. Data preprocessing comprised the imputation of missing values, the partitioning of the data into training and test sets, and the upsampling of the minority classes to achieve class balance. Upsampling was performed because class imbalance has been observed to significantly affect the performance of prediction models [20]. OT and LOS classes, as well as complications, were predicted using supervised machine learning and deep learning techniques. In the initial step, 15 models were applied (support vector machine (SVM), k-nearest neighbor (KNN), discriminant analysis, Bayesian network, decision tree, logistic regression, CHAID algorithm, QUEST algorithm, classification and regression tree (C&R), C5.0 node, linear support vector machine (LSVM), Random Trees, Tree-AS, XGBoost Tree, XGBoost), the majority of which were omitted due to incomplete data fit. For the final prediction modeling, the following algorithms were applied: C5.0 node, random forest, CHAID, Tree-AS, KNN, C&R Tree, and SVM. We performed all analyses and data preprocessing steps in SPSS Modeler v18.3 (IBM, Armonk, NY, USA), Python, and SPSS v26 (IBM, Armonk, NY, USA) on a Windows 10 computer with a Ryzen 9 5950X 16-core processor, 64 GB RAM, and an NVIDIA GeForce RTX 3090 GPU.
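
The labeling and class-balancing steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study (which ran largely in SPSS Modeler); the function names `label_prolonged` and `upsample_minority`, the inclusive quantile method, the random seed, and the example values are all assumptions made for illustration.

```python
import random
from statistics import quantiles


def label_prolonged(values):
    """Return (labels, cutoff): 1 = prolonged (above the 75th percentile), 0 = normal."""
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is the 75th percentile.
    cutoff = quantiles(values, n=4, method="inclusive")[2]
    return [1 if v > cutoff else 0 for v in values], cutoff


def upsample_minority(rows, labels, seed=0):
    """Randomly resample smaller classes with replacement until all classes are equally large."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw only as many extra rows as the class is missing (zero for the majority class).
        extra = rng.choices(members, k=target - len(members))
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical lengths of stay in days (illustrative only, not study data).
los_days = [2, 3, 3, 4, 4, 5, 5, 6, 9, 14]
los_labels, los_cutoff = label_prolonged(los_days)
balanced_rows, balanced_labels = upsample_minority(los_days, los_labels)
```

In a sketch like this, upsampling would normally be applied to the training partition only, so that duplicated cases cannot leak into the test set.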

Results

A total of 206 patients were included in the cohort. Of these, 63 underwent full-endoscopic decompression (FED) (male: 36, female: 27) and 143 underwent microsurgical decompression (MSD) (male: 69, female: 74). The mean age of the patients was 59.96 ± 16.49 years (range: 27–92 years). The baseline characteristics of both groups are summarized and compared in Table 1. In general, the MSD and FED groups did not differ with regard to the majority of study variables. For the MSD group, however, the number of levels accessed was slightly higher. Furthermore, surgeons were more experienced (9.84 years vs. 3.56 years) and had performed more surgeries (596.19 vs. 115.75) with the MSD technique than with the FED technique at the time of surgery. As expected, the variables “years of experience with case surgery type” and “number of surgeries with case surgery type at the time of surgery” were significantly correlated (Spearman's rho: 0.731; p < 0.001). There was also a positive correlation between the operation time and the LOS (Spearman's rho: 0.264; p < 0.001). Interestingly, there was also a significant positive correlation between years of experience with the type of surgery and the LOS (Spearman's rho: 0.18; p = 0.009). Further analysis of this relationship for both surgical techniques indicated that years of experience with case surgery and LOS were positively correlated only for the MSD group (Spearman's rho: 0.237; p = 0.004), while for the FED group, there was an inverse but non-significant relationship (Spearman's rho: −0.161; p = 0.207). Furthermore, there was a significant inverse relationship between years of experience with case surgery and the operation time for the FED group (Spearman's rho: −0.249; p = 0.049), whereas for the MSD group, this relationship was positive (Spearman's rho: 0.190; p = 0.023). In both groups, OT and LOS showed significant positive correlations.
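
For readers less familiar with the statistic, Spearman's rho as reported above is the Pearson correlation computed on ranks, with tied values sharing their average rank. The following Python sketch shows that calculation under those standard definitions; the helper names and the example values are hypothetical, and the study's own values were computed in statistical software, not with this code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical operation times (minutes) and lengths of stay (days), not study data.
ot_minutes = [60, 75, 90, 120, 150]
los_days = [3, 2, 4, 6, 5]
rho = spearman_rho(ot_minutes, los_days)
```

A positive rho, as in this toy example, means that longer operations tend to go with longer stays; it says nothing about the size of the effect in days.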

Table 1 Comparison of baseline characteristics between the groups receiving full-endoscopic decompression (FED) and microsurgical decompression (MSD)

After the first explorative and descriptive analyses steps, we performed unsupervised learning using a variety of clustering techniques (two-step clustering, k-means clustering, and Kohonen networks) to identify clusters that represent surgical learning curves (Silhouette value ≥ 0.60) for each of the surgical techniques. For the FED group, K-means clustering using the variables “Years of experience with case surgery type” and “Number of surgeries with case surgery type at time of surgery” provided the highest clustering performance (Silhouette value: 0.698) and was selected as the clustering method. For the MSD group, Two-Step clustering with the variables “Years of experience with case surgery type” and “Number of surgeries with case surgery type at time of surgery” achieved the highest clustering performance (Silhouette value: 0.819) and was selected as the clustering algorithm. There were two clusters representing the early learning curve phase (ELC; n = 137; 66.8%) and the late learning curve phase (LLC; n = 68; 33.2%) in the MSD group. Among the ELC group, the median number of surgeries with case surgery type was 136, while among the LLC group, it was 1610. A comparison of the study variables among the two clusters is shown in Table 2. Surgical procedures performed by the LLC group involved patients who had a higher number of levels, were older and had a higher CRP level preoperatively. However, patients in this group also showed longer OT and longer LOS. There was no significant difference between the groups in terms of BMI. Additionally, neither the sex nor the complication rates differed significantly. Nevertheless, private insurance patients were more likely to undergo surgical procedures in the ELC group.
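
The model selection rule used here (keep the clustering with the highest silhouette value, provided it reaches 0.60) can be sketched in Python as follows. This uses the textbook silhouette coefficient with Euclidean distances; the silhouette measure implemented in SPSS Modeler may be computed differently, and the function name, the toy points, and the two candidate labelings are assumptions for illustration only.

```python
from math import dist


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (requires at least two clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the same cluster (dist to itself is 0)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy two-feature points and two hypothetical candidate clusterings.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
candidates = {"split_by_x": [0, 0, 1, 1], "split_by_y": [0, 1, 0, 1]}
scores = {name: mean_silhouette(points, labs) for name, labs in candidates.items()}
best = max(scores, key=scores.get)
```

With real data, the two experience variables would usually be standardized first, since the number of surgeries spans a much larger range than the years of experience and would otherwise dominate the distances.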

Table 2 Patient and surgical parameters by learning curve phases in microsurgical decompression (MSD)

Furthermore, the FED group consisted of two clusters representing the early learning curve phase (ELC; n = 66; 68.0%) and the late learning curve phase (LLC; n = 31; 32.0%). For the FED group, the median number of surgeries with case surgery type at the time of surgery was 72 in the ELC group and 274 in the LLC group. Table 3 compares study variables between the two clusters. As opposed to the MSD examination, FED patients did not differ significantly between the ELC and LLC groups on most variables. In the LLC group, CRP was significantly higher preoperatively. Additionally, private insurance patients were more likely to undergo surgery in the LLC group. Other variables, such as the complication rate, did not differ significantly between the ELC and LLC.

Table 3 Patient and surgical parameters by learning curve phases in full-endoscopic decompression (FED)

In the next step, we applied various machine learning and deep learning algorithms to predict the OT and LOS classes as well as the occurrence of complications based on the study variables, including the learning curve as a confounding factor. Table 4 presents the performance measures of the best algorithms for each outcome (OT, LOS, and complications). The most important predictors of OT were preoperative CRP (0.201; 95% CI 0.182–0.219), age (0.142; 95% CI 0.129–0.155), and BMI (0.137; 95% CI 0.126–0.149). The learning curve variable ranked fifth (number of surgeries with case surgery type at time of surgery [0.114; 95% CI 0.105–0.124]). The study group (MSD vs. FED) ranked ninth on the list of important predictors (0.051; 95% CI 0.043–0.058). Further, the random forest model indicated that LOS (0.160; 95% CI 0.146–0.175), OT (0.142; 95% CI 0.130–0.155), BMI (0.127; 95% CI 0.116–0.138), and age (0.125; 95% CI 0.115–0.136) were the most significant predictors of complications. Neither the learning curve (6th most important predictor [0.092; 95% CI 0.083–0.100]) nor the surgical technique (FED vs. MSD; 11th most important predictor [0.020; 95% CI 0.020–0.031]) had a significant impact on complications. The most important predictors of LOS were age (0.239; 95% CI 0.220–0.258), preoperative CRP (0.154; 95% CI 0.139–0.168), and BMI (0.126; 95% CI 0.113–0.138). The surgical technique had no relevant impact on the LOS (15th most important predictor [0.013; 95% CI 0.009–0.016]). The learning curve variable ranked fifth in the predictor list (number of surgeries with the type of surgery at the time of surgery [0.102; 95% CI 0.092–0.112]). In general, the results suggest that patient characteristics have a greater impact on LOS, OT, and complications than the surgical technique (FED versus MSD) or the surgeon's learning curve. Figures 1 and 2 illustrate the variability across the folds for the machine learning algorithms.
In addition to the 95% CI shown in Table 4, the standard deviation of the cross-validation folds' performance metrics has been calculated for each algorithm and prediction category. These values indicate the variability of each algorithm's performance across the folds, with lower values suggesting more consistency. For predicting Complications, the C&R Tree algorithm showed the most consistent performance in terms of Accuracy, with a low standard deviation (Std Dev = 2.60), while the KNN algorithm displayed the greatest variability (Std Dev = 5.05). In terms of AUC, the C&R Tree again was the most consistent (Std Dev = 0.028), and the CHAID algorithm had the highest variability (Std Dev = 0.063). In the prediction of length of stay (LOS), the random forest algorithm was found to be the most consistent for accuracy (Std Dev = 2.45), with Tree-AS showing the greatest variability (Std Dev = 4.90). For AUC, the Tree-AS algorithm was the most consistent (Std Dev = 0.029), while the random forest showed the most variability (Std Dev = 0.079). Regarding the prediction of operation time (OT), the C&R Tree algorithm exhibited the most consistent Accuracy (Std Dev = 2.96), while Tree-AS had the highest variability (Std Dev = 5.74). For AUC, the CHAID algorithm was the most consistent (Std Dev = 0.027), with the KNN algorithm having the highest variability (Std Dev = 0.070). These results suggest that for this dataset, the C&R Tree algorithm tends to have more consistent performance across different folds in predicting complications, both in terms of Accuracy and AUC. The random forest algorithm shows similar consistency in predicting the length of hospital stays. However, the variability of the algorithms' performance also indicates that certain algorithms, like Tree-AS and KNN, may be more sensitive to the specific data presented in each fold, which could be due to factors such as the algorithms' complexity, parameter settings, or the nature of the data itself.
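
The consistency comparison described above amounts to computing the standard deviation of each algorithm's per-fold scores and ranking the algorithms by it. A minimal Python sketch follows; the per-fold accuracies are invented for illustration, the function names are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not state which variant was used.

```python
from statistics import mean, stdev


def fold_summary(fold_scores):
    """Map each algorithm to (mean, sample standard deviation) of its per-fold scores."""
    return {name: (mean(s), stdev(s)) for name, s in fold_scores.items()}


def most_and_least_consistent(fold_scores):
    """Return the algorithm names with the lowest and highest spread across folds."""
    summary = fold_summary(fold_scores)
    ranked = sorted(summary, key=lambda name: summary[name][1])
    return ranked[0], ranked[-1]


# Hypothetical fivefold accuracies in percent (illustrative only, not the values in Table 4).
accuracy_by_fold = {
    "algorithm_a": [80, 82, 81, 79, 83],
    "algorithm_b": [70, 90, 75, 85, 80],
}
summary = fold_summary(accuracy_by_fold)
steadiest, most_variable = most_and_least_consistent(accuracy_by_fold)
```

In this toy example both algorithms have similar mean accuracy, but the first varies far less from fold to fold, which is the kind of difference the standard deviations reported above are meant to expose.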

Table 4 Performance metrics of machine learning algorithms for hospital stay duration, complication occurrence, and operation time prediction
Fig. 1

Comparative accuracy of machine learning algorithms for predictive modeling. This figure illustrates the mean accuracy of seven machine learning algorithms across fivefold cross-validation for predicting hospital stay duration (LOS), the occurrence of complications, and operation time (OT). Each algorithm is represented by a series of colored slices, with each color corresponding to one of the five folds. This visualization allows for a quick comparison of stability and performance variations across folds for each algorithm. The algorithms included in the analysis are C5.0 Decision Tree, random forest classifier, chi-squared automatic interaction detector (CHAID), Tree-AS algorithm, support vector machine (SVM), K-nearest neighbors (KNN), and classification and regression tree (C&R Tree). Algorithms with more uniform colors across folds indicate consistent performance, while those with varied colors suggest variability in accuracy across different cross-validation folds

Fig. 2

Area under the curve (AUC) performance of machine learning algorithms for prediction modeling. This figure displays the AUC per fold for the same set of machine learning algorithms and prediction categories as Fig. 1. The AUC metric represents the model's ability to distinguish between the binary classes of the outcomes. Similar to Fig. 1, the colored slices represent the AUC scores for each fold, allowing for a comparison of the model's discriminative performance across the cross-validation process. The algorithms included in the analysis are C5.0 Decision Tree, random forest classifier, Chi-squared automatic interaction detector (CHAID), Tree-AS algorithm, support vector machine (SVM), K-nearest neighbors (KNN), and classification and regression tree (C&R Tree)

Discussion

This study is the first to present evidence related to the use of AI-based algorithms, specifically focusing on the impact of the providers themselves on LOS, OT, and complications. These results are useful for the implementation phases of MISS since the association between these important parameters remains unclear. Data obtained from lumbar decompression surgery patients suggests that OT, complication rates, and length of stay can be reliably predicted. Furthermore, our institution's results indicate that a median of 72 surgical cases performed in the early learning curve led to similar clinical outcomes as those performed by more experienced surgeons. In the MSD group, the OT and LOS were higher due to more complex procedures (higher number of levels, older patients) being performed by more experienced surgeons. According to the AI-based analyses, the longer OT and LOS in the MSD group are likely to be related to the patient characteristics. Despite the fact that we only used a small number of cases from one institution, the algorithms demonstrated satisfactory performance metrics. The findings of our study can thus serve as a basis for the development of more accurate models through the use of larger, multicenter prospective studies.

In assessing surgeon progress through the procedural learning curve, clinical outcomes such as the complication rate, LOS, and OT are considered to be among the most relevant parameters. MIS approaches to the spine are limited by the absence of clear anatomic landmarks. As noted by Jhala and Mistry [21], complications during the endoscopic discectomy learning curve period were associated with a lack of familiarity with endoscopic image orientation and a suboptimal approach to the surgical target. In this initial series of patients, multiple durotomies were performed on the wrong level, and facet joint structures were unintentionally removed. Several authors have argued that an ideal entry point and trajectory during the surgical approach are the key to overcoming the MISS learning curve [22,23,24]. A systematic review found that all complications, adverse events, and conversions to open procedures occurred during the first 30 procedures when these parameters were reported as a function of the chronologic case number [10]. In a previous study, we found that complication rates and operation times decreased significantly after 20 FED surgeries [16]. According to Zelenkov et al.'s [25] learning curve assessment, the plateau of the learning curve would be reached within the first 20 patients of full-endoscopic interlaminar and transforaminal surgery. Lee et al. [26] noted that complication rates were higher and operation times were longer in the first cohort of patients treated with FED. The learning curve plateaued after the 100th case. Furthermore, complication rates were twice as high in the first cohort of patients as in the more experienced phase of the learning curve. According to our institution's analysis, 72 surgeries resulted in similar clinical outcomes for the FED technique, which is in line with literature results ranging between 20 and 100 cases to reach the plateau.

Although several risk factors have been identified for a longer LOS after lumbar decompression surgery, no effective framework has been developed to predict whether a patient will have a long or short OT and/or LOS following surgery or will suffer complications. The use of artificial intelligence techniques to predict prolonged LOS following lumbar spinal stenosis surgery has been reported in only one study [27]. The authors report an AUC of 0.54, which is insufficient, as an AUC of 0.5 implies that the system does not possess the capacity to classify (similar to tossing a coin). The same study also predicted prolonged OT [27], with a reported accuracy of 0.81, which is comparable to the results of our algorithms, which ranged from 0.72 to 0.99. Comparing predictive models for surgical complications is difficult because of the differing definitions used for the term “complication” (e.g., readmission, re-operation, wound infections, etc.) and the pooling of various spine surgery procedures for predictive modeling in some studies [20]. The study by Siccoli et al. [27] focused on lumbar spinal stenosis and reported an accuracy of 0.81 in predicting re-operations, while the majority of available studies address lumbar spine fusion procedures [13]. In the present study, machine learning and deep learning techniques were demonstrated to be effective in predicting complications, prolonged OT, and LOS, which supports the hypothesis that these outcomes are associated with other clinical features. Additionally, these findings can facilitate the assignment of resources and planning of discharge for patients with specific risk profiles. An open-source tool for the provided class predictions can be developed using the variables used in the present approach.
As a result of such a procedure, the models provided would be able to be externally validated. In other orthopedic surgical specialties, risk assessment tools have been used in the past to assess patient risk [28]. The present results could also be incorporated into such tools. A quantitative assessment of the risks associated with surgery would be beneficial to both patients and physicians before surgery. Furthermore, by setting patient expectations properly, the entire patient care team will be able to put together an appropriate plan that will improve patient safety and satisfaction as well as reduce the duration of operations, the length of stays, and the number of complications. Notably, there is ongoing work to integrate various data modalities in predictive modeling for spine surgery. This includes the development of hybrid models and the application of radiomics techniques that incorporate imaging data into the modeling process [13, 29]. The insights from our current study have the potential to enhance these advanced modeling approaches.
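
The reading of the AUC used in this discussion (0.5 corresponds to a coin toss, 1.0 to perfect discrimination) follows from its rank-based definition: the AUC is the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The Python sketch below computes it that way; the function name and the example labels and scores are hypothetical and are not taken from any of the cited studies.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical outcome labels (1 = prolonged stay) and model scores.
labels = [0, 0, 0, 1, 1]
informative_scores = [0.1, 0.2, 0.3, 0.8, 0.9]  # every positive outranks every negative
uninformative_scores = [0.5, 0.5, 0.5, 0.5, 0.5]  # the model cannot separate the classes

auc_informative = roc_auc(labels, informative_scores)
auc_uninformative = roc_auc(labels, uninformative_scores)
```

A model that assigns the same score to every patient lands at exactly 0.5, which is why a reported AUC of 0.54 indicates discrimination barely above chance.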

This study found that AI-based algorithms were capable of predicting and discriminating between classes with satisfactory accuracy and AUC. AI-based techniques have been applied to several orthopedic subspecialties with similarly promising results. Machine learning methods and neural networks have previously been used to predict intraoperative blood loss, prolonged hospital stays, patient-reported outcomes, and discharge disposition in the field with similar or inferior results [13, 30,31,32,33,34]. As this study builds upon past work in this area, it contributes to the growing body of evidence that supports the use of machine learning in orthopedic surgery. It was reported in a previous study that prolonged LOS was associated with a longer operating time and a higher ASA classification [35]. However, we found that operating time contributed significantly more to a longer LOS than the ASA class. Surgical practice styles and preferences indicate a correlation between prolonged LOS and operating time, which is in line with our recent findings that operating time is associated with an increase in LOS [36]. Patient comorbidities have also been shown to contribute to longer surgical times, both in the literature and in our results [37]. Surgical prediction models should also take into account the surgeon's learning curve. Multiple parameters may affect the length of time it takes to perform an operation, which may also affect LOS. The invasiveness of a spinal procedure is generally correlated with its outcome (blood loss, duration of the operation, and risk of complications). In this context, it may be difficult to reduce the importance of features to a few parameters, especially if not all potentially important features are incorporated into the study design, something that is not possible in most retrospective studies.

Considering the results of our study, CRP levels at the time of surgery may be associated with the outcome of the surgery. During acute-phase inflammation, CRP is a well-recognized marker of systemic inflammation [38]. Furthermore, CRP can be used to monitor the effectiveness of treatment as well as to screen for inflammation early in the course of a disease process. CRP has been shown to rise in response to surgical trauma, peaking 48 h following surgery [39, 40]. There are cases in which patients' CRP responses are only incomplete or do not occur at all [40]. In apparently healthy individuals with validated CRP risk categories, cardiovascular events were predicted accurately [41]. There has been evidence that preoperative CRP levels are linked to longer-term morbidity and mortality after cardiac surgery [42,43,44]. The peak CRP response postoperatively has also been associated with the degree of surgical trauma [45]. The results of minimally invasive surgical procedures tend to yield lower levels of CRP than those produced by open procedures, although not all studies agree on this [46,47,48]. Generally, research on the effects of preoperative CRP on spinal surgery patients is limited. In spite of the intuitive notion that increased inflammation prior to surgery might have an adverse impact on clinical outcomes, this phenomenon will need to be confirmed in future studies.

Although the algorithms presented in this study performed well and the learning curve was incorporated into the algorithms in a novel way, the study has some limitations. In the early stages of learning a new MISS technique, the learning surgeon may preferentially select straightforward cases, which may lead to misleading data regarding the clinical outcomes. A properly designed randomized trial would reduce this selection bias. Furthermore, some patients had a high LOS (outliers) due to previous conservative treatments as well as multimorbidity, which increased the overall mean LOS for the respective groups. A larger dataset would have allowed for more specific subgroup analyses excluding patients who had previously undergone conservative treatment. To be useful in clinical practice, algorithms must be built on a representative sample of patients undergoing spinal decompression surgery. Because our data stem from one institution with a limited sample size, they may not apply to other institutions. As a result, it is imperative that the models provided are externally validated. A further limitation is that the data were collected retrospectively, which may lower the evidence grade since retrospective data are not necessarily as reliable as data collected prospectively. Moreover, since the selection of variables was retrospective, we were not able to include additional variables that may have enhanced the models further. Our study, however, attempted to incorporate important variables based on our previous research and literature review [16].

Conclusions

The decompression of the lumbar spine is among the most common spine surgeries. Operation time, hospital length of stay, and complication rates significantly affect the associated healthcare costs. Prediction models that take the learning curve of physicians into account when predicting these parameters have been lacking, although such models are highly warranted for the effective analysis of societal healthcare resources. With such models, institutions can compare therapeutic strategies across different disciplines and establish relative priorities for allocating resources across various interventions. According to our results, surgeons in the early learning curve phase, with a median of 72 FED cases, achieved clinical outcomes comparable to those of experienced surgeons. Patient characteristics appear to be more influential on clinical outcomes than the learning curve or the surgical technique. The results of the study suggest that machine learning and deep learning algorithms may be useful in predicting whether patients will experience prolonged LOS, prolonged OT, or complications following lumbar decompression surgery based on several variables, including the learning curve. Before the prediction tools can be used in clinics, the algorithms provided need to be incorporated into open-source software and externally validated in large-scale randomized controlled trials.