Introduction

The operating room (OR) is the epicenter of hospital care, and efficient management of OR resources, including personnel and equipment, is vital for high-quality surgical care [1]. Recently, the integration of Artificial Intelligence (AI) and Machine Learning (ML) has begun to transform OR management, redefining surgical planning and optimization [2]. The path toward AI and ML in OR management began with a realization: healthcare data held untapped potential, from patient demographics and surgical histories to anesthesia protocols and recovery room dynamics [3]. From 2015 onward, research on ML in medicine grew exponentially, moving from theory to real-world applications [4]. With a better understanding of ML and greater computing power, healthcare is now using this technology to tackle complex challenges [5]. In the era of data-driven healthcare, ML has become a cornerstone for OR tasks such as predicting surgical durations, optimizing schedules, and improving resource use [6]. ML algorithms such as decision trees and random forests have reshaped approaches to OR efficiency, promising more accurate predictions and proactive decision-making [7]. This systematic review updates our prior work, “Artificial Intelligence: A New Tool in Operating Room Management. Role of Machine Learning Models in Operating Room Optimization” [4], and covers the period from February 2019 to September 28, 2023. In the prior review, we explored the pivotal role of ML in reshaping OR management, emphasizing the potential of AI-driven algorithms to streamline scheduling, case duration prediction, and resource allocation. In this update, we examine the latest ML developments in perioperative medicine and how they are redefining OR efficiency and patient care, from Post Anesthesia Care Unit (PACU) resource allocation to the reduction of surgical case cancellations. We also highlight the challenges and opportunities of integration, with the aim of maximizing the potential of AI across healthcare.

Methods

Search Strategy

This comprehensive update was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A systematic search was performed across multiple databases, including PubMed, EMBASE, and Scopus, for the period from February 2019 to September 28, 2023. The search string was adapted from the previous review and combined the following terms in various ways: “machine learning,” “anesthesia,” “perioperative,” “PACU,” “operating room,” “recovery room,” and “robotic assisted surgery.”

Inclusion and Exclusion Criteria

We considered all relevant studies that applied ML techniques to OR, anesthesia, Recovery Room (RR), and PACU management. Studies published before February 2019, abstracts, and studies not written in English were excluded, as were pediatric and veterinary studies.

Screening and Selection

Two independent reviewers screened the records in two stages: title/abstract screening and full-text screening. Discrepancies or uncertainties were resolved through discussion and consensus. After duplicates were removed, an initial screen excluded reviews and conference papers, yielding a refined pool of potential studies. The remaining full-text articles were then assessed, and studies not directly related to ML applications were excluded. The final selection comprised studies published between February 2019 and September 28, 2023, that met the eligibility criteria.

Data Extraction

Data extraction followed a structured approach focused on study characteristics: ML methods, patient populations, trial settings, variables, and outcomes. The extracted data were synthesized narratively around the key themes and findings concerning the role of new technologies in perioperative management from an administrative and managerial standpoint, and the findings were then summarized.

Results

After removal of duplicates, the search returned 90,492 papers published between February 2019 and September 28, 2023; 44,723 of these were available in full text, and only 2,009 were clinical trials or randomized controlled trials. Restricting these to English-language studies of adult humans (≥ 18 years) left 1,071 studies. Screening of these studies left 30 for full-text assessment, of which eight were discarded: two were not strictly related to ML application, and six were theoretical studies. The final analysis therefore included 22 studies [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29]. Figure 1 displays the PRISMA flowchart.
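The screening arithmetic above can be checked with a short Python sketch. The counts are those reported in this section; the stage labels are paraphrased, and the helper `prisma_summary` and its data layout are illustrative only, not part of the review’s methods.

```python
# Stage counts as reported in the Results; labels are paraphrased for illustration.
STAGES = [
    ("Records after deduplication (Feb 2019 to Sep 28, 2023)", 90_492),
    ("Full text available", 44_723),
    ("Clinical trials and randomized controlled trials", 2_009),
    ("English-language studies in adults (>= 18 years)", 1_071),
    ("Full-text articles assessed", 30),
]

# Reasons for exclusion at the full-text stage, as reported.
FULL_TEXT_EXCLUSIONS = {
    "Not strictly related to ML application": 2,
    "Theoretical study": 6,
}


def prisma_summary(stages, exclusions):
    """Return (rows, included): one (label, count, removed_before_next_stage) row per stage."""
    rows = []
    for i, (label, count) in enumerate(stages):
        if i + 1 < len(stages):
            removed = count - stages[i + 1][1]
        else:
            removed = sum(exclusions.values())  # the last stage loses only the full-text exclusions
        rows.append((label, count, removed))
    included = stages[-1][1] - sum(exclusions.values())
    return rows, included


rows, included = prisma_summary(STAGES, FULL_TEXT_EXCLUSIONS)
for label, count, removed in rows:
    print(f"{label}: {count:,} (removed before next stage: {removed:,})")
print(f"Studies included: {included}")  # 30 - (2 + 6) = 22
```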

Fig. 1

Literature search flow diagram based on PRISMA.

Tables 1, 2, and 3 summarize the key characteristics of the included studies, focusing on ML methods, populations, trial settings, variables, and outcomes. Table 1 covers studies predicting surgical case duration, Table 2 covers PACU length-of-stay prediction, and Table 3 covers surgical case cancellations.

Table 1 Main studies about prediction of surgical time
Table 2 Main studies about PACU length of stay
Table 3 Main studies about risk of surgery cancellation

Among the 22 studies analyzed [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29], sixteen focused primarily on predicting the duration of surgical cases [8,9,10,11,12,13, 15,16,17,18,19,20,21,22,23,24] and three on predicting the length of stay in the PACU [25,26,27]. One study addressed both outcomes [14], and only two examined the identification of surgical cases at high risk of cancellation [28, 29]. Notably, only one of the studies is a randomized clinical trial [23], suggesting a need for more robust experimental designs in this research domain. The machine learning algorithms used most frequently in the selected studies were Random Forest, XGBoost, Linear Regression, Support Vector Regression (SVR), Neural Networks, Bagging, Ensemble Methods, Perceptron, CatBoost, and Logistic Regression. All of the studies [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29] reported that these algorithms could improve predictive accuracy for surgical duration, PACU length of stay, or the risk of surgical case cancellation. Where it was used, XGBoost showed the best overall performance. Ensemble methods such as Bagging and Random Forest improved prediction accuracy by combining the outputs of multiple models [14]. ML models also supported scheduling and resource allocation: for instance, Hassanzadeh et al. [11] predicted daily operating theatre arrivals with 90% accuracy, allowing staffing and resource allocation to be optimized. Several studies, including those by Bartek et al. [8] and Lam et al. [13], emphasized the importance of tailoring ML models to individual surgeons or incorporating additional patient- and surgery-related factors.
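Bagging, one of the ensemble strategies mentioned above, can be sketched in plain Python: fit a simple model on each bootstrap resample of the training data and average the predictions. This is a minimal illustration under stated assumptions, not a method from any included study. The data are invented (actual case duration in minutes against a hypothetical “number of planned procedures” feature), the helper names `fit_stump` and `bagged_stumps` are hypothetical, and one-split regression stumps stand in for the full decision trees a Random Forest would grow. Random Forest additionally draws a random subset of features at each split, which a one-feature example cannot show.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump: predict the mean of y on each side of a threshold on x."""
    best = None
    for t in sorted(set(xs))[:-1]:  # splitting at the largest x would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lo_mean, hi_mean = mean(left), mean(right)
        sse = sum((y - lo_mean) ** 2 for y in left) + sum((y - hi_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lo_mean, hi_mean)
    if best is None:  # every x is identical, so fall back to a constant prediction
        overall = mean(ys)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagged_stumps(xs, ys, n_models=25, seed=0):
    """Bagging: fit one stump per bootstrap resample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample drawn with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(model(x) for model in models)


# Invented data: number of planned procedures in the case vs. actual duration (minutes).
XS = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
YS = [95, 100, 105, 135, 140, 150, 175, 185, 190, 215, 220, 235]

predict = bagged_stumps(XS, YS)
for k in (1, 2, 3, 4):
    print(f"{k} planned procedure(s): predicted {predict(k):.0f} min")
```

Averaging over bootstrap resamples smooths out the single split point that any one stump commits to, which is the variance-reduction effect that motivates bagging.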

The number of scientific publications on ML in perioperative medicine increased from 2015 to 2019 [4] and then declined (Fig. 2).

Fig. 2

Publications per year since 2019. Note: the timeline counts all publication dates supplied by the publisher for a citation, and these dates may span more than one year; the sum of results in the timeline may therefore differ from the search results count.

Several factors may explain this pattern. Initially, interest and investment surged in ML applications aimed at optimizing OR management, reducing costs, and improving the quality of patient care. The decline from 2020 onward may reflect the fact that the most promising research had already been published, practical challenges in implementation, or the need for deeper understanding and greater resources. The characteristics of this learning curve are shown in Fig. 3.

Fig. 3

Learning curve of artificial intelligence and publications

This trend reflects the evolving nature of ML in perioperative medicine and calls for a detailed analysis of the research landscape, funding, technology, and changing priorities. Although studies have demonstrated the effectiveness of AI/ML systems in OR applications, physicians’ hesitancy to incorporate these systems into decision-making remains a significant barrier, and it has several causes. The complexity of AI/ML technologies, particularly in healthcare settings, can slow their adoption in clinical practice, because clinicians may find it difficult to understand the algorithms and processes that underpin these systems. The novel and evolving nature of AI technologies may also create a perceived risk, making clinicians hesitant to fully embrace and trust these tools. In addition, clinicians may not be sufficiently familiar with the concepts and operation of ML/AI systems; without education and training on how these technologies work, skepticism can follow, so filling this knowledge gap is essential to build trust and confidence. Furthermore, in the OR, where patient safety is paramount, clinicians may be particularly reluctant to adopt technologies that could affect patient outcomes, and concerns about the reliability and safety of AI systems may encourage a conservative approach to adoption. While studies have demonstrated the efficacy of AI/ML systems in controlled environments, clinicians may question their applicability and generalizability in the real world; limited clinical validation and insufficient evidence of improved patient outcomes across diverse scenarios can hinder acceptance. Moreover, clinicians face ethical and legal considerations when integrating AI into patient care, and issues of data privacy, liability, and the ethical implications of automated decision-making can all contribute to hesitancy in adopting ML/AI systems in ORs.
Finally, effective communication and collaboration between data scientists, engineers, and clinicians are crucial: misaligned goals, expectations, and terminology across these interdisciplinary teams can lead to misunderstandings and hinder the successful deployment of AI in clinical settings. Addressing these factors requires not only improving the explainability and transparency of AI models but also implementing robust education and training programs for clinicians [30]. Building a collaborative environment that involves clinicians in the development process, ensuring rigorous clinical validation, and addressing ethical and legal concerns are essential steps toward fostering trust and acceptance, and overcoming these challenges could accelerate the integration of AI/ML systems into OR decision-making. Figure 4 compares the number of publications in each area between the previous version of the review and this update.

Fig. 4

Number of publications per area

Discussion

Of the 22 selected papers [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29], 17 focus on predicting surgical case duration [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]. This finding underscores the crucial role of accurate case-duration estimation in effective operating room management, a complex and multifaceted challenge that profoundly affects OR scheduling, resource allocation, and overall operational efficiency. Our previous review [4] primarily highlighted the promising results of a proprietary algorithm, leap Rail® [31]. Although it improved predictive accuracy compared with traditional methods, the present update reveals a more nuanced picture. More recent studies, such as the work by Bartek and colleagues [8], have explored machine learning models in greater depth and emphasized the importance of surgeon-specific models, which outperformed service-specific models and substantially improved the accuracy of case-time predictions, with clear benefits for operating room management. Our updated analysis also indicates the dominance of XGBoost over other algorithms, including random forest and linear regression. XGBoost’s superior predictive performance marks a notable departure from the earlier review’s focus on leap Rail® [31] and underlines both the rapid advances in machine learning technology and its potential to refine surgical case duration predictions. However, it is important to keep in mind that different outcomes may require different ML algorithms [32]. Another key finding of the previous review was the potential cost savings associated with accurate surgical case duration predictions in robotic surgery; our updated review adds new insights on this point.
Jiao and colleagues [11] introduced modular artificial neural networks (MANN) for predicting the remaining duration of surgery. Modular neural networks divide a task among several specialized sub-networks whose outputs are then combined, an architecture that can accommodate heterogeneous clinical inputs. The authors trained their model on anesthesia records from a diverse range of surgical populations and hospital types, demonstrating its robustness and adaptability. MANN consistently outperformed Bayesian statistical approaches, particularly during the last quartile of surgery, indicating potential for cost savings and improved operational efficiency. The study also assessed the generalizability and transferability of the model and found that even healthcare systems with lower operative volumes could benefit from fine-tuning a model trained at larger nearby systems. It also highlighted the lack of meaningful information in the anesthesia record during certain phases of surgery, suggesting room for improvement. This study underscores the rapid advances in machine learning algorithms and their application to real-world surgical scenarios. Variational autoencoders (VAEs), generative models designed to learn latent representations of data, also fit in this context. A VAE consists of an encoder, which maps input data to a probability distribution in a latent space, and a decoder, which reconstructs data from samples drawn from that space. In clinical terms, advanced models such as MANNs and VAEs could contribute to personalized medicine by learning patient-specific representations that enable tailored treatment plans, and could also address clinical needs by enhancing diagnostics, improving patient outcomes, or streamlining healthcare processes [33]. The single-center randomized clinical trial conducted by Strömblad et al. [23] brought additional insights.
They compared the accuracy of surgical case-duration predictions from a machine learning model with that of the existing scheduling-flow system. The study highlighted the benefits of a comprehensive, data-driven prediction approach, which significantly reduced the mean absolute error (MAE) and thereby improved prediction accuracy. Importantly, this reduction in MAE translated into shorter patient wait times without adversely affecting surgeon wait times or operational efficiency, indicating a balance between efficiency and patient outcomes. To our knowledge, this is the first and only randomized clinical trial on the subject, which makes it a significant milestone.
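To make the variational autoencoder description earlier in this discussion concrete, the sketch below shows two standard VAE building blocks in plain Python: drawing a latent sample from the encoder’s distribution with the reparameterization trick, and the Kullback-Leibler (KL) term that keeps that distribution close to a standard normal prior. It is a minimal sketch, not a working VAE: the encoder and decoder networks are omitted, the encoder outputs `mu` and `log_var` are invented numbers, and the function names are hypothetical. A full VAE loss would add a reconstruction term computed by the decoder from `z`.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) in each latent dimension.

    Expressing the sample this way keeps z differentiable with respect to mu and
    log_var, which is what allows a VAE encoder to be trained by gradient descent.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I): the VAE regularization term."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


# Hypothetical encoder output for one record: mean and log-variance of a 2-D latent code.
mu, log_var = [0.5, -1.0], [-0.2, 0.3]
z = reparameterize(mu, log_var, random.Random(0))  # the decoder would reconstruct the input from z
print("latent sample:", [round(v, 3) for v in z])
print("KL term:", round(kl_to_standard_normal(mu, log_var), 4))
```

The KL term is zero only when the encoder outputs exactly the prior (mean 0, variance 1); any departure is penalized, which is what keeps the latent space smooth enough to sample from.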

Both the previous review [4] and this update underscore the potential benefits of improved prediction accuracy for surgical scheduling and operating room management. However, the newer studies provide more specific insights into practical implications. The work of Bartek and colleagues [8] shows reductions in wait times and resource utilization through the implementation of machine learning-driven models, with a significant impact on patient outcomes and without disrupting operational efficiency, reinforcing the value of these predictive models in real-world healthcare settings. Comparing the updated review of predictive models for PACU length of stay with the previous version [4] likewise reveals a substantial evolution in this field. The earlier review had already acknowledged the importance of improving hospital organization and internal logistics to reduce the costs of wasted time and space in healthcare [4]. It highlighted congestion in the PACU caused by inadequate surgical planning, which often led to patients being held in the OR when no PACU bed was available, at higher cost. In the current update, we have expanded the analysis to include more recent studies focusing specifically on predicting PACU length of stay, and their findings are striking. Schulz and colleagues [25] used a dataset of 100,511 cases to develop predictive models for PACU length of stay, building a neural network model from variables such as patient age, surgical urgency, and duration of surgery. Notably, the study evaluated individual anesthesiologists, categorizing them by their mean PACU length of stay, and the predictive model, which relied on routinely collected administrative data, significantly explained the variation in individual anesthesiologists’ mean PACU length of stay.
This study underscored the practicality of deploying predictive models within existing hospital infrastructure. Another notable study, by Tully and colleagues [27], aimed to develop a model to classify patients at high risk of a prolonged PACU stay (≥ 3 h), using factors such as the surgical procedure, patient age, and scheduled case duration. XGBoost was the most effective model and significantly improved the prediction of prolonged PACU stays. Furthermore, when cases were re-sequenced according to the XGBoost-predicted likelihood of a prolonged PACU stay, the number of patients remaining in the PACU after hours fell substantially. Taken together, these recent studies mark a clear shift in PACU length-of-stay prediction and highlight the potential of predictive models, machine learning, and data-driven approaches to enhance healthcare quality and operational efficiency. Big data analytics and optimized case sequencing have clear implications for patient outcomes and resource allocation, and these models hold significant promise for healthcare institutions, potentially offering considerable cost savings and better patient care. Compared with the previous version of the review, the predictive models have become markedly more sophisticated. The earlier version primarily emphasized inefficient PACU use and its financial implications, highlighting the potential cost savings of improved surgical planning; the new studies demonstrate not only this cost-saving potential but also the capacity of data-driven predictive models to significantly enhance the efficiency and effectiveness of healthcare operations.
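The re-sequencing idea attributed above to Tully and colleagues [27] can be illustrated with a toy simulation. This is not their method: the rule (run the cases with the highest predicted probability of a prolonged stay first), the single operating room with no turnover time, the unlimited PACU capacity, the closing time, and all case data are assumptions made for illustration, and the names `Case`, `after_hours_pacu`, and `resequence` are hypothetical.

```python
from typing import NamedTuple


class Case(NamedTuple):
    name: str
    surgery_min: int     # expected surgical duration, minutes
    p_prolonged: float   # hypothetical model output: probability of a PACU stay >= 3 h
    pacu_min: int        # PACU length of stay assumed for this illustration, minutes


def after_hours_pacu(sequence, pacu_close_min):
    """Names of patients still in the PACU at closing when cases run back to back in one OR.

    Time is counted in minutes from the start of the first case; turnover is ignored
    and PACU capacity is treated as unlimited.
    """
    clock, late = 0, []
    for case in sequence:
        clock += case.surgery_min  # surgery ends; the patient moves to the PACU
        if clock + case.pacu_min > pacu_close_min:
            late.append(case.name)
    return late


def resequence(cases):
    """Hypothetical rule: run the cases most likely to need a prolonged PACU stay first."""
    return sorted(cases, key=lambda c: c.p_prolonged, reverse=True)


CASES = [
    Case("A", 120, 0.10, 60),
    Case("B", 90, 0.80, 200),
    Case("C", 150, 0.05, 25),
    Case("D", 120, 0.70, 190),
    Case("E", 90, 0.20, 50),
]
PACU_CLOSE = 600  # assumed: the PACU closes 10 h after the first case starts

print("Booked order:", after_hours_pacu(CASES, PACU_CLOSE))              # ['D', 'E']
print("Re-sequenced:", after_hours_pacu(resequence(CASES), PACU_CLOSE))  # []
```

Moving B and D to the start of the day lets their long recoveries finish before closing, while the short-recovery case C absorbs the last slot of the day.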

The unexpected cancellation of surgical cases is one of the significant challenges in healthcare. Cancellations not only disrupt the workflow of healthcare facilities but also pose risks to patient safety and satisfaction [34]. ML techniques have emerged as a promising means of detecting potential cancellations early and thus optimizing surgical scheduling. Here too, comparing the updated review with the previous version [4] reveals substantial advances in this critical aspect of healthcare management. The earlier review [4] focused on the high costs associated with surgical case cancellations, particularly how these costs varied across types of surgery, and underscored the need for automatic classification methods to detect high-risk cancellations in large datasets. It also discussed the potential of ML algorithms, specifically random forest, to identify surgeries at high risk of cancellation, with the promise of optimizing healthcare resource utilization and cost-efficiency. The current review continues to emphasize the importance of addressing surgical case cancellations. Luo et al. [28], for example, made a significant contribution by using ML to identify high-risk cancellations in a dataset of more than 5,000 elective urologic surgeries, aiming to identify surgeries prone to cancellation due to institutional resource- and capacity-related factors. The authors employed three ML algorithms (random forest, support vector machine, and XGBoost) and evaluated their performance across various metrics. Their findings showed that ML models were well suited to identifying surgeries at low risk of cancellation, effectively narrowing down the pool of surgeries at higher risk.
Moreover, the random forest models showed some ability to distinguish high-risk surgeries, with an area under the curve (AUC) exceeding 0.6, a modest but encouraging result in this context. Different sampling methods allowed model performance to be adjusted, highlighting the trade-offs between sensitivity and specificity. The study concluded that ML models are feasible for identifying surgeries at risk of cancellation. In a subsequent study from the same center, Zhang and colleagues [29] focused on methods for recognizing surgeries at high risk of cancellation. Using the same dataset, they explored a wider range of machine learning models, including random forest, logistic regression, XGBoost, support vector machine, and neural networks. Random forest was the top-performing algorithm, with an accuracy of 0.8578 and an AUC of 0.7199. Despite high specificity and negative predictive value, the authors acknowledged the need to improve sensitivity and positive predictive value in identifying high-risk cases. In summary, both studies [28, 29] address the challenge of surgical case cancellations with machine learning techniques, highlight the importance of selecting the right algorithm for this task, and acknowledge the need to improve sensitivity and positive predictive value. Both studies [28, 29] also acknowledge limitations arising from their focus on elective urologic surgeries within a single hospital and suggest extending future research to diverse healthcare settings to improve generalizability. Comparing the two reviews, the earlier version [4] emphasized the need for ML algorithms to address surgical case cancellations but, owing to a lack of studies on the topic, could not discuss specific research findings.
In contrast, in the current version we provide in-depth insights into the suitability of different ML models for identifying high-risk surgeries. Both reviews share a common theme: the critical role of ML techniques in addressing surgical case cancellations to enhance healthcare resource utilization and cost-efficiency.
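The trade-off between sensitivity and specificity, and the pattern Zhang and colleagues [29] describe (high specificity and negative predictive value alongside low sensitivity and positive predictive value), follow directly from the confusion-matrix definitions. The sketch below computes these metrics at two decision thresholds on invented scores that stand in for a model’s predicted probability of cancellation; 3 of 20 cases are cancelled to mimic class imbalance. Moving the threshold, used here for simplicity, is a different lever from the resampling methods in [28], but it exposes the same trade-off. The function names are hypothetical.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, tn, fn) when cases scoring >= threshold are flagged as likely cancellations."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label:
            tp += 1
        elif flagged:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; undefined ratios are reported as NaN."""
    def ratio(a, b):
        return a / b if b else float("nan")
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Invented example: 3 of 20 scheduled surgeries were cancelled (label 1).
Y_TRUE = [1, 1, 1] + [0] * 17
SCORES = [0.70, 0.35, 0.15,  # cancelled cases
          0.05, 0.08, 0.10, 0.12, 0.14, 0.18, 0.22, 0.25, 0.30,
          0.10, 0.06, 0.09, 0.11, 0.40, 0.55, 0.07, 0.13]

for threshold in (0.5, 0.2):
    m = classification_metrics(*confusion_counts(Y_TRUE, SCORES, threshold))
    print(threshold, {k: round(v, 2) for k, v in m.items()})
```

At the 0.5 threshold the example reaches 0.85 accuracy and high specificity while flagging only one of the three cancellations; lowering the threshold to 0.2 catches two of the three at the cost of more false alarms and a lower PPV.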

In summary, comparing the two editions of this systematic review on the integration of artificial intelligence into operating room management reveals a remarkable evolution in each domain. For surgical case duration estimation, the newer review shows a shift toward machine learning-based models, notably XGBoost, and a heightened focus on surgeon-specific models, reflecting the realization of machine learning’s potential to increase predictive precision, reduce costs, and enhance operating room management. Similarly, for PACU length-of-stay prediction, the updated review underscores the transformative potential of predictive models and the value of big data analytics, optimized case sequencing, and risk-adjusted metrics for improving patient outcomes and resource allocation, while acknowledging the challenges of real-world implementation and the need for further validation through prospective studies and collaborative efforts. Overall, the updated review provides deeper insights into the practical applications of these techniques, offering healthcare providers and managers valuable tools to enhance efficiency, reduce costs, and improve patient care. The shift toward center-specific models in healthcare, particularly for organizational purposes, merits in-depth exploration. It reflects a growing recognition that customization based on center-specific variables, such as the individual surgeon or anesthetist, can lead to more accurate predictions and better resource allocation. The balance between clinical and organizational applications of these models remains a key consideration: clinical models focus on patient-specific factors, whereas organizational models, including center-specific ones, primarily address resource optimization, scheduling efficiency, and cost reduction.
The choice between center-specific and clinical models ultimately depends on the goals and priorities of each healthcare institution. Regarding clinical implementation, it is crucial to establish how many of these advanced models will progress from research to practical application. Real-world usability is gaining attention, but not all studies provide tools or software for direct application. A critical aspect is the integration of these models into daily work routines, which often requires interdisciplinary collaboration among data scientists, healthcare professionals, and administrators. These tools can serve a range of stakeholders, including surgeons, anesthetists, scheduling teams, and hospital administrators, and different model outputs serve different purposes: clinical models can guide treatment decisions, whereas organizational models can improve resource allocation and scheduling efficiency. The extent to which these models are designed for easy integration into daily healthcare operations is therefore a key area for investigation, as it ultimately determines their practical utility and their impact on patient care and healthcare management.
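The point made above, that customization on variables such as the individual surgeon can yield more accurate predictions (as Bartek and colleagues [8] reported for surgeon-specific models), can be illustrated with a deliberately simple baseline. The sketch compares a service-level historical mean with a surgeon-specific mean, falling back to the service mean for a surgeon without history, and scores both with mean absolute error (MAE), the metric used by Strömblad et al. [23]. All cases, surgeon codes, and durations are invented, and a group mean is not an ML model such as XGBoost; it only isolates the effect of conditioning on the surgeon.

```python
from collections import defaultdict
from statistics import mean

# Invented historical cases: (service, surgeon, actual duration in minutes).
TRAIN = [
    ("ortho", "S1", 90), ("ortho", "S1", 100), ("ortho", "S1", 110),
    ("ortho", "S2", 150), ("ortho", "S2", 160), ("ortho", "S2", 170),
]
# Invented new cases to predict; surgeon S3 has no history.
TEST = [("ortho", "S1", 105), ("ortho", "S2", 155), ("ortho", "S3", 140)]


def group_means(cases, key):
    """Mean historical duration for each group defined by key(case)."""
    groups = defaultdict(list)
    for case in cases:
        groups[key(case)].append(case[2])
    return {k: mean(v) for k, v in groups.items()}


def mae(actual, predicted):
    """Mean absolute error in minutes: the average of |actual - predicted|."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


service_mean = group_means(TRAIN, key=lambda c: c[0])
surgeon_mean = group_means(TRAIN, key=lambda c: (c[0], c[1]))

actual = [c[2] for c in TEST]
service_pred = [service_mean[c[0]] for c in TEST]
# Use the surgeon's own history when available, otherwise fall back to the service mean.
surgeon_pred = [surgeon_mean.get((c[0], c[1]), service_mean[c[0]]) for c in TEST]

print(f"Service-level MAE:    {mae(actual, service_pred):.1f} min")  # 20.0 min
print(f"Surgeon-specific MAE: {mae(actual, surgeon_pred):.1f} min")  # 6.7 min
```

The fallback rule matters in practice: a surgeon-specific model has nothing to learn from for a newly credentialed surgeon, so some pooled estimate is still needed.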

Limitations

The limitations of this systematic review include potential language and publication bias, as only articles published in English were included. In addition, the availability of relevant literature may vary across databases, potentially limiting the comprehensiveness of the review, although we sought to mitigate these limitations with a rigorous search strategy and a thorough screening process. Drawing definitive conclusions about the optimal algorithm for predictive models of perioperative complications also remains challenging, given the diversity of settings and the variation in the algorithms reviewed. The lack of standardization across studies prevented us from conducting a meta-analysis with univariate or multivariate random-effects models. Furthermore, most studies lack external validation of their models. While AUC is a practical evaluation criterion, its limitations must be acknowledged, particularly for the imbalanced datasets common in AI applications. Data quality is essential for the successful application of AI in research, clinical practice, and health system organization alike; however, obtaining datasets of both high quality and sufficient quantity requires rigorous scrutiny at every stage of the process, from data collection to the selection of ML models and their algorithms.
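The caveat about AUC on imbalanced datasets can be made concrete with a small example. The sketch computes AUC through its rank-based (Mann-Whitney) interpretation, the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, which equals the area under the ROC curve, and contrasts it with the positive predictive value (PPV) at a fixed threshold. The scores are invented (2 positive cases among 100), and the function names are hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive case outscores a random negative one (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def ppv(pos_scores, neg_scores, threshold):
    """Positive predictive value: share of flagged cases (score >= threshold) that are truly positive."""
    tp = sum(s >= threshold for s in pos_scores)
    fp = sum(s >= threshold for s in neg_scores)
    return tp / (tp + fp) if tp + fp else float("nan")


# Invented, heavily imbalanced example: 2 positive cases among 100.
POS = [0.90, 0.80]
NEG = [0.85] * 10 + [0.10] * 88

print(f"AUC = {roc_auc(POS, NEG):.3f}")          # 186 of 196 pairs correctly ranked
print(f"PPV at 0.5 = {ppv(POS, NEG, 0.5):.3f}")  # 2 true positives among 12 flagged cases
```

Here the model ranks cases well (AUC close to 0.95), yet five of every six flagged cases are false alarms, which is one reason precision-oriented metrics are often reported alongside AUC when the outcome is rare.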

Conclusion

In conclusion, this systematic review provides a comprehensive overview of recent advances in the application of artificial intelligence, particularly machine learning, to operating room management. The analysis of 22 studies published between February 2019 and September 28, 2023, sheds light on the evolving landscape of AI-driven solutions in perioperative medicine. The review highlights the pivotal role of machine learning in predicting surgical case durations, optimizing PACU resource allocation, and detecting surgical case cancellations. These AI-driven models have shown the potential to significantly enhance the efficiency, cost-effectiveness, and safety of surgical procedures, and machine learning techniques are increasingly being integrated into healthcare management to address complex challenges. The adoption of machine learning in perioperative medicine is not, however, without challenges: data access, privacy concerns, and the need for extensive validation studies all hinder the widespread implementation of AI solutions. The review also suggests that, as the field matures, researchers and practitioners must develop a deeper understanding of AI applications, which may slow the pace of new publications as they tackle more complex questions. Overall, this systematic review underlines the transformative potential of artificial intelligence, particularly machine learning, in reshaping operating room management. It calls for continued research, collaboration, and innovation to overcome existing challenges and unlock the full benefits of AI for healthcare administrators, practitioners, and, most importantly, patients. Going forward, the integration of AI into operating room management promises to further enhance healthcare delivery and improve patient outcomes.