Introduction

Numerous barriers prevent efficient and effective robotic surgical training, including reliance on observational learning, low-quality feedback, and inconsistent assessment [1,2,3,4]. As surgeons apply the robotic platform to more and increasingly complex procedures, these problems stand to intensify [5]. Accepted approaches to addressing these issues have been unevenly adopted by different training programs; indeed, while the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) recommends structured curricula for robotic surgical education, training varies substantially by specialty, program, and stage [6,7,8,9]. Such variation reflects heterogeneous uptake of robotic surgery as a whole and insufficient faculty development highlighting the specifics of robotic surgical instruction [5, 10, 11].

At a basic level, programs’ use of simulation differs considerably, with some requiring simulation experience prior to operative involvement and others struggling with adequate access to simulation [12, 13]. Among those incorporating simulation in training, simulation types range from low-cost home simulation to virtual reality to complex tissue- and cadaver-based models [14,15,16]. Feedback during simulation also varies, with some programs emphasizing real-time, in-person expert feedback and others relying on self-monitored progression [2, 17, 18]. In the operating room (OR), learners’ ability to participate in and observe robotic surgery diverges dramatically by setting [11, 19, 20]. Feedback and assessment in the OR also differ across sites; some instructors provide adept and actionable advice, while others struggle to offer useful guidance or reliable assessment [4, 21, 22]. Finally, after an operation, some surgeons and learners have the opportunity to review intra-operative videos to learn from their performance, while others experience barriers to video review [23].

To some extent, this heterogeneity in robotic surgical education reflects differing uptake and enthusiasm for robotic surgery. However, divergent physical, administrative, and instructional resources also likely contribute to the diverse educational experiences of those learning robotic surgery. Artificial intelligence (AI) offers potential solutions to these central problems in robotic surgical education. Properly applied, AI may provide for a more accessible and standardized experience for tomorrow’s robotic surgeons.

Rationale for the use of artificial intelligence

AI encompasses a range of subfields, including machine learning, artificial neural networks, natural language processing, and computer vision [24]. Each of these subfields has the potential to impact and improve robotic surgical education in different ways. One significant way in which AI can improve robotic surgical education is by increasing training efficiency. While many surgeons and trainees have the ability to record intra-operative videos, manually reviewing them in a meaningful way takes a substantial amount of time. Automated video labeling allows learners and their instructors to rapidly identify critical parts of an operative video, including key operative steps and near misses, and home in on important areas for future improvement [25]. Through natural language processing, AI can also provide basic automated feedback, reducing the assessment burden on expert faculty [26,27,28]. A second key way by which AI can advance robotic surgical education is by improving the efficacy of simulation, feedback, and assessment. Many learners perform standard simulation exercises that are general to all robotic surgical learners. An AI-informed simulation curriculum, if created and optimized, could harness past performance to customize simulation based on a learner’s needs [29]. Furthermore, AI can allow for more efficacious feedback and assessment by harnessing new inputs and automated performance metrics (APMs) that reflect operative events more accurately than a human rater can [30]. For example, AI-detected instrument movement could offer a more accurate interpretation of a surgeon’s economy of motion than an instructor’s perceptions [31]. AI can then incorporate these and other inputs to provide descriptive action plans to learners. In addition, AI-based assessment can offer a standardized way by which to increase assessment reliability, eliminate the unpredictability of human raters, identify struggling learners, and promote evidence-based graduated autonomy [3]. Altogether, AI offers surgical educators opportunities to greatly improve the efficiency and efficacy of robotic training.

Available evidence

Work to incorporate AI into robotic surgical education has made significant strides in recent years. We will focus here on the applications of AI to video labeling, feedback, and assessment. Of note, there are numerous additional areas of promise for AI in robotic surgery beyond the scope of this paper, including in intra-operative decision support and autonomous task performance [24, 32].

Video labeling

Video labeling refers to the automated assignment of prespecified categories to operative videos [25]. Numerous prior studies have reported on methods to apply AI for video labeling [33]. As such, SAGES has developed consensus guidelines around video labeling to promote a shared vocabulary among educators, researchers, and engineers [34]. Labels can be applied to various types of surgical activity. At the most basic level, videos can be labeled by their short and discrete gestures, alternatively called surgemes in some prior work [35]. Gestures may include things like moving to a target or grabbing a suture [35, 36]. Gesture labeling may be too granular for training and is not included in the SAGES guidelines around annotation. However, others have reported on gesture review for assigning meaningful stylistic descriptors from fluid and smooth to viscous and rough [37]. A set of gestures performed together can be labeled as an action, alternatively called a maneuver. Actions are composite gestures like knot tying and suture throwing [38]. Action labeling can also be useful as certain actions may indicate complications or points of interest in a video [38]. Multiple actions come together to make up a task, such as closing a wound or dissecting the hepatocystic triangle [34, 36]. Task labeling can be particularly helpful in identifying and reviewing specific components of a procedure. Finally, all procedure tasks together comprise the phase, which includes access, execution of the surgical objectives, and closure [34].
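To make the nesting of labels concrete, the following is a minimal sketch of how gestures, actions, tasks, and phases could be stored as one tree of time-stamped segments. The class name `Segment`, its fields, and the example timestamps are hypothetical and chosen only for illustration; they are not drawn from the SAGES framework or from any cited labeling system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One labeled span of video, in seconds from the start of the recording."""
    level: str      # "phase", "task", "action", or "gesture"
    label: str
    start_s: float
    end_s: float
    children: List["Segment"] = field(default_factory=list)

    def find(self, level: str, label: str) -> List["Segment"]:
        """Return every segment in this subtree matching the level and label."""
        hits = [self] if (self.level == level and self.label == label) else []
        for child in self.children:
            hits.extend(child.find(level, label))
        return hits


# Hypothetical annotation of a short closure sequence (all values invented).
closure = Segment("phase", "closure", 0.0, 95.0, [
    Segment("task", "closing a wound", 5.0, 90.0, [
        Segment("action", "suture throwing", 5.0, 40.0, [
            Segment("gesture", "grabbing a suture", 5.0, 9.0),
            Segment("gesture", "moving to a target", 9.0, 14.0),
        ]),
        Segment("action", "knot tying", 40.0, 90.0),
    ]),
])

# A reviewer could jump straight to every knot-tying span in the recording.
knots = closure.find("action", "knot tying")
print([(s.start_s, s.end_s) for s in knots])  # prints [(40.0, 90.0)]
```

A tree of this kind would let a learner or instructor search at whichever level of granularity suits the review, which is the practical benefit the labeling hierarchy is meant to provide.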

In addition to surgical activities, video labeling can also be applied to specific events, anatomy, and instruments. For example, in cholecystectomies, AI video labeling has been used to identify the critical view of safety [39]. AI has also been applied to label other anatomic structures. One study assessed the ability of a deep-learning model to accurately identify the uterus, ovaries, and surgical tools [40]. Another study attempted to overcome limitations related to visual noise and obscured structures to apply labels in partial nephrectomies [41]. Finally, AI-assisted video blurring is an automated method by which to protect patient privacy. AI allows for non-operative moments, such as out-of-body camera cleaning, to be blurred; this can protect identities without compromising the educational quality of videos [42].

Feedback

Feedback is a second major area in which AI has been studied in robotic surgical education. Because surgeons’ formative feedback is inconsistent and may not be actionable for learners, AI offers a mechanism by which reliable, high-quality feedback could be provided in robotic surgery [43]. The first way by which AI could improve feedback is by making meaning of APMs. Though APMs do not require AI for collection, they sometimes represent vast amounts of data that are barely interpretable in raw form. Thus, AI can facilitate the interpretation of APMs and provide actionable, specific feedback tailored to individual surgeons. Such feedback is already incorporated into virtual reality robotic simulation [35]. APM-based feedback has also been evaluated in real-world and tissue model settings. Hung and colleagues used machine learning to train a model based on APMs from 78 robotic-assisted radical prostatectomies [30]. Similarly, Lazar et al. created a model using APMs from 42 simulated lung lobectomies [44]. In these studies, APMs have been shown to correlate with actual outcomes of interest, including postoperative complications. APM-based models have highlighted substantial variance in surgeon and trainee performance in certain metrics, such as idle time and wrist articulation, which could be fed back to surgeons and trainees to help them reflect on their motions during an operation or simulation.
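As a rough illustration of what an APM is, the sketch below derives two simple metrics, instrument path length (one proxy for economy of motion) and idle time, from time-stamped instrument-tip positions. The sample format, the function names, and the 1 mm/s idle threshold are assumptions made for this example; the cited studies may define and compute these metrics differently.

```python
import math
from typing import List, Tuple

# Assumed sample format: (time_s, x_mm, y_mm, z_mm) for one instrument tip.
Sample = Tuple[float, float, float, float]


def path_length_mm(samples: List[Sample]) -> float:
    """Total distance travelled by the instrument tip across all samples."""
    return sum(math.dist(a[1:], b[1:]) for a, b in zip(samples, samples[1:]))


def idle_time_s(samples: List[Sample], speed_threshold_mm_s: float = 1.0) -> float:
    """Total time during which tip speed stays below an assumed threshold."""
    idle = 0.0
    for a, b in zip(samples, samples[1:]):
        dt = b[0] - a[0]
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = math.dist(a[1:], b[1:]) / dt
        if speed < speed_threshold_mm_s:
            idle += dt
    return idle


# Invented trace: move 5 mm, pause for one second, then move 12 mm.
trace = [
    (0.0, 0.0, 0.0, 0.0),
    (1.0, 3.0, 4.0, 0.0),
    (2.0, 3.0, 4.0, 0.0),
    (3.0, 3.0, 4.0, 12.0),
]

print(path_length_mm(trace))  # prints 17.0
print(idle_time_s(trace))     # prints 1.0
```

Raw numbers of this kind are what a model would need to turn into feedback a learner can act on, for example by comparing them with values from more experienced surgeons performing the same task.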

Another way in which AI could enhance feedback is through language processing. Surgeons often provide poor-quality feedback [21]. Prior authors have harnessed natural language processing to assess feedback quality with high accuracy and specificity [45, 46]. These studies demonstrated the ability of natural language processing to recognize when surgeons’ narrative feedback did not include the relevant, specific, and corrective components necessary for learners to improve. This allows low-quality feedback to be identified and improved to help surgeon reviewers. Of note, AI-generated suggestions have been shown in other fields to be of most use to novice learners [47, 48]. This may reflect limitations in the ability of AI to detect and replicate the nuances of feedback for advanced surgical techniques. Interestingly, however, in an AMEE guide, Tolsgaard and colleagues note, “While novice learners may gain the most from good AI feedback, they may be more susceptible than experts to incorrect advice provided by AI systems.” Indeed, while AI could be quite useful for providing feedback to novices, such novices may have less context to understand errors in the AI-based feedback.

Assessment

Summative assessment is closely related to feedback, though it differs in its goal of predicting future performance and determining readiness for additional autonomy or practice. Additionally, unlike feedback, summative assessment may not always be conducted to improve performance [49]. Multiple authors have investigated the use of AI for assessment in robotic surgery. Wang and colleagues used a deep convolutional neural network based on a publicly available robotic surgery dataset [50]. They created a model to assess performance on suturing, needle passing, and knot tying and found that AI-based assessment was 91.3% to 95.4% accurate in skill rating when compared with a global rating scale. Another group used computer vision to similarly compare AI-based assessment with an expert-completed rating scale [51]. They found correlations between AI-determined metrics, like bimanual dexterity and efficiency, and expert ratings. Additionally, AI has been applied to determine disease severity in cholecystectomies [52]. Disease severity or procedure difficulty assessment can provide context for manual expert assessment. More specifically, an expert could contextualize a rubric-based assessment with an AI-generated difficulty recommendation. No prior identified work has used AI for actual summative assessment—with resultant learner consequences—in robotic surgery.
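One common way to quantify how well an AI-determined metric tracks expert ratings is a rank correlation. The sketch below computes Spearman's rank correlation in plain Python; the choice of Spearman's coefficient and all of the example scores are assumptions for illustration, and the cited studies may have used other statistics.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented data: an AI-derived efficiency score and an expert rating
# for the same five performances.
ai_efficiency = [0.2, 0.5, 0.4, 0.9, 0.7]
expert_rating = [1, 3, 2, 5, 4]

rho = spearman(ai_efficiency, expert_rating)
print(round(rho, 3))  # prints 1.0 because the two rankings are identical
```

A high correlation with expert ratings shows agreement with those experts, not correctness in an absolute sense, so the quality of the reference ratings still matters.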

Recommendations

The potential uses of AI in robotic surgical education are exceptionally promising (Fig. 1). Nonetheless, we acknowledge that further work must determine the ways in which AI affects robotic surgical education in real-life settings, from trainees’ reactions and learning to actual behavior and results. Based on the available evidence, we recommend incorporating AI-based video labeling into robotic surgical education for learners and surgeons to replay and review robotic operations. This may require addressing medicolegal and technological barriers to intra-operative video recording and easing the process by which those videos can later be viewed [23]. Future work should focus on how surgeons use and apply video labeling to ensure that the technology meets educational needs. Video labeling has been chiefly used for retrospective educational review after an operation [33]. A next step for video labeling could be incorporating real-time video labeling on an intra-operative observer screen. Though research must address the effects of this on workflow, such an intervention could allow for more efficacious learning for trainees who are relegated to observation for part or all of an operation [53, 54]. Other work could investigate automated case logging based on video labeling to allow surgeons and trainees to track their progression over time and divide joint case metrics by operator [42]. Future studies that apply and assess video labeling should use the SAGES consensus vocabulary [34].

Fig. 1 Artificial intelligence has multiple applications to address barriers in robotic surgical education for a re-imagined future with more efficient and efficacious learning

We also recommend combining AI-generated, APM-based feedback with expert-based feedback to allow surgeons and trainees to reflect on metrics like bimanual dexterity and efficiency. AI-generated, APM-based feedback may represent a more accurate measure of certain activities in robotic surgery, which could help surgeons and trainees better track their performance and monitor improvement over time. We also suggest that further work identify how APMs could be used to tailor simulation efforts. This could allow for more efficient use of simulation time. Future work in AI-based feedback in robotic surgery could focus on better combining language processing with computer vision. High-quality automated feedback could be provided based on videos of robotic operations or simulations. This could overcome current hurdles associated with manual feedback, including delays in feedback and poorly written or unactionable feedback [51]. Until further work demonstrates the accuracy of AI-based feedback in additional and novel settings, we recommend that AI-based feedback generally be supervised and approved by an expert.

Finally, we recommend against the use of AI without expert supervision for summative assessment at this time. While the aforementioned and other studies have shown substantial strides in the ability of AI to detect a number of important metrics in robotic surgery, it remains insufficiently tested to use summatively. We recommend that further work develop additional evidence in all aspects of validity and in users’ experience prior to considering AI-based summative assessment [55].

Limitations of artificial intelligence

There are important limitations to note with the application of AI to robotic surgical education. First, much of the existing work has applied models to ideal cases or basic procedures that ‘follow the script’ through a single camera view and using a single robotic system. Applying computer vision when there is deviation from the script and improving algorithmic performance in new settings will be of utmost importance to the broader use of AI in robotic surgical education. Similarly, many studies have been limited by small datasets, a lack of external validation, and opaque processes [33]. As additional authors apply and investigate AI in robotic surgical education, they should continue to emphasize the validity and transparency of their processes. Doing so may increase learner and surgeon receptivity to AI-based feedback. Furthermore, validity and transparency will be particularly important prior to the use of AI for assessment in robotic surgical education. Educators simply cannot adopt AI-based assessment in the setting of opaque processes. While a goal of AI-based feedback and assessment is standardization and bias reduction, algorithmic bias remains a troubling barrier [56]. Underskilling is the inaccurate downgrading of a surgeon’s performance by AI, while overskilling is the inaccurate upgrading of a surgeon’s performance by AI; both underskilling and overskilling present obstacles to the summative use of AI-generated assessment [57].
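The definitions of underskilling and overskilling above can be turned into simple rates. The sketch below assumes ordinal skill ratings (higher is better) and a trusted reference rating for each performance, then counts how often the AI rating falls below or above that reference. The function name, the rating scale, and the data are hypothetical; this operationalization is an assumption and is not taken from the cited work.

```python
from typing import Dict, Sequence


def skilling_rates(reference: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Share of performances the AI rated below, above, or equal to the reference."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("reference and predicted must be non-empty and equal in length")
    n = len(reference)
    under = sum(p < r for r, p in zip(reference, predicted))
    over = sum(p > r for r, p in zip(reference, predicted))
    return {
        "underskilling": under / n,
        "overskilling": over / n,
        "agreement": (n - under - over) / n,
    }


# Invented ratings on a 1 (novice) to 3 (expert) scale.
reference_ratings = [3, 2, 1, 3, 2]
ai_ratings = [2, 2, 1, 3, 3]

rates = skilling_rates(reference_ratings, ai_ratings)
print(rates)
```

Computing these rates separately for different groups of surgeons would be one way to check whether errors fall unevenly, which is the concern raised by algorithmic bias.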

Finally, we caution against the widespread view of AI-based feedback and assessment as “objective.” While many authors in the works cited in this paper emphasize the importance of objective metrics such as those possible through AI, it is important to note that humans decide which metrics to include in an algorithm or AI-based rubric. Whether a particular gesture correlates with an outcome that is important to patients is mostly unknown [30, 58]. Similarly, many AI-based assessments use expert feedback as a gold standard; however, not all expert feedback is created equal. As such, while AI certainly may produce highly reliable or consistent results, portraying AI-based metrics as an objective and ultimate reflection of truth may do a disservice to learners, educators, and patients.

Conclusions

AI offers potential solutions to multiple educational challenges in robotic surgery. Through advances in video labeling, feedback, and assessment, AI has demonstrated ways by which to increase the efficiency and efficacy of robotic surgical education. Barriers to the widespread use of AI remain, particularly with regard to the use of AI for assessment of robotic surgical skill. Both AI-based technologies and robotic surgery have changed rapidly, and will likely continue to evolve quickly in coming weeks, months, and years. As such, recommendations and limitations in this area will also change. To facilitate impactful AI-based innovations in robotic surgical education, future work should focus on further developing validity and transparency.

Summary box

  • Robotic surgical education has been challenged by a reliance on observational learning, low-quality feedback, and inconsistent assessment

  • AI offers solutions to these and other challenges in robotic surgical education

  • Strengths of AI in this field center around video labeling and metric-based feedback

  • Ongoing work must focus on developing validity and ensuring transparency

Where to find more information

We suggest reading “Artificial Intelligence Methods and Artificial Intelligence-Enabled Metrics for Surgical Education: A Multidisciplinary Consensus” and “Artificial Intelligence and Surgical Education: A Systematic Scoping Review of Interventions” for recommendations and review regarding the use of AI in surgical education more broadly [29, 59]. For additional resources on the use of AI in robotic surgery, consider “A systematic review on artificial intelligence in robot-assisted surgery” [33]. Finally, for information on consensus video annotation, we advise reviewing “SAGES consensus recommendations on an annotation framework for surgical video” by Meireles and colleagues [34].