Abstract
The objective of this study is to compare automated performance metrics (APM) and surgical gestures for technical skills assessment during simulated robot-assisted radical prostatectomy (RARP). Ten novices and six experienced RARP surgeons performed simulated RARPs on the RobotiX Mentor (Surgical Science, Sweden). Simulator APM were automatically recorded, and surgical videos were manually annotated with five types of surgical gestures. The consequences of the pass/fail levels, which were set using the contrasting groups method, were compared for APM and surgical gestures. Intra-class correlation coefficient (ICC) analysis and a Bland–Altman plot were used to explore the agreement between APM and surgical gestures. Pass/fail levels for both APM and surgical gestures could fully distinguish between the skill levels of the surgeons, with a specificity and sensitivity of 100%. The overall ICC (one-way, random) was 0.70 (95% CI: 0.34–0.88), showing moderate agreement between the methods. The Bland–Altman plot showed high agreement between the two methods for experienced surgeons, but the methods disagreed on the skill level of some novice surgeons. APM and surgical gestures could both fully distinguish between novices and experienced surgeons in a simulated setting. Both methods of analyzing technical skills have advantages and disadvantages, and both are currently available only to a limited extent in the clinical setting. Developing assessment methods in a simulated setting allows them to be tested before they are implemented in clinical practice.
Introduction
Patient outcome is affected by the surgeon’s performance; therefore, surgeons must receive relevant training and possess the necessary competencies when operating on patients [1,2,3,4,5]. Virtual reality (VR) simulators allow surgeons to practice their skills in a risk-free environment and receive automated feedback on their performance [6,7,8]. The optimal way of training is a mastery learning approach, in which all surgeons train to a predefined proficiency level; this ensures that all surgeons have gained the necessary basic competencies before proceeding to supervised real-life surgery [9,10,11]. Proficiency levels on VR simulators have typically been set using simulator-generated automated performance metrics (APM) based on time, instrument movements, and errors. However, APM are often presented as a cumulative score built from abstract values, e.g., left-arm instrument path length (in millimeters), clutch usage (number of times used), and number of right-instrument movements (number of times used) [10, 11]. These APM are easy to capture in the simulated setting and often good at measuring skill progression, but they can be difficult to convert into meaningful feedback, as they say little about the quality of the procedure or the surgical technique used [12,13,14,15]. Furthermore, APM are only available to a limited extent during real-life surgery, which makes them difficult to use for assessment in the operating room.
Therefore, the analysis of surgical gestures was introduced as a new assessment method. Gesture analysis breaks surgery down into phases of actions, e.g., ‘needle handling’, ‘grasping’, and ‘suturing’ [16, 17]. Rather than producing a cumulative score, as with APM, surgical gestures can be used to analyze performance patterns throughout a procedure and to give feedback on how and where in the procedure surgeons can improve [15, 18]. Gesture analysis has previously been used to provide feedback on suturing models, where participants were assessed using surgical gestures and received specific feedback such as “To improve: minimize the number of re-grabs of the needle (< 2 times)” [19]. The method could be available for both simulation-based training and real-life surgery, presenting a new opportunity for automated evaluation of surgical performance in the operating room. However, it is still in the early stages of development, as it requires manual interpretation and video annotation of surgical gestures [7, 13, 15, 20].
As both APM and surgical gestures seem to have advantages and disadvantages, we aimed to compare APM and surgical gesture analysis for technical skills assessment in simulated robot-assisted radical prostatectomy (RARP).
Methods
Messick’s framework was used to describe the validity evidence for the two methods by evaluating two sources of evidence, relationship with other variables and consequences, and thereby to compare the two methods.
We compared the advantages and disadvantages of using APM and surgical gestures for technical skills assessment, as it can be difficult to give surgeons meaningful feedback with either method. Neither method is available in the clinical setting, as methods for automatic detection of APM and surgical gestures have not yet been developed. Because of this limitation, only a few studies have assessed the correlation of APM and surgical gestures with patient outcomes, and most of these studies were performed in urology (Table 1).
Ten novice surgeons (who had assisted in at least one RARP but had no other experience with robotic surgery) and six experienced RARP surgeons (who had performed > 50 RARPs) performed simulated RARPs on the RobotiX Mentor (formerly Simbionix, now Surgical Science, Sweden) in a previously published study [21]. APM and videos were automatically recorded for each of three part-procedures: bladder neck dissection, neurovascular bundle dissection, and urethrovesical anastomosis. The participants performed each part-procedure three times. In the previous study [21], six of the recorded APM (Table 2) were transformed into z-scores, and a composite score was calculated from the six metrics for each of the three modules over the three repetitions, giving each participant a total score (Fig. 1). A pass/fail score of −0.51 standard deviations was previously determined using the contrasting groups method (Fig. 2).
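The composite-score and cutoff steps described above can be sketched in Python, the language used for the analysis. This is a minimal illustration under stated assumptions, not the code used in the original studies: each metric is assumed to be standardized across all participants separately for every module and repetition; metrics where a lower value is better (e.g., time or instrument collisions) are assumed to have their sign flipped so that a higher composite means better performance; and the contrasting groups cutoff is taken as the point where normal distributions fitted to the two groups' scores intersect. The function names composite_z_scores and contrasting_groups_cutoff are hypothetical.

```python
import numpy as np


def composite_z_scores(raw: np.ndarray, lower_is_better: np.ndarray) -> np.ndarray:
    """Return one composite score per participant.

    raw: shape (participants, metrics, modules, repetitions).
    lower_is_better: bool array of shape (metrics,), True where a smaller
    raw value means better performance (an assumption for illustration).
    """
    # Standardize every metric/module/repetition column across participants.
    mean = raw.mean(axis=0, keepdims=True)
    sd = raw.std(axis=0, ddof=1, keepdims=True)
    z = (raw - mean) / sd
    # Flip sign so that a higher z-score always means better performance.
    sign = np.where(lower_is_better, -1.0, 1.0)[None, :, None, None]
    # Average over metrics, modules, and repetitions.
    return (z * sign).mean(axis=(1, 2, 3))


def contrasting_groups_cutoff(fail_group: np.ndarray, pass_group: np.ndarray) -> float:
    """Score where normal densities fitted to the two groups intersect."""
    m1, s1 = fail_group.mean(), fail_group.std(ddof=1)
    m2, s2 = pass_group.mean(), pass_group.std(ddof=1)
    # Setting the two log-densities equal gives a*x**2 + b*x + c = 0.
    a = 1 / (2 * s2**2) - 1 / (2 * s1**2)
    b = m1 / s1**2 - m2 / s2**2
    c = m2**2 / (2 * s2**2) - m1**2 / (2 * s1**2) - np.log(s1 / s2)
    if np.isclose(a, 0.0):
        return float(-c / b)  # equal SDs: midpoint between the group means
    lo, hi = sorted((m1, m2))
    # Keep the real root that lies between the two group means.
    between = [
        r.real for r in np.roots([a, b, c])
        if abs(r.imag) < 1e-9 and lo <= r.real <= hi
    ]
    if not between:
        raise ValueError("densities do not intersect between the group means")
    return float(between[0])
```

With novice composite scores as fail_group and experienced composite scores as pass_group, the returned value plays the role of a pass/fail level such as the −0.51 reported above; the actual value depends on the original data, which are not reproduced here. Solving the quadratic directly avoids a numerical search, and with equal group standard deviations it reduces to the midpoint between the group means.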
Videos were recorded of each part-procedure on the simulator. The resulting 144 videos (16 participants × 3 part-procedures × 3 repetitions) were manually annotated with surgical gestures in a previously published study [22]. Five different gestures were used: regular dissection, hemostatic control, application of clips, needle handling, and suturing (Table 2). The recorded time was further divided into idle time and active time. Idle time was the time between two phases in which no surgical gestures were annotated. Active time, the complement of idle time, was measured as the total duration of the phases of surgical gestures. In the previous study [22], the total time of each of the five surgical gestures (Table 2) in each part-procedure was transformed into a z-score, and a composite score was calculated from the five gestures for each of the three modules over the three repetitions, giving each participant a total score (Fig. 1). A pass/fail score of −0.4 standard deviations was previously determined using the contrasting groups method (Fig. 2).
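One way to derive per-gesture totals, active time, and idle time from a list of annotated phases is sketched below. The annotation format (start and end in seconds plus a gesture label) and the function name gesture_times are assumptions for illustration only. Overlapping annotations are merged so that shared time is not counted twice as active time, and idle time is taken literally as the gaps between consecutive annotated phases.

```python
from collections import defaultdict


def gesture_times(annotations):
    """Summarize annotated gesture phases.

    annotations: list of (start_s, end_s, gesture_label) tuples
    (a hypothetical format for illustration).
    Returns (total time per gesture, active time, idle time) in seconds.
    """
    annotations = list(annotations)  # allow any iterable; we iterate twice
    per_gesture = defaultdict(float)
    for start, end, label in annotations:
        per_gesture[label] += end - start

    # Merge overlapping or touching phases so shared time is counted once.
    merged = []
    for start, end, _ in sorted(annotations):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    active = sum(end - start for start, end in merged)
    # Idle time: gaps between consecutive annotated phases.
    idle = sum(nxt[0] - prev[1] for prev, nxt in zip(merged, merged[1:]))
    return dict(per_gesture), active, idle


# Hypothetical annotations for one part-procedure video.
phases = [
    (0, 10, "regular dissection"),
    (15, 25, "hemostatic control"),
    (25, 40, "regular dissection"),
]
totals, active, idle = gesture_times(phases)
```

Here the totals would be 25 s of regular dissection and 10 s of hemostatic control, with 35 s of active time and a single 5 s idle gap.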
Data analysis
To examine the agreement between APM and surgical gestures (relationship to other variables), we used an intra-class correlation coefficient (ICC1, one-way, random) [23] and a Bland–Altman plot based on the pass/fail scores of novices and experienced surgeons. The limits of agreement of the Bland–Altman plot were set as the 95% confidence interval for the experienced RARP surgeons. We compared the consequences of the two assessment methods by calculating the sensitivity and specificity of the pass/fail levels for APM and surgical gestures, i.e., how many novices failed and how many experienced surgeons passed.
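The ICC(1) and Bland–Altman calculations can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' analysis script. icc1 uses the standard one-way random-effects, single-measure formula from the ICC guideline cited above [23], with participants as rows and the two methods (APM and surgical gestures) as columns. bland_altman returns the bias and limits of agreement as bias ± 1.96 SD of the differences; its optional reference_mask parameter is a hypothetical option that derives these limits from one subgroup only, which is one possible reading of limits based on the experienced surgeons. Both function names are illustrative.

```python
import numpy as np


def icc1(scores: np.ndarray) -> float:
    """ICC(1): one-way random effects, single measurement.

    scores: shape (n_participants, k_methods), e.g. columns = APM, gestures.
    """
    n, k = scores.shape
    grand_mean = scores.mean()
    row_means = scores.mean(axis=1)
    # Between- and within-participant mean squares from a one-way ANOVA.
    ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((scores - row_means[:, None]) ** 2) / (n * (k - 1))
    return float((ms_between - ms_within) / (ms_between + (k - 1) * ms_within))


def bland_altman(a: np.ndarray, b: np.ndarray, reference_mask=None):
    """Return per-participant means and differences, bias, and limits of agreement."""
    diff = a - b
    mean = (a + b) / 2
    # Optionally derive bias and limits from a subgroup only (hypothetical option).
    ref = diff if reference_mask is None else diff[reference_mask]
    bias = float(ref.mean())
    half_width = 1.96 * float(ref.std(ddof=1))
    return mean, diff, bias, (bias - half_width, bias + half_width)
```

The point estimate from icc1 corresponds to the 0.70 reported in the Results; the 95% CI (0.34–0.88) would additionally require an F-distribution-based interval, which this sketch leaves out.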
For data analysis, we used the Python programming language (version 3.10.10, Python Software Foundation, Amsterdam, The Netherlands, https://www.python.org).
Ethics
The Danish Data Protection Agency approved the study (REG-059-2019 and P-2020-701). The study was deemed exempt from ethical approval by the Danish National Ethics Committee (H-19016423).
Results
The overall agreement between APM and surgical gestures, measured with the intra-class correlation coefficient (ICC), was 0.70 (95% CI 0.34–0.88), which is acceptable (relationship to other variables) [24]. The Bland–Altman plot (Fig. 3) showed agreement between APM and surgical gestures for all experienced surgeons, indicating that both methods assessed the experienced surgeons consistently. However, among the novices, the APM score was lower than the surgical gesture score for 3 of 10 participants (Figs. 3 and 4). These three novices had the lowest composite APM z-scores, which were exceptionally low due to an increased number of instrument collisions in the urethrovesical anastomosis task.
Comparing the pass/fail scores of APM and surgical gestures, we found that both methods could fully distinguish between novices and experienced surgeons (consequences) (Fig. 5): with either method, all novices failed and all experienced surgeons passed.
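Tallying the consequences of a pass/fail level is straightforward. In the hypothetical helper below, a participant is assumed to pass when their composite score is at or above the cutoff (higher scores meaning better performance); it reports the proportion of novices who failed and the proportion of experienced surgeons who passed, which is how the Methods describe sensitivity and specificity here. The 100% values reported above correspond to both proportions being 1.0. The scores in the usage line are made up for illustration and are not study data.

```python
import numpy as np


def pass_fail_consequences(novice_scores, experienced_scores, cutoff):
    """Proportion of novices failing and of experienced surgeons passing.

    Assumes a participant passes when their composite score >= cutoff.
    """
    novices_failed = float(np.mean(np.asarray(novice_scores) < cutoff))
    experienced_passed = float(np.mean(np.asarray(experienced_scores) >= cutoff))
    return novices_failed, experienced_passed


# Hypothetical scores, not data from the study.
print(pass_fail_consequences([-1.2, -0.8, -0.6], [0.3, 0.7], cutoff=-0.51))
# prints (1.0, 1.0)
```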
Discussion
Both APM and surgical gestures could fully discriminate between novices and experienced surgeons, but the novices' APM scores varied more than their surgical gesture scores.
Ideally, surgeons should acquire initial skills in a simulated setting and practice until they reach proficiency, as defined by a relevant pass/fail level. Such a level could be established using APM, rater-based assessment tools, or video analysis [21, 22, 25], and would ensure that surgeons have gained the competencies required to proceed to supervised real-life surgery [10]. Dubin et al. [25] and Jørgensen et al. [26] found that APM and rater-based assessment tools matched for some metrics/items but not for others. We found that both methods could fully discriminate between novices and experienced surgeons, adding to the validity evidence supporting both assessment methods. However, the novices' APM scores were more widely spread than their gesture scores, and the two methods disagreed on the skill level of 3 of the 10 novices, with surgical gestures rating these novices higher than APM did. These three novices had low APM scores but performed at an average level when analyzed using gestures. All three had a higher incidence of instrument collisions in the urethrovesical anastomosis, which is the part of the procedure where the instruments move closely together for needle handling and suturing. This increases the risk of instrument collisions, but the clinical effect is uncertain, as the gestures show that these novices performed suturing just as well as their novice colleagues. APM and surgical gestures differ greatly in data type and in how the data could be presented to trainees. APM are cumulative scores based on movement data from the instruments and the camera and focus on technical skills. In contrast, surgical gestures are specific actions the surgeon performs and are used to analyze the pattern of actions throughout the procedure [10, 11, 15, 18].
This makes surgical gesture analysis more procedure-specific, as the workflow will most likely differ between procedures, even though each individual gesture is not procedure-specific [10, 11, 15, 18]. Our previous study showed that the distribution of surgical gestures throughout the part-procedures differed greatly between novices and experienced surgeons. This could indicate that the two methods measure different aspects of surgical skill, although both can be used to assess competency. Rather than choosing one method, trainees could also receive feedback from both APM and surgical gestures. Previous studies have shown that combining data from APM and surgical gestures improves the ability of AI models to predict surgical experience level [15, 27,28,29,30]. Trainees could thus benefit from dual feedback covering different aspects of technical and procedural skills. However, capturing and analyzing both types of data is still difficult and time-consuming.
We tested the agreement between the two methods with only a limited number of experienced surgeons and did not include intermediate surgeons. We also used data from a simulated setting rather than from real-life surgery. Previous studies have linked both APM and surgical gestures to patient outcomes in RARP [18, 31]. However, the current availability of APM and surgical gestures for real-life surgery is very limited. In the operating room, it is difficult to find assessment methods that do not require a more experienced surgeon to be present onsite or to spend time reviewing video recordings of the procedure afterward. Onsite clinical assessment is time-consuming for surgeons and subject to bias, as the assessor often comes from the same institution as the surgeons in training [32]. APM can be recorded in the clinic using custom recording tools for robotic surgery, but these recorders are difficult to obtain, and the data can be difficult to interpret [31, 33, 34]. Surgical gestures could offer an alternative, unbiased form of assessment in the clinic. This requires the surgeon to record the procedure and someone trained in video annotation to analyze the video. Unfortunately, manual annotation of surgical procedures is very time-consuming and not scalable, but with the increasing use of artificial intelligence (AI) and automation, it could become possible to annotate surgical videos with surgical gestures automatically in the future. The greatest strength of AI is its ability to detect patterns that we as human evaluators did not consider; for example, certain part-procedures could be more important for patient outcomes, or novice surgeons could struggle with different skill sets than expert surgeons. AI algorithms could provide surgeons with information on their surgical performance level and, perhaps, the expected patient outcomes. They could also be used for targeted feedback with advice on how to improve [19].
For smaller procedures, such as a simulated suturing task, simple feedback on needle handling or the number of needle punches in the tissue could be provided and would most likely enhance the trainees' suturing skills [19]. Videos comparing the trainee's suturing with an expert's could even be provided as additional feedback. For full-length or real-life surgeries, a small deviation in the surgical pattern in one part of the procedure might not influence the surgery or patient outcome, whereas deviations in other parts could. Feedback on every gesture that was not performed optimally throughout an entire procedure might therefore not benefit the surgeon. For feedback to be targeted and correct, a much better understanding of the patterns of surgical gestures relevant to each procedure is required. AI algorithms also require substantial testing before they can be implemented for clinical assessment [7, 13, 20]. Therefore, most assessment methods that do not require expert surgeons, such as APM and surgical gestures, currently seem to be available only in the simulation-based setting and not for clinical assessment, although this will likely change in the future.
Conclusion
APM and surgical gestures could fully distinguish between novices and experienced surgeons in a simulated setting. Both methods of analyzing technical skills have advantages and disadvantages and are only available to a limited extent in the clinical setting. This will hopefully change with the technological development of more advanced robotic systems and AI algorithms.
Data availability
No datasets were generated or analyzed during the current study.
References
Goldenberg MG, Goldenberg L, Grantcharov TP (2017) Surgeon performance predicts early continence after robot-assisted radical prostatectomy. J Endourol 31(9):858–863
Ghani KR, Miller DC, Linsell S, Brachulis A, Lane B, Sarle R et al (2015) Measuring to improve: peer and crowd-sourced assessments of technical skill with robot-assisted radical prostatectomy. Eur Urol 69(4):547–550
Lovegrove C, Novara G, Mottrie A, Guru KA, Brown M, Challacombe B et al (2016) Structured and modular training pathway for robot-assisted radical prostatectomy (RARP): Validation of the RARP assessment score and learning curve assessment. Eur Urol 69(3):526–535
Govaerts MJB, Schuwirth LWT, van der Vleuten CPM, Muijtjens AMM (2011) Workplace-based assessment: effects of rater expertise. Adv Health Sci Educ 16(2):151–165
Birkmeyer JD, Finks JF, O’Reilly A, Oerline M, Carlin AM, Nunn AR et al (2013) Surgical skill and complication rates after bariatric surgery. N Engl J Med 369(15):1434–1442
Brewin J, Ahmed K, Challacombe B (2014) An update and review of simulation in urological training. Int J Surg 12:103–108
Kirubarajan A, Young D, Khan S, Crasto N, Sobel M, Sussman D (2021) Artificial intelligence and surgical education: a systematic scoping review of interventions. J Surg Educ. https://doi.org/10.1016/j.jsurg.2021.09.012
Chu TN, Wong EY, Ma R, Yang CH, Dalieh IS, Hui A et al (2023) A multi-institution study on the association of virtual reality skills with continence recovery after robot-assisted radical prostatectomy. Eur Urol Focus. https://doi.org/10.1016/j.euf.2023.05.011
McGaghie WC, Issenberg SB, Barsuk JH, Wayne DB (2014) A critical review of simulation-based mastery learning with translational outcomes. Med Educ 48:375–385
Cook DA, Brydges R, Zendejas B, Hamstra SJ, Hatala R (2013) Mastery learning for health professionals using technology-enhanced simulation: a systematic review and meta-analysis. Acad Med 88(8):1178–1186
Bjerrum F, Thomsen ASS, Nayahangan LJ, Konge L (2018) Surgical simulation: current practices and future perspectives for technical skills training. Med Teach 40(7):668–675
Mirchi N, Bissonnette V, Ledwos N, Winkler-Schwartz A, Yilmaz R, Karlik B et al (2020) Artificial neural networks to assess virtual reality anterior cervical discectomy performance. Oper Neurosurg 19(1):65–75
Bissonnette V, Mirchi N, Ledwos N, Alsidieri G, Winkler-Schwartz A, Del Maestro RF (2019) Artificial intelligence distinguishes surgical training levels in a virtual reality spinal task. J Bone Joint Surg. https://doi.org/10.2106/JBJS.18.01197
Gazis A, Karaiskos P, Loukas C (2022) Surgical gesture recognition in laparoscopic tasks based on the transformer network and self-supervised learning. Bioengineering 9(12):737
Ahmidi N, Tao L, Sefati S, Gao Y, Lea C, Haro BB et al (2017) A dataset and benchmarks for segmentation and recognition of gestures in robotic surgery. IEEE Trans Biomed Eng 64(9):2025–2041
Ma R, Vanstrum EB, Nguyen JH, Chen A, Chen J, Hung AJ (2021) A novel dissection gesture classification to characterize robotic dissection technique for renal hilar dissection. J Urol 205(1):271–275
Nakawala H, Bianchi R, Pescatori LE, De Cobelli O, Ferrigno G, De Momi E (2019) “Deep-Onto” network for surgical workflow and context recognition. Int J Comput Assist Radiol Surg 14(4):685–696
Ma R, Ramaswamy A, Xu J, Trinh L, Kiyasseh D, Chu TN et al (2022) Surgical gestures as a method to quantify surgical performance and predict patient outcomes. NPJ Digit Med. https://doi.org/10.1038/s41746-022-00738-y
Ma R, Kiyasseh D, Laca JA, Kocielnik R, Wong EY, Chu TN et al (2023) AI-based video feedback to improve novice performance on robotic suturing skills—a pilot study. J Endourol. https://doi.org/10.1089/end.2023.0328
De Backer P, Eckhoff JA, Simoens J, Müller DT, Allaeys C, Creemers H et al (2022) Multicentric exploration of tool annotation in robotic surgery: lessons learned when starting a surgical artificial intelligence project. Surg Endosc 36(11):8533–8548
Olsen RG, Bjerrum F, Konge L, Jepsen JV, Azawi NH, Bube SH (2021) Validation of a novel simulation-based test in robot-assisted radical prostatectomy. J Endourol. https://doi.org/10.1089/end.2020.0986
Olsen RG, Svendsen MBS, Tolsgaard MG, Konge L, Røder A, Bjerrum F (2024) Surgical gestures can be used to assess surgical competence in robotic-assisted surgery. J Robot Surg 18(1):47
Koo TK, Li MY (2016) A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 15(2):155
George D, Mallery P (2019) IBM SPSS statistics 25 step by step: a simple guide and reference, 15th edn. Routledge Taylor & Francis Group, New York, p 244
Dubin AK, Smith R, Julian D, Tanaka A, Mattingly P (2017) A comparison of robotic simulation performance on basic virtual reality skills: simulator subjective versus objective assessment tools. J Minim Invasive Gynecol 24(7):1184–1189
Jørgensen RJ, Olsen RG, Svendsen MBS, Stadeager M, Konge L, Bjerrum F (2022) Comparing simulator metrics and rater assessment of laparoscopic suturing skills. J Surg Educ. https://doi.org/10.1016/j.jsurg.2022.09.020
Vedula SS, Malpani A, Ahmidi N, Khudanpur S, Hager G, Chen CC (2016) Task-level vs. segment-level quantitative metrics for surgical skill assessment. J Surg Educ 73(3):482–489
Hung AJ, Bao R, Sunmola IO, Huang DA, Nguyen JH, Anandkumar A (2022) Capturing fine-grained details for video-based automation of suturing skills assessment. Int J Comput Assist Radiol Surg. https://doi.org/10.1007/s11548-022-02778-x
Murali A, Garg A, Krishnan S, Pokorny FT, Abbeel P, Darrell T et al (2016) TSC-DL: unsupervised trajectory segmentation of multi-modal surgical demonstrations with deep learning. In: 2016 IEEE international conference on robotics and automation (ICRA), Stockholm, Sweden, May 16–21
Van Amsterdam B, Funke I, Edwards E, Speidel S, Collins J, Sridhar A et al (2022) Gesture recognition in robotic surgery with multimodal attention. IEEE Trans Med Imaging 41(7):1677–1687
Hung AJ, Ma R, Cen S, Nguyen JH, Lei X, Wagner C (2021) Surgeon automated performance metrics as predictors of early urinary continence recovery after robotic radical prostatectomy—a prospective Bi-institutional study. Eur Urol Open Sci 27:65–72
Dai JC, Lendvay TS, Sorensen MD (2017) Crowdsourcing in surgical skills acquisition: a developing technology in surgical education. J Grad Med Educ 9(6):697–705
Hung AJ, Chen J, Ghodoussipour S, Oh PJ, Liu Z, Nguyen J et al (2019) A deep-learning model using automated performance metrics and clinical features to predict urinary continence recovery after robot-assisted radical prostatectomy. BJU Int 124(3):487–495
Hung AJ, Chen J, Jarc A, Hatcher D, Djaladat H, Gill IS (2018) Development and validation of objective performance metrics for robot-assisted radical prostatectomy: a pilot study. J Urol 199(1):296–304
Funding
Open access funding provided by Copenhagen University. We received no specific grants from funding agencies in the public, commercial, or not-for-profit sectors for the work presented in this manuscript.
Author information
Contributions
R.G.O. collected all data and wrote the manuscript. R.G.O., M.B.S.S., M.G.T., and L.K. performed the data analysis. All authors were involved in protocol development and manuscript editing.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Olsen, R.G., Svendsen, M.B.S., Tolsgaard, M.G. et al. Automated performance metrics and surgical gestures: two methods for assessment of technical skills in robotic surgery. J Robotic Surg 18, 297 (2024). https://doi.org/10.1007/s11701-024-02051-0
DOI: https://doi.org/10.1007/s11701-024-02051-0