Over the past two decades, laparoscopic surgery has seen widespread adoption. This trend has improved patient outcomes but has also introduced unique educational challenges [1]. Traditionally, surgical education has been time based, with competency assumed to develop through sheer volume of case exposure over the course of a residency. However, this paradigm of surgical skills acquisition has been challenged. Acquisition of laparoscopic skills can be difficult, particularly in the modern educational environment with multiple competing technologies, resident duty hour restrictions, and patient safety concerns [2]. The COVID-19 pandemic has only worsened these challenges, as provider safety restrictions limit our ability to work in close proximity to one another for hands-on skills training [3]. Fortunately, simulation training, particularly for laparoscopic surgery, has been shown to enhance skills acquisition [4]. The Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) developed the Fundamentals of Laparoscopic Surgery (FLS) course nearly 20 years ago [5]. This practical and comprehensive educational program is now the gold standard for teaching and evaluating laparoscopic skills and has been endorsed by the American Board of Surgery [5,6,7].

Recently, new high-fidelity mobile simulation platforms have been developed that merge real-time views from local and remote video streams, allowing for live, interactive instruction and collaboration at a distance [8]. These so-called “merged virtual-reality” (MVR) platforms have seen wide acceptance in a variety of commercial and educational environments. For example, MVR has been used to subjectively enhance patient engagement in postoperative care following orthopedic surgery [9] and to facilitate virtual neurology consultation [10]. It has also been used as an educational tool in a pilot study to teach infant car seat installation [11, 12]. To our knowledge, these platforms have not been used as an instructional tool in medical education, but they are a promising candidate for adaptation to laparoscopic surgery training and practice. Use of MVR in surgical education could allow dissemination of robust, high-fidelity technical skills training even when learners and educators are not in close physical proximity. This is of particular importance during a pandemic, when physical distancing and isolation are required. However, even as the current safety restrictions begin to relax, MVR remains a potential asset for enhancing connectivity between geographically diverse populations. As FLS moves toward becoming an international standard, MVR could help ensure dissemination of laparoscopic skills worldwide [8]. Finally, MVR could be used in patient care environments. For example, a geographically remote surgeon could initiate a virtual intraoperative consultation with a tertiary-care center specialist for difficult cases. The subspecialist could view the entire operative field, annotate relevant anatomy on the referring surgeon’s screen, or even virtually place his or her hands into the field [8]. However, no comprehensive investigations have yet evaluated MVR for laparoscopic teaching. Herein we describe a non-inferiority randomized comparative study to obtain validity evidence for the use of an MVR platform (HelpLightning™, 2015–2021 Help Lightning Inc., Alabama, USA) in laparoscopic skills education. The study purpose was to compare novice laparoscopic skills acquisition between this MVR platform and conventional in-person teaching. We hypothesized that training using MVR is non-inferior to standard in-person methods, thus laying the foundation for future research into MVR applications in laparoscopic surgery.

Materials and methods

Study design

This was a prospective randomized controlled parallel-group non-inferiority study with three arms. Participants were assigned equally to one of three groups, each with two hours of supervised practice. Group 1 (control) received no formal mentorship or feedback; group 2 (standard) received in-person instruction and expert feedback; and group 3 (experimental) received remote instruction and feedback via the MVR platform (Fig. 1). Approval was obtained from the Health Research Ethics Board (protocol #HS20908 (H2017:218)) and all participants signed an informed consent form. This study is reported in accordance with the Consolidated Standards of Reporting Trials (CONSORT) [13].

Fig. 1

Merged virtual reality (MVR) instruction of laparoscopic suturing using the HelpLightning™ (2015–2021 Help Lightning Inc., Alabama, USA) system. A The learner’s set-up, with the learner grasping both needle drivers inside the simulation box. The virtually overlaid Maryland forceps and drawn red arrow belong to the remote proctor. B The remote proctor’s set-up

Messick’s validity framework was used to guide this research. This framework has become the gold standard for evaluating validity evidence in performance assessment and is supported by multiple education and research organizations [14]. In brief, the framework draws on five sources of validity evidence: content, response process, internal structure, relations with other variables, and consequences of the assessment or test. A more detailed review of validity in surgical education is beyond the scope of this paper, so we refer the reader to a recent systematic review on the topic [15]. Within this framework, we sought to establish initial validity evidence for MVR under the “relations to other variables” category by training novices with three different modalities and comparing their performance on the previously established FLS tasks.

Participants

Medical students and pre-medical students (undergraduates with an interest in medicine) from the University of Manitoba were invited to participate. Eligibility criteria included no previous exposure to FLS training and no laparoscopic surgery experience.

Study intervention

All participants viewed a demonstration of the five FLS tasks, with instructions for each read from a preformed script. The tasks were as follows: (1) peg transfer: six plastic objects are transferred from one side of a pegboard to the other; (2) precision cutting: a circular pattern is cut along a pre-marked line; (3) ligating loop: a loop is placed at a pre-marked line on a foam appendage; (4) suture with extracorporeal knot: a long suture is passed through two targets on a Penrose drain and three knots are tied extracorporeally and placed using a knot pusher; (5) suture with intracorporeal knot: a short suture is used and three knots are tied intracorporeally.

Following initial instruction and demonstration, the participants completed a pre-test to ensure baseline homogeneity. After the pre-test, Group 1 (control) underwent a 2-h self-practice session with no further instruction or feedback. Group 2 (standard) underwent a 2-h session consisting of in-person instruction and expert summary feedback. Group 3 (experimental) also underwent a 2-h session, with instruction and feedback provided via our MVR platform. To guard against potential confounders between the standard and experimental groups, the nature of instruction provided was purposefully semi-scripted and nearly identical between groups. For example, the instructor did not physically intervene in either group to demonstrate additional tips and tricks, aside from pre-determined scripted verbal feedback. Furthermore, the same instructor mentored both groups. Immediately following these sessions, all groups completed a post-test. One month later, they completed a retention test. The study protocol was pilot tested on a small number of non-participant students to ensure flow and homogeneity of instructions prior to study commencement.

Outcome measures

Standard FLS assessment metrics were measured for each of the five tasks during the testing phases. The FLS scoring system has been described previously and is supported by past validity evidence [5,6,7, 16, 17]. In brief, the FLS scoring system assigns a cut-off time to each task. A raw score is calculated by subtracting from this cut-off the total time taken to complete the task in seconds, plus any predefined penalty points. A higher score indicates better performance.
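
As a minimal sketch of this arithmetic (the cut-off time and penalty values below are illustrative placeholders, not the official FLS parameters, and the zero floor reflects the scores of zero discussed below):

```python
def fls_raw_score(cutoff_s: float, completion_s: float, penalty_points: float) -> float:
    """Raw task score: time remaining under the cut-off minus penalties, floored at zero.

    The cut-off and penalty values used here are illustrative assumptions; the
    official FLS program defines its own task-specific cut-offs and penalty weights.
    """
    return max(0.0, cutoff_s - completion_s - penalty_points)

# Hypothetical example: a 300 s cut-off, task completed in 180 s with 10 penalty points
print(fls_raw_score(cutoff_s=300, completion_s=180, penalty_points=10))  # 110.0
```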

Upon initial pilot testing, we noted that even trained participants would often obtain a score of zero on both the pre- and the post-tests. We were concerned that equivalency between scores of zero would be trivially demonstrated and could lead to a false conclusion of equivalency between training regimens. Therefore, to increase the sensitivity of our non-inferiority study, total time to task completion was used rather than the FLS scoring system.

Primary outcomes were the change in total FLS times from pre-test to post-test for each group and the differences between groups. Secondary outcomes were the changes from pre-test to post-test times for each individual FLS task and skills retention. Skill retention was calculated by subtracting the post-test time from the retention test time.
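
Concretely, with times in seconds and under the pre-minus-post convention assumed here (the values are illustrative placeholders, not study data):

```python
# Outcome definitions; all values are illustrative, not study data.
pre_test, post_test, retention_test = 1780.0, 1350.0, 1395.0

skill_acquisition = pre_test - post_test        # positive = faster after training
retention_change  = retention_test - post_test  # positive = slower at 1 month (skill decay)
print(skill_acquisition, retention_change)      # 430.0 45.0
```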

Statistical analysis

Normality of data was tested using the Shapiro–Wilk test and visually using Q–Q plots and histograms. Data were normally distributed, so ordinary one-way analysis of variance (ANOVA) was used to evaluate the effect of group (control, standard, MVR) on test time. Pairwise comparisons were made using post hoc Tukey HSD tests. Paired t tests were used to assess the amount of skill acquisition and retention within each group. All data were analyzed using SPSS for Windows (Version 27, SPSS Inc., Chicago, IL). Statistical significance was set a priori at p < 0.05.
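
The analysis was performed in SPSS; the following Python sketch reproduces the same sequence of tests for illustration only (the time arrays are hypothetical placeholders, not study data):

```python
# Sketch of the analysis pipeline described above (illustrative data only).
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical total completion times (seconds) per group at post-test
control  = np.array([1650, 1700, 1580, 1620, 1690, 1710, 1600, 1640, 1660, 1680])
standard = np.array([1400, 1350, 1420, 1380, 1440, 1390, 1410, 1370, 1430, 1360])
mvr      = np.array([1410, 1390, 1370, 1450, 1400, 1420, 1380, 1440, 1360, 1430])

# Normality check (Shapiro-Wilk); Q-Q plots and histograms would complement this
for name, grp in [("control", control), ("standard", standard), ("MVR", mvr)]:
    w_stat, p_val = stats.shapiro(grp)
    print(f"Shapiro-Wilk {name}: p = {p_val:.3f}")

# One-way ANOVA across the three groups
f_stat, p_anova = stats.f_oneway(control, standard, mvr)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")

# Post hoc pairwise Tukey HSD, interpreted only when the ANOVA is significant
times = np.concatenate([control, standard, mvr])
groups = ["control"] * 10 + ["standard"] * 10 + ["MVR"] * 10
print(pairwise_tukeyhsd(times, groups, alpha=0.05))

# Within-group skill acquisition: paired t test of pre- vs post-test times
pre_mvr = np.array([1800, 1780, 1790, 1800, 1770, 1800, 1760, 1800, 1790, 1780])
t_stat, p_paired = stats.ttest_rel(pre_mvr, mvr)
print(f"Paired t (MVR pre vs post): t = {t_stat:.2f}, p = {p_paired:.4f}")
```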

Sample size

Sample size calculation was performed using a non-inferiority margin set at 10% of the mean improvement in expected total FLS scores (post-test minus pre-test) after training using the standard regimen, applying a one-sided alpha of 0.05 and a power of 0.80. The calculation was based upon previously reported FLS performance of novice trainees, who showed a mean improvement in total scores of 281.6 points with a standard deviation (SD) of 25 points [18]. Ten participants were required in each training arm. For this calculation, the mean and SD were converted from median and interquartile range data using the methods reported by Wan et al. [19]. Three participants dropped out near the beginning of study recruitment and were replaced with three additional volunteers to maintain the target sample size.
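
Under the stated assumptions (one-sided α = 0.05, power = 0.80, SD = 25 points, margin = 10% of 281.6 points), a standard two-sample non-inferiority formula for comparing means reproduces the 10-per-arm figure; the sketch below is illustrative rather than a record of the original calculation:

```python
# Non-inferiority sample size per arm: n = 2 * (z_alpha + z_beta)^2 * sigma^2 / margin^2
import math
from scipy.stats import norm

sigma = 25.0                   # SD of improvement in total FLS score [18]
margin = 0.10 * 281.6          # non-inferiority margin: 10% of the expected improvement
z_alpha = norm.ppf(1 - 0.05)   # one-sided alpha = 0.05 -> 1.645
z_beta = norm.ppf(0.80)        # power = 0.80 -> 0.842

n_per_arm = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / margin ** 2
print(math.ceil(n_per_arm))    # 10
```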

Randomization

Randomization was performed using a blocked, computer-generated random sequence. Due to the nature of the study design, blinding was not possible. Randomization, assessment, and participant coaching were all administered by a single investigator.
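
As a minimal sketch of how such a blocked sequence can be generated (the block size and seed are assumptions for illustration; the study’s actual sequence parameters were not reported):

```python
# Blocked randomization into three arms; a block size of 3 (one slot per arm) is assumed.
import random

def blocked_sequence(n_participants, arms=("control", "standard", "MVR"),
                     block_size=3, seed=42):
    rng = random.Random(seed)
    assert block_size % len(arms) == 0, "block size must be a multiple of the number of arms"
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # permute arm order within each block to keep groups balanced
        sequence.extend(block)
    return sequence[:n_participants]

print(blocked_sequence(33))
```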

Results

Pre-test

Thirty-three individuals agreed to participate and were assigned equally to one of the three study interventions (Fig. 2). Seventeen participants were female and 16 were male; 11 were medical students and 22 were pre-medical students. All were students with no laparoscopic surgical experience (simulated or otherwise). Three participants (one in each group) did not return for retention testing within the study timeline due to scheduling conflicts and were excluded, leaving 30 participants who completed study follow-up. The pre-test undertaken by all participants confirmed baseline homogeneity between groups (Table 1).

Fig. 2

CONSORT flow diagram showing study participants

Table 1 Baseline fundamentals of laparoscopic surgery (FLS) test times

Post-test

There were statistically significant improvements in performance from pre-test to post-test in all three groups for each of the FLS tasks and for total task time, except for the ligating loop task in the control arm (p = 0.051) (Supplement 1). Pairwise comparisons between groups following the post-test are shown in Fig. 3. There were no differences between the standard and MVR groups for any of the individual FLS tasks. The trained groups (standard or MVR) were each significantly better than controls for the ligating loop, extracorporeal suturing, and intracorporeal suturing tasks and for total FLS task completion time, but not for the peg transfer and pattern cutting tasks.

Fig. 3

Task completion time for pre-test (light colors) and post-test (dark colors) for the control, standard, and merged virtual reality (MVR) groups for all Fundamentals of Laparoscopic Surgery (FLS) tasks. p values indicate comparisons for improvement between groups using 1-way ANOVA and pairwise Tukey HSD tests (for significant ANOVA). EC extracorporeal, IC intracorporeal

Retention test

There were no significant differences between the standard and MVR groups in skill retention for any of the FLS tasks or for total FLS task completion time (Fig. 4). The trained groups (standard or MVR) were each significantly better than controls for the extracorporeal suturing task and for total FLS task completion time, but not for the peg transfer, pattern cutting, or ligating loop tasks. For intracorporeal suturing retention, the MVR group differed significantly from controls, whereas the standard group did not.

Fig. 4

Task completion time for post-test (light colors) and retention test (dark colors) for the control, standard, and merged virtual reality (MVR) groups for all Fundamentals of Laparoscopic Surgery (FLS) tasks. p values indicate comparisons for relative difference between groups using 1-way ANOVA and pairwise Tukey HSD tests (for significant ANOVA). EC extracorporeal, IC intracorporeal

Discussion

Advances in virtual simulation, such as the HelpLightning™ MVR platform, hold much promise for surgical education as it confronts new and ongoing challenges. However, proper evaluation of any new approach is imperative before it is incorporated into a surgical training program. Gallagher et al. found that for virtual reality to be successful in improving surgical skills, it must be supported by validity evidence [20]. Therefore, following Messick’s validity framework [14, 15], we provide data that fall under the “relations to other variables” category of validity evidence. Within this framework there are four other possible sources of validity evidence, which we did not explicitly assess: content, response process, internal structure, and consequences. However, by building upon the substantial work of those who have previously and rigorously studied the FLS evaluation system [5, 6], we were able to assess the equivalence of two training modalities. In this randomized controlled trial, we have demonstrated that training and assessment of FLS tasks by MVR are non-inferior to in-person training on a standard laparoscopic box trainer, both in the immediate post-training period and on a skills retention test 4 weeks later. Significant improvement of skills occurred with both training regimens when compared with the control group. Our study provides important data in support of the validity of a mobile MVR platform for laparoscopic skills training, demonstrating it to be a non-inferior alternative to in-person participation.

This study used the tasks and assessment strategies from the FLS program. FLS is one of the most extensively studied laparoscopic surgical skills training programs and is supported by a breadth of validity evidence [5,6,7, 16, 17]. FLS has known standards and widely published expected training improvements, which allowed us to power our study with relative accuracy and to compare our results to a wide breadth of FLS literature. For example, the observed improvements in laparoscopic skills in the present study closely correspond with those reported previously. In all practice arms, the greatest improvements were seen in the suturing and pattern cutting tasks, with smaller improvements in the ligating loop and peg transfer tasks. These are widely observed phenomena in the FLS literature, further supporting the validity of MVR as a training method compared to standard approaches [17, 18]. The effects of expert instruction in both the MVR and standard arms were most pronounced for enhancing speed on the suturing tasks. Suturing is widely regarded as the most difficult task, and expert instruction and feedback have previously been shown to speed acquisition of this skill in novices [21].

We did, however, modify the assessment strategy employed compared to traditional FLS assessment metrics. For feasibility purposes, two hours of dedicated practice were chosen for the instruction component. This was based on past literature demonstrating that two hours of training led to significant and measurable improvements in laparoscopic suturing performance [22, 23]. We felt a longer training session would lead to more attrition or difficulty recruiting participants [24]. On pilot testing, however, we noted that the majority of untrained participants, and many participants after 2 h of training, would often attain a score of 0 when assessed using traditional FLS metrics, despite demonstrating marked objective improvements in overall task performance. We were concerned that equivalency between scores of zero would be trivially demonstrated and could lead to a false conclusion of equivalency between training regimens. For example, the maximum total time allocated across the FLS tasks to score above zero is 1800 s. Even without tabulation of error scores, many participants in all groups would have obtained a score of zero on many of the tasks (Figs. 3, 4). Therefore, the decision was made to use overall task completion time rather than a calculated FLS performance score.

Our findings support previous work on tele-simulation in laparoscopic surgery and provide important evidence for future implementation of such a teaching program into surgical training. Prior to the present study, remote tele-simulation had already seen preliminary success in administering the FLS skills test. The FLS certifying exam has now been taken by physicians from more than 20 countries worldwide, yet there are very few international test centers, necessitating costly and time-consuming travel. To this end, O’Krainec et al. undertook a study in 2016 in which 20 participants completed the FLS test while being examined by both an on-site and a remote proctor [25]. They found excellent inter-rater reliability between the two proctors and high participant satisfaction with the process. This study was an important first step in addressing the dilemma of geographical diversity, demonstrating the feasibility of remote proctoring and supporting our study in taking the next step of both instructing and assessing the FLS skills remotely. Such efforts have brought resource-rich and resource-poor learners and physicians together from around the world and effectively put them in the same room, marking a breakthrough in the accessibility of the FLS program and the benefits it brings to physicians and their patients. MVR has the potential to further enhance existing distance skills training, such as the FLS tele-simulation project developed by the University of Toronto, in which Canadian surgeons proctored surgeons in resource-limited countries, allowing them to develop technical proficiency in basic laparoscopic skills [26].

Tele-mentoring has also seen successful adoption in collaboration between expert physicians. As early as 1998, Rosser et al. at the Yale University School of Medicine helped guide laparoscopic cholecystectomies in Ecuador [27]. Schlachta et al. from the Schulich School of Medicine at the University of Western Ontario in London, Ontario, successfully incorporated tele-mentoring in laparoscopic colon resections between their tertiary center and surrounding community hospitals [28]. More recently, a multi-institutional initiative set forward by SAGES demonstrated tele-mentoring to be a practical, feasible, and successful method of instruction for laparoscopic sleeve gastrectomy [29]. Despite these important steps, until now no prospective randomized controlled trials had demonstrated the efficacy of tele-simulation or tele-mentoring in both the instruction and the assessment of the FLS skills, which form the foundation of any laparoscopic surgery.

Considering the direct impact that remote virtual intraoperative consultations could have on patient care, this is an exciting area of future research for MVR technology. Some important adjuncts already in development include devices for automated surgical skills assessment. These tools capture data in real time, such as the surgeon’s hand-motion efficiency and measures of manual dexterity, which could be fed back to surgeons or trainees and used to enhance performance [30]. When used in conjunction with MVR tele-mentoring, these novel technologies could prove revolutionary in remote surgical proctoring, education, and skills development. Furthermore, there appears to be a demand for such innovations: a large descriptive study conducted in association with the American College of Surgeons in 2017 surveyed more than 150 rural surgeons, of whom 79% reported that tele-simulation would be helpful in their practice and 68% expressed interest in virtual intraoperative consultation for unexpected findings [31].

There are two important limitations to this study. First, our study was powered to detect a clinically significant difference of > 10% in total FLS test performance between arms. However, the sample size was not powered a priori to detect differences for each of the individual FLS tasks. Therefore, we can only conclude non-inferiority between training arms for total FLS performance, not for each individual task. Despite this limitation, improvements were observed for the majority of FLS tasks when pairwise comparisons were made between controls and each of the training arms. Improvements in the peg transfer and pattern cutting tasks were numerically greater than controls but not statistically significant. It is possible that with more power, a difference in performance for these tasks would also have been detected.

Second, this study was completed at an academic center equipped with reliable, high-speed internet connectivity, which facilitated a strong connection between mentor and mentee while using the MVR platform. Without such a connection, this remote collaboration could have been considerably more difficult. This factor limits the applicability of MVR to rural and remote settings where internet quality can be poor, and applicability to developing countries may also be an issue. However, ongoing globalization and advances in fiber and wireless internet such as LTE and 4G are expanding high-speed internet access more broadly [32]. Therefore, we expect that in the not-too-distant future high-speed internet will be available nearly universally, potentially facilitating the dissemination of distance virtual tele-mentoring.

Conclusion

In summary, our prospective, randomized, controlled study has demonstrated that the use of a mobile MVR platform is a non-inferior method of teaching and assessing the FLS skills compared to in-person instruction. In doing so, it builds on the existing literature on remote proctoring and collaboration, provides evidence for deeper integration of remote instruction of laparoscopic surgical skills, and paves the way for inclusion of an MVR platform in surgical education. With the aid of future work, including the development of automated surgical skills assessment and virtual collaboration, MVR may be a method of safely enhancing global connectivity, while remaining physically distant.