Effective visualization of the operative field is essential to successful surgery. It enables surgeons to identify diseased and healthy anatomy as well as instrument–tissue interactions as they treat disease states. Poor visualization can be costly and can decrease patient safety through an increase in surgical errors [1, 2]. It is critical that surgeons learn to optimally visualize the operative field just as they learn to use instruments.

Different forms of surgery use different methods to visualize patient anatomy. During open surgery, surgeons trade off invasiveness for access and visualization (i.e., a larger incision allows a direct view of and interaction with anatomy but is more invasive to the patient). During minimally invasive surgery (MIS), an endoscope is used to peer inside a patient through a small incision, thereby reducing invasiveness (compared to open surgery) while maintaining or even improving how well a surgeon sees anatomy. However, MIS imposes new skills that surgeons must learn. Manual laparoscopy requires coordination between a surgeon and an assistant: the surgeon verbally instructs the assistant where to position the endoscope because the surgeon’s hands are dedicated to instruments. Robot-assisted minimally invasive surgery (RAMIS) removes the assistant from the workflow and returns control of the endoscope to the surgeon, who uses hand controllers to switch between controlling her instruments and her camera. It is apparent from these examples that many new MIS technologies require that surgeons learn how to control an endoscope in order to achieve optimal visualization.

Commonly, MIS surgeon trainees learn how to visualize the operative field by observing experienced surgeons control their endoscopes and replicating their behaviors while receiving feedback from their mentors (i.e., an apprenticeship model [3]). Alternatively, objective rating scales can be used to evaluate how well trainees visualize their environment (see robotic control and depth perception in GEARS [4]), but these are challenging to administer because they are time-consuming and rely on a largely manual process of video review. Furthermore, both the apprenticeship model and objective rating scales can be inefficient because they require oversight by an experienced mentor for a trainee to receive feedback on his performance (although crowd-sourced objective rating scales have recently shown promise [5]). More automated, objective measures of visualization performance stand to improve training efficiency by delivering feedback to trainees even without expert supervision [6].

A primary obstacle to more automated, objective measures of performance is the ability to unobtrusively measure behavior during training or even live surgery. RAMIS is the exception; surgeon behavior can be measured unobtrusively by leveraging its tele-operative architecture, offering the potential to develop automated, objective performance measures that can be used by a surgeon throughout her training [7]. Many academic teams have used these measures to validate training exercises and set proficiency guidelines [8–10], as well as to develop advanced algorithms to classify skill [11–13]. However, most performance measures focus on hand movements, instrument movements, environment interactions, or discrete errors and overlook measures specific to visualization through proficient endoscope control [14–16]. In laparoscopy, several training paradigms have been designed specifically to teach surgeons how to visualize their environment [17–19]; however, only a few performance measures focused on camera behavior have been proposed, including camera stability [20], endoscope path length [21], and horizon alignment [22]. Despite similar camera-specific exercises existing on RAMIS virtual reality simulators, objective performance measures focused specifically on endoscope control during RAMIS are lacking in virtual reality training and clinical scenarios.

In this work, we define performance metrics for endoscope control for a wide variety of existing RAMIS simulation exercises targeting many different technical skills, including endoscope control, needle driving, and instrument manipulation. We evaluate the construct validity of the newly defined metrics by comparing them between populations of novice, intermediate, and experienced RAMIS surgeons. Furthermore, we examine how well endoscope control metrics differentiate new and experienced RAMIS surgeons compared to conventional movement metrics. Finally, we offer motivation to examine these metrics clinically by correlating them to completion time, a commonly used metric to estimate proficiency in clinical procedures. In the end, we believe endoscope control metrics can improve surgeon training and ultimately visualization strategies by being incorporated into existing training protocols and proficiency standards for RAMIS trainees.

Materials and methods

Dataset

Study participants were enrolled in an Institutional Review Board-approved study. Thirty-nine RAMIS surgeons completed 25 simulation exercises using the da Vinci ® Skills Simulator (dVSS) for the da Vinci Si ® Surgical System (Intuitive Surgical Inc., Sunnyvale, CA). The exercises included: Camera Targeting—Level 1, Camera Targeting—Level 2, Dots and Needles—Level 1, Dots and Needles—Level 2, Energy Dissection—Level 1, Energy Dissection—Level 2, Energy Switching—Level 1, Match Board—Level 1, Match Board—Level 2, Match Board—Level 3, Needle Targeting, Peg Board—Level 1, Pick and Place, Ring and Rail—Level 1, Ring and Rail—Level 2, Ring Walk—Level 1, Ring Walk—Level 2, Ring Walk—Level 3, Scaling, Suture Sponge—Level 1, Suture Sponge—Level 2, Suture Sponge—Level 3, Thread the Rings, and Tubes. Participants were from multiple specialties: 11 general surgery, 16 gynecology, and 12 urology. Twenty-seven were practicing surgeons, 3 were fellows, and 9 were residents beyond PGY II. Surgeons were grouped based on expertise: 18 new, 8 intermediate, and 13 experienced RAMIS surgeons. New surgeons were defined as having completed fewer than 20 RAMIS procedures, intermediate surgeons between 21 and 150 RAMIS procedures, and experienced surgeons more than 150 RAMIS procedures. New surgeons included residents, fellows, and practicing open and laparoscopic surgeons. All surgeons may have had prior experience in laparoscopic or open surgery. Each surgeon completed one trial of each exercise. The exercises were completed consecutively in a common order by all surgeons.

For each simulation exercise, kinematic and event data from the surgical system and virtual environment were recorded. The kinematic data included the movements of the hand controllers, instruments, and endoscope. The event data included all states of the da Vinci Surgical System, such as master clutch events, camera movement events, and head-in events, as well as select states of the virtual environment. In addition, the performance metrics and overall scores computed by the dVSS were recorded.

Skill assessment metrics

We defined three novel performance metrics related to how surgeons control their endoscope, and as a result how they visualize their environment, during RAMIS. We call these metrics camera metrics. The first performance metric was camera movement frequency (CFrq). It was defined as the average number of endoscope movements made by a surgeon per second over the entire exercise. The second performance metric was camera movement duration (CDur). CDur was defined as the average time in seconds of all endoscope movements over the entire exercise. Finally, the third performance metric was camera movement interval (CInt). It was defined as the average time in seconds between endoscope movements over an entire exercise.
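As a concrete illustration, the three camera metrics could be computed from a list of endoscope movement intervals extracted from the event data. This is a minimal sketch under our own assumptions about the data representation (the paper does not specify the event format or any function names):

```python
def camera_metrics(movements, total_time):
    """Compute CFrq, CDur, and CInt from endoscope movement intervals.

    movements: list of (start, end) times in seconds for each
    endoscope movement during an exercise.
    total_time: exercise duration in seconds.
    """
    n = len(movements)
    # CFrq: average number of endoscope movements per second
    cfrq = n / total_time
    # CDur: average duration of an endoscope movement, in seconds
    cdur = sum(end - start for start, end in movements) / n
    # CInt: average time between consecutive movements, in seconds
    gaps = [movements[i + 1][0] - movements[i][1] for i in range(n - 1)]
    cint = sum(gaps) / len(gaps)
    return cfrq, cdur, cint

# Example: three movements during a 60-second exercise
m = [(5.0, 7.0), (20.0, 21.0), (40.0, 43.0)]
print(camera_metrics(m, 60.0))  # → (0.05, 2.0, 16.0)
```

Note that CFrq, CDur, and CInt are related but not redundant: a surgeon could move the camera frequently with either short or long movements, so all three capture distinct aspects of endoscope control.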

In addition, we extracted four conventional performance metrics commonly used during simulation—overall score (OverallScore), completion time (CompTime), economy of motion (EOM), and master workspace range (MWR). OverallScore was the MScore™ used to give a single score for a given exercise by combining multiple metrics (Mimic Technologies, Inc., Seattle, WA). CompTime was defined as the total time in seconds to complete an exercise. EOM was the total distance travelled by the instruments in meters throughout an exercise. Finally, MWR was defined as 85 % of the larger of two radii in meters that represented the distance between the average hand position (in three dimensions) and each sampled position. All of these performance metrics are used in the MScore on the dVSS. Note that, given the heterogeneity of the simulation exercises and their associated errors, the comparison in this paper focused on a select few efficiency metrics and excluded the remaining efficiency- and error-related metrics.
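The two movement-based conventional metrics can be sketched from sampled kinematics. Our reading of the MWR definition, with each per-hand radius taken as the maximum distance from that hand's mean position, is an assumption, as are the function names:

```python
import math

def economy_of_motion(positions):
    """EOM: total path length (m) travelled by an instrument tip,
    summed over consecutive sampled 3-D positions."""
    return sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))

def master_workspace_range(left_hand, right_hand):
    """MWR: 85 % of the larger of the two per-hand radii, where each
    radius is the distance from a hand's mean 3-D position to its
    farthest sampled position (our interpretation)."""
    def radius(samples):
        n = len(samples)
        centroid = tuple(sum(c) / n for c in zip(*samples))
        return max(math.dist(centroid, p) for p in samples)
    return 0.85 * max(radius(left_hand), radius(right_hand))
```

For example, an instrument path through (0, 0, 0), (0, 3, 0), (0, 3, 4) gives an EOM of 7.0 m.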

Construct validity of camera metrics

We defined construct validity as the ability of the performance metrics to differentiate populations of surgeons with varying expertise. In particular, we compared the mean performance of new, intermediate, and experienced surgeons for each camera metric as well as the overall score and completion time. Student’s t tests were used to determine significance (p < 0.05).
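The group comparison amounts to a two-sample Student's t test on each metric. A minimal sketch of the pooled t statistic is below; in practice, the p-value would be obtained from the t distribution with the appropriate degrees of freedom (e.g., via a statistics library), and the data shown are illustrative only:

```python
from statistics import mean, variance

def student_t(a, b):
    """Pooled two-sample Student's t statistic comparing the means of
    two groups (e.g., new vs. experienced surgeons on one metric)."""
    na, nb = len(a), len(b)
    # Pooled variance assumes the two groups share a common variance
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# Illustrative data: |t| is then compared against the critical value
# for na + nb - 2 degrees of freedom at p < 0.05
t = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```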

Camera and conventional metric comparisons

The ability of camera metrics to differentiate new and experienced surgeons across all exercises was compared to the subset of conventional metrics (see “Skill assessment metrics” section). First, the mean of performance metrics for each exercise was normalized across exercises according to Eq. (1):

$$x_{i}^{n} = \frac{x_{i} - x_{\min}}{x_{\max} - x_{\min}}$$
(1)

$x_{\min}$ and $x_{\max}$ are the minimum and maximum, respectively, of the mean performance metrics for each exercise, $x_{i}$ is the mean performance metric for exercise $i$, and $x_{i}^{n}$ is the normalized mean performance metric for exercise $i$. Next, the differences between the normalized mean performances of novice and experienced surgeons across all exercises were computed according to Eq. (2):

$$d = \left| \mu_{1} - \mu_{2} \right|$$
(2)

$d$ is the mean difference, $\mu_{1}$ and $\mu_{2}$ are the means of the normalized metrics across all exercises for the two groups (i.e., new and experienced surgeons), and $\left| \cdot \right|$ denotes the absolute value. The mean differences of normalized metrics were sorted in decreasing magnitude to illustrate their ability to differentiate new and experienced performance. A Student’s t test was used to make pair-wise comparisons across camera and conventional performance metrics (p < 0.05).
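Under our reading of Eqs. (1) and (2), the normalization and group comparison can be sketched as follows (function names and data are illustrative):

```python
def minmax_normalize(values):
    """Eq. (1): scale per-exercise mean metrics into [0, 1] using the
    minimum and maximum across exercises."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def mean_abs_difference(group1, group2):
    """Eq. (2): absolute difference between the two groups' means of
    the normalized metric across all exercises."""
    m1 = sum(group1) / len(group1)
    m2 = sum(group2) / len(group2)
    return abs(m1 - m2)

# Illustrative per-exercise means for one metric
normalized = minmax_normalize([10.0, 20.0, 30.0])  # → [0.0, 0.5, 1.0]
```

A larger $d$ for a metric indicates a larger (normalized) gap between new and experienced surgeons, which is how the metrics are ranked in Fig. 2.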

Correlation to conventional performance metrics

We correlated the camera metrics with metrics typically used to assess clinical performance to examine whether camera metrics are good candidates for inclusion in assessments of clinical performance. The correlation coefficient, assuming a linear model, was computed between each camera metric and both CompTime and OverallScore, including data from new, intermediate, and experienced surgeons. A Student’s t test was used to determine significance (p < 0.05).
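The linear correlation can be sketched with a plain Pearson coefficient (the significance test of $r$ is omitted here, and the function name is ours):

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two paired metric
    vectors across surgeons (e.g., CFrq vs. CompTime)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Perfectly linear paired data give r = 1.0
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```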

Results

Bar plots for OverallScore, CompTime, and all three camera metrics across all simulation exercises are shown in Fig. 1. Tables 1, 2, and 3 list the results from the t tests comparing the camera metric means for new, intermediate, and experienced RAMIS surgeons. Across all exercises except Scaling, experienced surgeons achieved a significantly higher OverallScore than new surgeons. Similarly, intermediate surgeons achieved a significantly higher OverallScore than new surgeons for 20/25 exercises. In three exercises—Energy Switching—Level 1, Suture Sponge—Level 2, and Tubes—experienced surgeons achieved a significantly higher OverallScore than intermediate surgeons.

Fig. 1
figure 1

Construct validity of conventional metrics and camera metrics. A Overall score, B completion time, C camera movement frequency (CFrq), D camera movement duration (CDur), and E camera movement interval (CInt). Horizontal bars indicate significant differences between surgeon groups (p < 0.05)

Table 1 Mean comparisons and correlation coefficients for camera movement frequency (CFrq) across all simulation exercises
Table 2 Mean comparisons and correlation coefficients for camera movement duration (CDur) across all simulation exercises
Table 3 Mean comparisons and correlation coefficients for camera movement intervals (CInt) across all simulation exercises

Experienced surgeons performed all exercises significantly faster than new surgeons. Intermediate surgeons performed 17/25 exercises significantly faster than new surgeons. There were no significant differences in CompTime across the exercises between intermediate and experienced surgeons.

Experienced surgeons had significantly higher CFrq than new surgeons for all but two exercises—Pick and Place (p = 0.6933) and Suture Sponge—Level 1 (p = 0.1332) (Table 1). In 22/25 exercises, intermediate surgeons had significantly higher CFrq than new surgeons. There were no significant differences in CFrq between intermediate and experienced surgeons.

Experienced surgeons had significantly shorter CDur than new surgeons for all exercises except Camera Targeting—Level 1 (p = 0.4062), Peg Board—Level 1 (p = 0.0779), and Pick and Place (p = 0.0882) (Table 2). In 15/25 exercises, intermediate surgeons had significantly shorter CDur than new surgeons. Finally, in 4/25 exercises, experienced surgeons had significantly shorter CDur than intermediate surgeons.

In 19/25 exercises, experienced surgeons had significantly shorter CInt than new surgeons whereas intermediate surgeons had significantly shorter CInt than new surgeons in 17/25 exercises (Table 3). There were no significant differences in CInt between intermediate and experienced surgeons.

The mean differences of normalized metrics illustrated CDur and CompTime best differentiated new and experienced surgeons across all exercises (Fig. 2). The mean difference in CDur was significantly different than the mean difference in CFrq, CInt, EOM, and MWR. The mean difference in CompTime was significantly different than the mean difference in CFrq, EOM, and MWR. CFrq significantly differentiated experienced and new surgeons better than MWR but not EOM. The mean difference in CInt was not significantly different than the mean difference in CFrq, EOM, or MWR.

Fig. 2
figure 2

Difference between novice and expert surgeons’ mean normalized metrics across all exercises (ordered by magnitude). Gray bars correspond to viewpoint metrics. Black bars correspond to conventional performance metrics. Error bars represent +1 SD. Top brackets indicate significant differences between metrics (p < 0.05)

Individual metric correlations between CompTime and OverallScore are listed in Table 1 (CFrq), Table 2 (CDur), and Table 3 (CInt). CompTime was significantly correlated with CFrq in 21/25 exercises, CDur in 21/25 exercises, and CInt in 20/25 exercises. Pick and Place and Scaling did not correlate with CompTime for any camera metrics. CFrq during Energy Dissection—Level 2 (p = 0.1151) and Peg Board—Level 1 (p = 0.0501), CDur during Camera Targeting—Level 1 (p = 0.0931) and Dots and Needles—Level 2 (p = 0.0593), and CInt during Dots and Needles—Level 1 (p = 0.0604), Energy Dissection—Level 2 (p = 0.1591), and Needle Targeting—Level 1 (p = 0.1063) did not correlate significantly with CompTime.

OverallScore was significantly correlated with CFrq in 20/25 exercises, CDur in 19/25 exercises, and CInt in 20/25 exercises. Again, Pick and Place and Scaling did not correlate with OverallScore for any camera metrics. CFrq during Dots and Needles—Level 2 (p = 0.1075), Energy Dissection—Level 2 (p = 0.1904), and Suture Sponge—Level 1 (p = 0.0985), CDur during Camera Targeting—Level 1 (p = 0.2258), Dots and Needles—Level 2 (p = 0.3259), Energy Dissection—Level 1 (p = 0.1566), and Ring Walk—Level 2 (p = 0.1100), and CInt during Dots and Needles—Level 1 (p = 0.1193), Energy Dissection—Level 2 (p = 0.1543), and Thread the Rings (p = 0.0871) did not correlate significantly with OverallScore.

Discussion

Objective performance measures of RAMIS surgeon technical skills are critical to minimizing learning curves and maximizing patient safety [6, 23–25]. The results presented here show construct validity of new performance metrics related to endoscope control during virtual reality simulation exercises for RAMIS (Fig. 1; Tables 1, 2, 3). Similar to conventional efficiency measures (e.g., completion time and economy of motion), the camera metrics consistently differentiated new and experienced surgeons. A few consistent exceptions existed (Pick and Place and Scaling), but these exercises may not have been challenging enough for this comparison. Further metric comparisons between new or experienced surgeons and intermediate surgeons offered a window into the learning curves of each simulation exercise: exercises that differentiated intermediate from experienced surgeons may be more challenging than those that did not, and could therefore be placed later in a training sequence, with the easier exercises placed earlier. In addition, an aggregated analysis of the camera metrics showed they differentiated new and experienced surgeons across all tasks as well as, and sometimes better than, conventional efficiency metrics (Fig. 2). Finally, camera metrics showed strong correlations with OverallScore and with CompTime, a metric used to evaluate efficiency in clinical scenarios (Tables 1, 2, 3). This suggests that camera metrics could be used to evaluate procedural performance; however, additional validation studies are needed. Combined with the construct validity observed across most exercises, this result suggests endoscope control is an essential underlying technical skill for many types of surgical tasks, such as camera control, Endowrist ® manipulation, needle driving, and energy and dissection. Given endoscope control is intrinsically linked to effective visualization, surgeon competency defined using camera metrics could help ensure safe and effective surgery.

Although we show that camera metrics are important indicators of RAMIS technical skill, we do not know exactly why experienced surgeons adopt their specific behaviors when controlling the endoscope. Could there be optimal camera positions for specific tasks simply to assist with surveillance of the operative field? Alternatively, could camera movements be exploited by experienced surgeons to extract relevant visual information from their environment, such as depth information [26, 27] or estimates of interaction forces [28]? One hypothesis is that, because current RAMIS systems do not include haptic feedback, surgeons rely on visual cues to estimate interaction forces accurately [29]. Another hypothesis is that the viewpoint influences the ease with which experienced surgeons make subsequent movements. This could result from better visualization as well as from the relative position and orientation of their instruments and the environment (e.g., anatomy and needle). Future research studies, in both controlled laboratory and applied clinical settings, should examine the underlying causes of these endoscope control behaviors so that future training scenarios and RAMIS technologies can be optimized to surgeon sensorimotor control.

Thorough characterization of endoscope control might also be useful for technology development. Automated and objective measures of endoscope control could be used in intelligent tutoring systems to deliver formative feedback during surgeon training [30]. Such systems have the potential to consistently remind inexperienced surgeons to optimize how they visualize patient anatomy and their instruments without requiring an expert surgeon or instructor to be present. Similarly, several research teams have developed robot arms and algorithms to give control of the endoscope to surgeons during conventional laparoscopy [31–33] and to automate control during RAMIS [34, 35]. These laparoscopic systems remove the need for the surgeon to verbally instruct an assistant how to adjust the endoscope, whereas the RAMIS systems remove the need to control the endoscope altogether. It will be imperative that these systems remain flexible enough to accommodate the sensory demands of surgeons and do not inherently limit a surgeon’s ability to optimize his view of the operative field, which could increase the likelihood of technical errors.

Several limitations exist in this research study. First, the simulation exercises are relatively simple and involve a subset of the technical skills required in an actual clinical procedure. Similarly, the simulation exercises contain different visual information than live tissue, which has soft, shiny, and wispy structures that the simulation exercises do not replicate. It would be interesting to reproduce the same camera metrics during clinical procedures, where surgeons might experience familiar anatomy, surgical planning, or other cognitive demands that could influence how and why they choose a certain viewpoint of the operative field. Finally, the viewpoint measures used in this study were simply the gross positions of the endoscope. Additional examinations of surgeon viewpoint could consider specific aspects of the field of view (the extent of the observable anatomy that a surgeon sees with a particular endoscope position), point of view (the direction from which the anatomy within the field of view is viewed), and focal point (the specific point of interest within the view).

Despite these limitations, camera metrics might be helpful for discriminating surgeon skill or setting proficiency standards if incorporated into existing RAMIS simulation training exercises, such as the dVSS, dV-Trainer ® (Mimic Technologies, Inc.), RobotiX Mentor™ (3D System, Inc.), and RoSS™ (Simulated Surgical Systems, LLC). For scenarios outside of simulation where data might not be recorded directly from a RAMIS platform, camera metrics could be further emphasized by expert trainers, proctors, and attending surgeons, possibly through supplements to existing objective rating scales. Such scenarios might include dry laboratory exercises, wet laboratory training tasks, and clinical procedures. Interestingly, it is possible to replicate the camera metrics presented here for any type of task on RAMIS platforms by extracting the icons indicating camera control that normally appear on the surgeon’s screen or by using image processing algorithms to analyze changes in viewpoint. In this way, future efforts toward automated, objective evaluation are not limited to those research teams with access to internal data from RAMIS platforms.
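As a rough illustration of the image-processing route, a viewpoint change could be flagged by frame differencing of the endoscope video alone, with no access to internal platform data. The threshold and data representation below are illustrative assumptions, not from this study:

```python
def camera_moving(prev_frame, frame, threshold=10.0):
    """Flag a probable viewpoint change using the mean absolute pixel
    difference between consecutive grayscale frames, represented here
    as nested lists of intensity values. The threshold is illustrative
    and would need tuning against labeled camera-movement events."""
    total, count = 0, 0
    for row_prev, row_cur in zip(prev_frame, frame):
        for p, c in zip(row_prev, row_cur):
            total += abs(c - p)
            count += 1
    return total / count > threshold

# A large uniform intensity shift is flagged; identical frames are not
print(camera_moving([[0, 0], [0, 0]], [[50, 50], [50, 50]]))  # → True
print(camera_moving([[5, 5], [5, 5]], [[5, 5], [5, 5]]))      # → False
```

In a real pipeline, instrument motion would also change pixels, so a practical detector would need to separate global (camera) motion from local (instrument) motion, for example via optical flow.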

In the end, we show that camera metrics are compelling RAMIS surgeon performance measures, comparable to many conventional efficiency metrics focused on time and instrument movements. We believe they could be used to improve current RAMIS surgeon training paradigms and proficiency-based curricula. By encouraging surgical trainees to exhibit optimal endoscope control, we can continue to improve patient safety.