Understanding the relationship between operator physiology and performance during robot-assisted surgery (RAS) has the potential to improve operating outcomes, surgical education, training, and skills assessment [1,2,3]. Time and frequency-domain electrokardiogram (EKG) metrics provide high-temporal resolution measures of surgeon physiology that are robust to sensing challenges in dynamic operating room (OR) environments [4]. These metrics have been associated with cognitive and affective features such as mental workload, acute stress, and cognitive fatigue that are known to impact surgical performance [5,6,7]. To explore the relationships between physiology and performance, subjective, manual ratings of surgical performance or error are then post-processed and correlated with physiological data and their implied cognitive or affective features [8,9,10]. Developing a more robust relationship between intraoperative EKG metrics and surgical performance, however, will require rater-independent, objective error annotation, high-temporal resolution error identification, and precise alignment of error markers with physiological signals. The present study seeks to develop on-line, automated detection and alignment of intraoperative errors with EKG signals to identify relationships between EKG statistics and surgical performance in real-time. Results suggest that operator EKG statistics change significantly during intraoperative errors and the magnitude of these changes varies with operator skill level.

Prior works examining EKG statistics in the context of surgical performance typically consist of retrospective analyses that often aggregate physiological metrics over the entire surgical procedure and use manual rating schemes to characterize per-procedure surgical performance. For example, Langelotz et al. associated perceived stress levels in surgeons during a 24-h shift with heart rate variability (HRV) metrics [11] while Bohm et al. examined HRV to measure mental strain in surgeons during laparoscopic surgery [7]. Joseph et al. used subjective measures of cognitive load to examine physiological differences in novice vs. expert surgeons during emergency surgery [12]. Wetzl et al. measured HRV and salivary cortisol levels to determine whether physiological indicators of stress were predictive of surgical performance during simulated OR crises [13]. Amirian et al. examined the correlations between HRV and laparoscopic simulation performance [14]. All of these studies use aggregated physiological statistics and subjective performance outcomes. Higher temporal resolution experiments examining intraoperative, rather than per-procedure performance and physiology, have been published but are largely reliant on retrospective, rather than on-line or real-time, analysis due to the need for manual annotation of surgical performance by expert surgeons [15].

The aim of this study is to analyze intraoperative time and frequency domain EKG statistics of operators participating in RAS simulation using automated annotation of surgical videos for performance metrics. Changes in EKG metrics are compared relative to personalized baselines during periods of operator error to those captured during periods of no error. These differences are then analyzed to examine the differences in EKG statistics for high and low-performing operators during error. It is hypothesized that changes in EKG metrics relative to baseline will be significantly different during error compared to no error and that changes in relative EKG metrics during errors would differ across levels of performance.

Materials and methods

Participants

Twenty participants were recruited for this study under institutional IRB approval. Of these participants, 5 were non-medical graduate students and the remainder were general surgery residents at various levels of training (Table 1). All participants were naïve to the hypotheses of the study prior to participation.

Table 1 Participant demographics

Surgical system, simulation software and sensors

The da Vinci Xi robotic system console (Intuitive Surgical Ltd.) with the da Vinci Skills Simulator (DVSS) software (Surgical Science Sweden AB) was used for this study. The operator’s point-of-view (POV) video from the DVSS was recorded using a video capture card with a recording rate of 30 frames-per-second (fps). For EKG recording, we used a Polar H10 (Polar Inc.) chest-strap monitor with a sampling rate of 130 Hertz (Hz).

Protocol

Each participant completed three simulation tasks during a single sitting without breaks. These tasks were completed in the same order for all participants. The tasks utilized were “Ring Rollercoaster 1,” “Ring Rollercoaster 3,” and “Wrist Articulation 1” in this order for all participants.

A Polar EKG monitor was fitted on each participant prior to beginning the first task and checked for appropriate positioning and data transfer. Four minutes of baseline EKG readings were then recorded as each participant sat still at the operating console. Following baseline recording, each participant was directed to start the first surgical simulation task and given no other instruction. After completion of the first task, each participant was directed to start the second simulation, also without additional instruction or feedback. After completion of the second simulation, each participant was directed to start the third and final simulation, also without additional instruction or feedback. Following completion of the final simulation, each participant was instructed to complete a demographics survey after which the EKG monitor was removed and the system was reset for future participants.

Data capture, feature extraction, and synchronization

Both the video data stream and the EKG data stream were recorded on the same external computer. Operator POV videos were captured at a frame rate of 30 fps. Each frame was programmatically annotated for the presence or absence of error using pixel-based frequency and intensity filters, Hough transforms, and canny edge detection. Frame-by-frame error annotation provided 30 Hz resolution for intraoperative error.

EKG data was captured at a sampling rate of 130 Hz. Interbeat interval (IBI), standard deviation of N–N interval (SDNN), root-mean-square successive differences in N–N interval (RMSSD), low frequency (LF) signal (0.04–0.15 Hz), and high frequency (HF) signal (0.15–0.40 Hz) features were calculated from raw EKG data [16]. SDNN and RMSSD were calculated using a 30 s (s) sliding window [17] with a single sample stride resulting in trailing 30 s SDNN and RMSSD measures at 130 Hz resolution.

LF and HF signal components were isolated from raw data using Butterworth bandpass filters with cascaded second-order sections and passbands of (0.04–0.15 Hz) and (0.15–0.40 Hz), respectively. LF and HF signal powers were calculated using root-mean-square (RMS) amplitude.

Data streams were synchronized using Lab Streaming Layer [18]. For simulation measures, IBI, RMSSD and SDNN metrics were associated with the time-matched video frame from the operator POV. Each video frame’s duration (0.033 s) was used as a non-overlapping window for LF and HF RMS power. Baseline IBI, RMSSD, and SDNN metrics were averaged over the entire 4-min baseline recording. Baseline LF and HF RMS power were computed using the average RMS power across non-overlapping 0.033 s windows.

Measures

Time-domain EKG statistics

IBI reflects the amount of time between successive heart beats (Fig. 1). A decrease in IBI is equivalent to an increase in heart rate which is a marker of increased sympathetic autonomic nervous system (SANS) activity or decreased parasympathetic autonomic nervous system (PANS) activity [19, 20]. Increased SANS activity has been associated with increased stress, mental and physical workload, and suppression of negative emotions [21].

Fig. 1
figure 1

Time-Domain EKG Statistics. Interbeat interval (IBI) measures the time (ms) between two heart beats as indicated by the N–N interval. Standard deviation of N–N intervals (SDNN) measures the standard deviation of N–N intervals as measured by IBIs over a specified window of time. RMSSD measures the root mean square of successive differences in N–N intervals as measured by successive differences in IBIs over a window of time

Increases in SDNN are associated with increased vagal tone and therefore increased PANS activity [22]. Vagal tone and increased PANS activity has been correlated with resilience to mental, emotional, and physical stress as well as lower cognitive workload [23,24,25]. Increases in RMSSD are also associated with higher vagal tone and PANS activity [26].

Frequency-domain EKG statistics

LF power (Fig. 2) is an indicator of mixed SANS and PANS activity as well as baroreflex activity whereas HF power reflects vagal tone [26]. Increases in LF are associated with higher mental workload [27], and decreases in HF are associated with stress, panic, anxiety, or worry [28].

Fig. 2
figure 2

Time and frequency-domain EKG statistics. After acquiring a raw EKG signal, the raw signal is passed through a low-frequency bandpass filter (0.04–0.15 Hz) to obtain the LF component of the raw EKG signal. To obtain the HF component of the raw EKG signal, the raw signal is passed through a high-frequency bandpass filter (0.15–0.4 Hz). The RMS amplitudes of the LF and HF components are used to estimate the power of each component

Statistical analyses

All participants

EKG metrics relative to individual baselines were associated with intraoperative errors using separate linear mixed effects models for each metric. Intraoperative errors were modeled as a fixed effect and individual participant variance as a random effect. Identically structured models were used for each relative frequency-domain metric (LF RMS power, HF RMS power).

High and low performers

All participants were assigned to either a high, middle or low performance group based on the sum total of their simulation scores for all three tasks (Table 2). Simulation scores for each task are generated by the DVSS based on aggregated simulation statistics including time to completion, instrument distance traveled, instrument errors, and manipulation errors. The high performance group included the top-third (7 participants) by score. The low performance group included the bottom-third (7 participants) by score. The remainder of the participants were assigned to the middle performance group.

Table 2 Group demographics

High Performer and Low Performer groups were analyzed separately. For each group, the association between relative time-domain (IBI, SDNN, RMSSD) EKG metrics and intraoperative error were modeled using separate linear mixed effects models for each time-domain metric. Intraoperative error was modeled as a fixed effect and individual participant variance as a random effect. Identically structured models were used for each relative frequency-domain metric (LF RMS power, HF RMS power).

High vs. low performers

To compare High Performer and Low Performer group physiology during intraoperative error, separate linear mixed effects models were used for all relative time and frequency-domain metrics. Group membership was modeled as a fixed effect using Low Performer as reference level, and we modeled participant level variance as a random effect.

Multiple comparison correction

We independently tested all 5 EKG metrics (IBI, SDNN, RMSSD, LF RMS, HF RMS) for all 4 groupings (all participants, high performers, low performers, high vs. low performers). Our uncorrected significance level was P = 0.01. Given a total of 20 independent tests in this study, we conservatively corrected our significance level by a factor of 20 to P = 0.0005.

Results

All participants

Relative IBI for all participants decreased 0.15% (Standard Error: 3.60e−04; P = 3.25e−05) during error as compared to no error. Relative SDNN decreased 3.08% (S.E.: 1.60e−03; P < 2e−16), and relative RMSSD decreased 1.19% (S.E.: 2.63e−03; P = 5.66e−06) during error as compared to no error. Relative LF RMS power increased 1.44% (S.E.: 2.34e−03; P = 8.38e−10), and relative HF RMS power increased 5.51% (S.E.: 1.95e−03; P = 2e−16) during error as compared to no error (Table 3 and Fig. 3).

Table 3 Changes in EKG metrics during error
Fig. 3
figure 3

Results for all analyses across all EKG metrics. *P < 0.0005

High and low performers

In the High Performer group, relative SDNN and relative RMSSD decreased 6.3% (S.E.: 2.846e−03, P < 2e−16) and 5.8% (S.E.: 3.683e−03, P < 2e−16), respectively, during error as compared to no error while relative IBI showed no significant change. Relative LF and HF RMS powers increased 1.1% (S.E.: 2.914e−03, P = 0.000230) and 3.4% (S.E.: 2.241e−03, P < 2e−16), respectively, during error as compared to no error (Table 3 and Fig. 3).

In the Low Performer group, relative IBI decreased 0.8% (S.E.: 6.027e−04, P < 2e−16) during error as compared to no error while relative SDNN and relative RMSSD increased 2.4% (S.E.: 3.284e−03, P = 5.28e−13) and 3.5% (S.E.: 5.570e−03, P = 5.24e−10), respectively. Relative LF and HF RMS powers increased 2.6% (S.E.: 3.918e−03, P = 1.81e−10) and 6.6% (S.E.: 3.975e−03, P ≤ 2e−16), respectively, during error as compared to no error (Table 3 and Fig. 3).

High vs. low performers

Differences in changes in all EKG metrics during error for high performers as compared to low performers did not meet our corrected significance threshold of P = 0.0005 (Table 3 and Fig. 3).

Discussion

Potential cognitive and affective determinants of surgical performance, including stress, mental workload, and valence, have been explored in prior research [29,30,31]. These determinants have been associated with both physiological and neurophysiological metrics including EKG statistics [32, 33]. Time and frequency-domain EKG measures are particularly relevant given their proposed mappings to cognitive and affective features [34]. Additionally, these EKG measures have been associated with surgical performance independent of any cognitive or affective mediators [30]. Surgical education and surgical performance improvement are of paramount importance to patients, surgeons, and surgical trainees. EKG-based biometrics coupled with objective intraoperative error detection offer a potential method for personalized surgical performance evaluation and targeted surgical skill improvement. Precision surgical skill evaluation and development may drive improved patient outcomes through higher operator technical proficiency. Prior work, however, neglects the potential of on-line EKG analysis coupled with automated surgical error detection to capture high temporal-resolution indicators of intraoperative success or failure.

In this study, we built and deployed a novel, on-line operating room data capture and analysis platform to test the hypotheses that relative time and frequency-domain EKG metrics of operators performing RAS would change significantly during intraoperative error. Our results only consider detected errors and the EKG statistics associated with those detected errors. While other factors, such as stress, may also influence the physiological correlates of error, stress can occur without error and error can occur without stress. Our analysis does not distinguish between the potential effects of error on stress or the potential effects of stress on error, and our results suggest that all relative time-domain metrics (IBI, SDNN, RMSSD) and all relative frequency-domain metrics (LF and HF RMS power) demonstrate significant changes during error as compared to no error. The direction of these changes largely aligns with proposed associations between EKG statistics and cognitive or affective determinants of surgical performance. Decreases in relative IBI, SDNN, and RMSSD, and increases in relative LF RMS power during intraoperative error may reflect an increase in operator stress and mental workload as well as a more negative affective state. The increase in relative HF RMS power during intraoperative error, however, may suggest a decrease in stress, panic, anxiety or worry [35].

For high performers, decreases in relative SDNN and RMSSD associated with intraoperative error are possibly reflective of greater stress, mental workload and negative valence. However, increases in relative LF RMS power, indicative of higher stress and workload, were of smaller magnitude than increases in relative HF RMS power. HF RMS power is reported to be indicative of lower anxiety or worry [35]. For Low Performers, only relative IBI decreased during error while all other metrics increased. For both groups, the direction of changes in relative IBI, LF RMS power and HF RMS power were similar, potentially indicating common physiological associations with intraoperative error regardless of performance status. However, the magnitude of these changes was not identical.

In the previous group analyses, EKG statistics during error were compared to those during no error, and the differences between error and no error were of sufficient magnitude to be detected. Examination of the differences in relative IBI and HF RMS during error for high as compared to would benefit from further investigation in a higher-powered group comparison. Given the relatively small magnitude of differences in the error compared to no error analyses, detecting even smaller magnitude differences in error compared to error across two groups would benefit from a larger cohort size.

There are several limitations of this study. First, the number of participants was neither large nor evenly distributed across skill level, potentially skewing cohort analysis. Second, the use of trailing 30 s windows for SDNN and RMSSD statistics implies that the metrics associated with a particular time point reflect cardiac activity for the entire 30 s up to and including that time point. Third, using bandpass filters and RMS power calculations for frame-by-frame LF and HF power measurements are accompanied by a tradeoff in the settling time of the filter. Fourth, we do not exclude individuals on chronotropic or inotropic medications nor do we correct for the effect of these medications. While this may impact our results, all reported EKG statistics are analyzed relative to each individual participant's personalized baseline EKG. As these baseline characteristics are captured immediately prior to the simulation tasks, the effects of medication are likely to be consistent across both baseline and simulation EKG recording periods. Finally, these results may be affected by other variables such as movement at the console and time at the console. Based on our observations, low performers tended to move and shift body positions more frequently than high performers, potentially indicative of discomfort at the console. Low performers also required more time to complete the simulation tasks.

Despite these limitations, this study shows that a novel, on-line operating room data capture and analysis platform can enable the detection of distinct physiological changes associated with intraoperative error. These changes are largely consistent with known relationships between EKG metrics and cognitive or affective factors that impact performance [36]. While this preliminary study only proposes potential biomarkers of intraoperative error, these biometric error signals have the potential to improve our understanding of surgical proficiency and surgical training. Using EKG-based metrics of error, challenge, or difficulty could help identify intraoperative tasks and procedures that pose significant challenges to operators and guide personalized training on an individual-by-individual basis. Coupling these EKG-based metrics with other markers of error, challenge or difficulty such as electroencephalogram (EEG) statistics, eye-tracking metrics, and galvanic skin response could provide a multidimensional approach to understanding, predicting, and improving surgical performance. This data highlights the possibility of using objective biometric data capture and error detection to monitor intraoperative surgical proficiency and perceived intraoperative difficulty in order to improve patient outcomes as well as potentially guide more precise, personalized RAS training curriculums. Future studies may focus on use of real-time error detection and physiological analysis to reduce error commission and mitigate negative cognitive or affective states through coaching or other skills development tools in an individualized surgical learning curriculum.