1 Introduction

Minimal invasiveness has become a key paradigm in modern surgery (Berggren et al. 1994; Mack 2001; Fuchs 2002; Lanfranco et al. 2004; Gemmill and McCulloch 2007; Nezhat 2008; Townsend et al. 2016; Martínez-Pérez et al. 2017; Nouralizadeh et al. 2018). Over the past few decades, minimally invasive surgery has proven to be superior to open surgery in terms of various clinical outcome indicators, such as intraoperative blood loss, perioperative pain, postoperative complications, hospitalization period, and patients’ aesthetic satisfaction (Berggren et al. 1994; Fuchs 2002; Gemmill and McCulloch 2007; Townsend et al. 2016; Martínez-Pérez et al. 2017; Nouralizadeh et al. 2018). However, the environment for conducting minimally invasive surgery was not surgeon-friendly since it was uncomfortable and was constrained by difficult instrument control (Mack 2001; Fuchs 2002; Lanfranco et al. 2004; Lawson et al. 2007; Nezhat 2008; Townsend et al. 2016). To overcome these difficulties and maximize the benefits of minimally invasive surgery, robot-assisted minimally invasive surgery (RAMIS) was devised (Mack 2001; Lanfranco et al. 2004; Townsend et al. 2016).

RAMIS platforms, like the remarkable da Vinci surgical system, enable intuitive manipulator control by implementing a system composed of a surgeon console and a patient side robot (DiMaio et al. 2011; Douissard et al. 2019). The movements of the user are input via master tool manipulators (MTMs) and processed to control multiple patient side manipulators (PSMs) and an endoscopic camera manipulator (ECM). Current RAMIS platforms increase the surgeon’s dexterity and precision by adding extra wrist joints to the laparoscopic instruments and implementing tremor filtration algorithms as well as motion scaling (Mack 2001; DiMaio et al. 2011; Douissard et al. 2019). In addition, a stereoscopic endoscope and stereo viewer provide the surgeon with an in-depth binocular view of the operative field. Therefore, RAMIS allows surgeons to easily learn and comfortably perform submillimeter procedures, thereby decreasing complication rates for some surgeries that require delicate instrument manipulation (Ficarra et al. 2007; Nezhat 2008; Gala et al. 2014; Crippa et al. 2021). Nevertheless, various crucial weaknesses of RAMIS persist, including low cost-effectiveness, fatigue owing to non-ergonomic postures, extended operation time, and risk of intraoperative nerve compression from bulky hardware (Mack 2001; Lanfranco et al. 2004; Meijden and Schijven 2009; Mills et al. 2013; Koehn and Kuchenbecker 2015; Townsend et al. 2016; Wee et al. 2020). In this study, we mainly addressed problems related to ergonomics in RAMIS.

Although RAMIS is more comfortable than conventional laparoscopic surgery, surgeons still suffer from fatigue and pain in the back and joints of the upper extremities (Lawson et al. 2007; Lux et al. 2010; Hullenaar et al. 2017; Lee et al. 2017; Wee et al. 2020). Studies have shown that prolonged pain-inducing postures are adopted during extensive surgeries, and approximately 20% of surgeons have advocated for improvements in console design, leading to various solutions (Lawson et al. 2007; Lux et al. 2010; Hullenaar et al. 2017; Lee et al. 2017; Wee et al. 2020). Direct adjustments to the surgeon console, such as changing the viewer angle and arm rest height, have improved the user experience (UX), but it has been reported that some surgeons remain unsatisfied even after such adjustments (Lux et al. 2010; Hullenaar et al. 2017). Alternatively, new systems have been developed by implementing new hardware. A head-mounted display (HMD) is one of the most actively studied vision systems as an alternative to the stereo viewer (Qian et al. 2018; Dardona et al. 2019; Jo et al. 2020; Danioni et al. 2022; Kim et al. 2022). Compared with a conventional stereo viewer, surgeons using an HMD can conduct procedures in a more upright position and have reported reduced neck and back fatigue in UX studies (Dardona et al. 2019; Kim et al. 2022).

Finger and wrist pain is also a significant ergonomic issue of contemporary RAMIS systems (Lee et al. 2017). In order to control the ECM or PSMs, surgeons must grasp and move the end-effectors of the MTMs. Although a gravity compensating mechanism is implemented to reduce the force applied to the fingers, it has been reported that extensive surgeries lasting for hours can cause a significant amount of hand and wrist pain, potentially affecting surgeons’ performance (Lee et al. 2017). Consequently, a need for a novel input mechanism for RAMIS systems emerged. Wearable devices such as motion gloves have been proposed as a substitute for MTMs (Danioni et al. 2022). However, they are too bulky for users to express delicate movements, their battery capacity is not sufficient for extensive robot surgeries lasting for hours, and cost issues can pose a significant barrier to widespread use. Currently, deep learning based real-time hand gesture recognition is actively studied in the field of computer vision (Amprimo et al. 2023; Kavana and Suma 2022; Buckingham 2021). Also, in conjunction with HMDs, hand tracking based on sets of RGB images is becoming an emerging input modality for users to intuitively interact with virtual reality environments (Amrani et al. 2022; Nikitin et al. 2020; Han S et al. 2020). However, to our knowledge, the application of computer vision based hand tracking technology to the control of RAMIS systems has not been reported.

We propose an unconstrained lightweight control interface (ULCI) to reduce finger and wrist pain caused by MTMs by implementing a contactless hand tracking algorithm for manipulator control. As depicted in Fig. 1, the three-dimensional (3D) locations of hand landmarks are tracked in real-time through images obtained from multiple cameras. The images are processed to control virtual manipulators that are displayed on the HMD. We provide a proof of concept by introducing the underlying mechanisms of the ULCI. Then, the interface was assessed both qualitatively and quantitatively through trajectories and latencies of virtual PSMs and ECM. Finally, we recruited naïve participants to conduct UX studies composed of several tasks and a set of questionnaires to investigate the feasibility and ergonomic properties of the ULCI.

Fig. 1

Diagram of the proposed unconstrained lightweight control interface (ULCI) for robot-assisted minimally invasive surgery (RAMIS) is shown. Several cameras are set at different angles to capture the movement of hands. The data from the cameras are processed with a hand tracking module to extract a set of hand landmark data. Finally, this data is used to visualize the virtual manipulators such as the patient side manipulators (PSMs) or the endoscopic camera manipulator (ECM). A stereoscopic view of the virtual environment is displayed on a head-mounted display (HMD)

2 Materials and methods

2.1 Components of the unconstrained lightweight control interface

2.1.1 Overview

ULCI is a surgical robot control platform based on a real-time computer vision driven hand tracking system and an HMD. As shown in Fig. 2, the ULCI can be divided into three units according to information flow: input, robot control, and visualization units.

Fig. 2

Diagram of main components of the ULCI and their connections are shown. The ULCI is composed of three units: input, robot control, and visualization units. The input unit consists of multiple USB cameras and foot pedals. The robot control unit analyzes the videos captured by the cameras with MediaPipe and scripts developed on ROS to determine the joint states of the surgical robot. The visualization unit displays the state of the robot through an HMD via Unity software

2.1.2 Input unit

The input unit comprises several USB cameras (C930E Business Webcam, Logitech, Newark, CA, USA) that stream images at 30 frames per second (FPS) and a pair of customized foot pedals connected to an Arduino Uno board (Arduino Uno Rev3, Arduino, Somerville, MA, USA). We focused on adopting widely available devices for the ULCI in order to allow maximum flexibility for further development. For example, by using USB cameras, various up-to-date computer vision algorithms can be implemented, and additional sensors and input devices can be interfaced with the Arduino board. In this study, we placed four cameras at points surrounding the user (surgeon) and pointing downward, as shown in Fig. 3a, to acquire images of the user’s hands from a wide range of viewpoints and to minimize the risk of malfunction owing to blind spots. Although the camera arrangement can be configured according to the user’s preferences, it was kept fixed in this study. Input signals from the cameras and pedals were sent to a computer via USB connections for integration, data processing, and robot control.
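
To illustrate how the pedal signals might be read on the computer side, the following minimal Python sketch polls an Arduino that is assumed to print the two pedal states over USB serial; the port name, baud rate, and message format are hypothetical and not the exact protocol used in the ULCI.

```python
# Minimal sketch (assumption: the Arduino prints lines such as "1,0" meaning
# <ECM pedal>,<PSM pedal>; the port name and baud rate are hypothetical).
import serial

def pedal_states(port="/dev/ttyACM0", baud=115200):
    """Yield (ecm_pressed, psm_pressed) tuples read from the Arduino."""
    with serial.Serial(port, baud, timeout=1) as ser:
        while True:
            line = ser.readline().decode(errors="ignore").strip()
            if not line:
                continue
            try:
                ecm, psm = (int(v) for v in line.split(","))
            except ValueError:
                continue  # skip malformed lines
            yield bool(ecm), bool(psm)

if __name__ == "__main__":
    for ecm_pressed, psm_pressed in pedal_states():
        print(f"ECM pedal: {ecm_pressed}, PSM pedal: {psm_pressed}")
```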

Fig. 3

(a) Diagram of the experimental setting is illustrated. In our study, four cameras were positioned above the user and were tilted as described in the figure. (b) An example scene of a user controlling a surgical robot with the ULCI in a virtual environment is shown. The cameras set at different poses capture the user’s hands from different angles to create the input data, and the virtual surgical robot manipulators are displayed through an HMD. (c) Arduino based foot pedals are used in the ULCI. In the study, two pedals, the ECM pedal and the PSM pedal, were used. (d) An example scene inside a virtual world from a third person perspective (TPP) is shown. (e) An example scene inside a virtual world from a first person perspective (FPP) is shown. It should be noted that a stereoscopic display is given to the user to allow depth perception

2.1.3 Robot control unit

Robot control is implemented on a Linux computer with the following specifications: Ubuntu 20.04 LTS operating system, Intel® Core i7-12700K (5.0 GHz, 12 cores) processor with 64 GB of DDR4 RAM, and MSI GeForce RTX™ 3080 Gaming Z 12G graphics processing unit (GPU). Input images from the cameras are processed in parallel through the MediaPipe framework (version 0.8.10) with OpenGL GPU support to return two-dimensional coordinates of hand landmarks for both hands viewed from each camera (Lugaresi et al. 2019; Zhang et al. 2020). Then, these data and the input signals from the foot pedals are integrated into a Robot Operating System (ROS) environment developed using open-source software for the da Vinci research kit (dVRK) (Kazanzides et al. 2014; Chen et al. 2017). The ULCI was built with the ROS Noetic Ninjemys distribution, which is compatible with the Ubuntu 20.04 operating system. The sets of 2D hand landmark coordinates obtained from the USB cameras via MediaPipe are used to estimate the 3D position of each hand landmark and to determine the joint states of the manipulators constituting the dVRK, namely the PSMs and ECM, at a loop rate of 100 Hz. Finally, the joint states are sent to the visualization unit via TCP communication using the ROS TCP endpoint and connector modules, which are open-source Python scripts provided by Unity Robotics (version 0.7.0) (Codd-Downey et al. 2014; Hussein et al. 2018). The mechanisms for estimating the 3D coordinates of each hand landmark and manipulating the PSMs and ECM are introduced in detail in the following sections.
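
As a structural illustration of this unit, a skeleton of the 100 Hz control loop might look as follows; the topic name, message layout, and the placeholder helpers fuse_landmarks() and hand_to_joint_states() are assumptions for illustration, not the actual dVRK ROS interfaces.

```python
#!/usr/bin/env python3
# Skeleton of the 100 Hz robot control loop (topic name and helper functions
# are hypothetical placeholders, not the actual dVRK ROS interfaces).
import rospy
from sensor_msgs.msg import JointState

def fuse_landmarks():
    """Placeholder: return 3D hand landmark positions fused from all cameras (Sect. 2.2.2)."""
    return {}

def hand_to_joint_states(landmarks_3d):
    """Placeholder: map the tracked hand pose to PSM joint values (Sect. 2.3.2)."""
    return [0.0] * 6

def main():
    rospy.init_node("ulci_robot_control")
    pub = rospy.Publisher("/ulci/psm1/joint_states", JointState, queue_size=1)
    rate = rospy.Rate(100)  # 100 Hz loop rate, as stated in the text
    while not rospy.is_shutdown():
        joints = hand_to_joint_states(fuse_landmarks())
        msg = JointState()
        msg.header.stamp = rospy.Time.now()
        msg.position = joints
        pub.publish(msg)
        rate.sleep()

if __name__ == "__main__":
    main()
```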

2.1.4 Visualization unit

We developed a virtual world and performed visualization on a Windows computer with the following specifications: Windows 10 operating system, Intel® Core i9-9900K (3.6 GHz, 8 cores) processor with 64 GB of DDR4 RAM, and NVIDIA GeForce RTX 2070 GPU. A virtual world was built using Unity (version 2020.3.36f1), and we modified the open-source dVRK-XR package to visualize and render the movements of a virtual surgical robot in the Unity scene (Qian et al. 2019). Two PSMs and one ECM based on open-source 3D models, as shown in Fig. 4, were imported to the scene (Kazanzides et al. 2014; Chen et al. 2017; Qian et al. 2019). Finally, stereoscopic videos seen from the tip of the virtual ECM were streamed to a VIVE Pro Eye HMD (HTC, Taoyuan, Taiwan), which served as the main display of the proposed ULCI.

Fig. 4

(a) 3D model and joint specifications of a patient side manipulator (PSM) is shown. A PSM is a manipulator with seven degrees of freedom (DOFs). (b) 3D model and joint specifications of an endoscopic camera manipulator (ECM) is shown. An ECM is a manipulator with four DOFs

2.2 Hand tracking in 3D space

2.2.1 Naming conventions for vectors and coordinate systems

We detail the mathematical and mechanical bases of the ULCI. First, consider the following naming conventions for vectors and frames. Symbols \(\overrightarrow{v}\) and \(\widehat{v}\) denote a vector and its directional unit vector in 3D space, respectively. An uppercase letter enclosed in braces { } refers to a coordinate system (e.g., \(\left\{O\right\}\)). Symbol \({\overrightarrow{v}}^{\left\{O\right\}}\) represents a 3 × 1 matrix corresponding to a vector \(\overrightarrow{v}\) expressed in frame \(\left\{O\right\}\). In addition, \({R}_{AB}\) and \({T}_{AB}\) are the 3 × 3 rotation and 4 × 4 transformation matrices from frames \(\left\{A\right\}\) to \(\left\{B\right\}\), respectively.

2.2.2 Localization of hand landmarks

MediaPipe is a pipeline-based framework developed by Google for building machine learning perception applications in a modular architecture (Lugaresi et al. 2019). It allows developers to integrate calculators, which are modules executing specific functions, into a graph structure to develop a wide range of applications. MediaPipe provides calculators for human motion tracking, including face mesh, hand tracking, and pose detection. We mainly used MediaPipe Hands with OpenGL GPU support to track hand position and orientation in real time. This module detects multiple hands and their handedness in an image and tracks 21 anatomical landmarks per hand, as illustrated in Fig. 5 (Zhang et al. 2020).
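
A minimal example of running MediaPipe Hands on a single camera stream is sketched below using the Python solution API; the ULCI itself uses the GPU-accelerated pipeline, so the camera index and confidence threshold shown here are illustrative values only.

```python
# Minimal sketch of 2D hand landmark extraction from one USB camera using the
# MediaPipe Hands Python solution API (camera index and threshold are examples).
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
cap = cv2.VideoCapture(0)  # one of the USB cameras

with mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV delivers BGR frames.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for handedness, hand in zip(results.multi_handedness,
                                        results.multi_hand_landmarks):
                label = handedness.classification[0].label  # "Left" or "Right"
                # 21 landmarks in normalized image coordinates (values in [0, 1])
                coords_2d = [(lm.x, lm.y) for lm in hand.landmark]
                print(label, coords_2d[8])  # e.g., index fingertip (landmark 8)
cap.release()
```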

Fig. 5

Twenty-one anatomical landmarks detected by MediaPipe Hands are illustrated. One carpometacarpal (CMC) joint, five metacarpophalangeal (MCP) joints, one interphalangeal (IP) joint, four proximal interphalangeal (PIP) joints, four distal interphalangeal (DIP) joints, and five fingertips (TIP) are labeled with red dots. The numbers indicate the finger on which each landmark is located; based on a figure from (Zhang et al. 2020).

Although MediaPipe provides 3D coordinates of the landmarks with respect to the camera frame, they are not sufficiently reliable for surgical robot control owing to the intrinsic difficulty of depth perception from a single two-dimensional image. Therefore, we used \(N=4\) cameras in different poses to precisely estimate the 3D coordinates of a point of interest (\(A\)). The coordinate frame of each camera and its position vector in the fixed laboratory (global) frame, \(\left\{O\right\}\), are denoted as \(\left\{{C}_{i}\right\}\) and \({\overrightarrow{c}}_{i}\), respectively, as shown in Fig. 6. In addition, \({\widehat{v}}_{i}^{\left\{{C}_{i}\right\}}\) is a directional unit vector pointing from the \(i\)-th camera to point \(A\), expressed in the coordinate frame \(\left\{{C}_{i}\right\}\). Vector \({\widehat{v}}_{i}\) is determined from the relative location of the tracked point in the view of the \(i\)-th camera. For example, if the tracked point is at the center of the view of the \(i\)-th camera, \({\widehat{v}}_{i}\) would be parallel to the z-axis of \(\left\{{C}_{i}\right\}\), resulting in \({\widehat{v}}_{i}^{\left\{{C}_{i}\right\}}\) being equal to \([0, 0, 1]^{T}\). Then, the position vector of \(A\), denoted as \(\overrightarrow{r}\), is estimated as the vector that minimizes the sum of squared distances between \(A\) and points on the lines \({l}_{i}:{\overrightarrow{c}}_{i}+{k}_{i}{\widehat{v}}_{i}\ ({k}_{i}\in \mathbb{R},\ 1\le i\le N)\), as given in Eq. (1). This technique yields stable solutions that are tolerant of some level of measurement error owing to the convex nature of the target quadratic function.

Fig. 6

Simplified diagram for estimating 3D coordinates of point A from multiple cameras positioned around the interaction volume is shown. The laboratory frame is denoted as {O} and the coordinate frames of the cameras are denoted as {Ci}, where i is an integer ranging from 1 to N and N is the number of cameras

$$\overrightarrow{r}=\underset{\overrightarrow{x}}{\text{argmin}}\,f\left(\overrightarrow{x}, {k}_{1}, {k}_{2}, \dots, {k}_{N}\right)=\underset{\overrightarrow{x}}{\text{argmin}}\sum _{i=1}^{N}{\left|\overrightarrow{x}-\left({\overrightarrow{c}}_{i}+{k}_{i}{\widehat{v}}_{i}\right)\right|}^{2}$$
(1)

Setting the partial derivatives of \(f\left(\overrightarrow{x}, {k}_{1},{k}_{2}, \dots, {k}_{N}\right)\) with respect to variables \({k}_{j} (1\le j\le N)\) to zero, we obtain Eqs. (2) and (3), where \({k}_{j}^{\circ }\) is the value of \({k}_{j}\) that minimizes \(f\left(\overrightarrow{x}, {k}_{1},{k}_{2}, \dots, {k}_{N}\right)\) for a given \(\overrightarrow{x}\).

$$0={\left.\frac{\partial f}{\partial {k}_{j}}\right|}_{{k}_{j}={k}_{j}^{\circ }}=2{k}_{j}^{\circ }-2\left(\overrightarrow{x}-{\overrightarrow{c}}_{j}\right)\circ {\widehat{v}}_{j}$$
(2)
$${k}_{j}^{\circ }=\left(\overrightarrow{x}-{\overrightarrow{c}}_{j}\right)\circ {\widehat{v}}_{j}$$
(3)

Then, substituting the set of \({k}_{j}^{\circ }\) obtained from Eq. (3) and expressing all vectors in the laboratory frame \(\left\{O\right\}\), Eq. (4) can be derived from Eq. (1), where the matrices \({M}_{i}\) are defined by Eq. (5) and \({\widehat{v}}_{i}^{\left\{O\right\}}\) can be obtained from Eq. (6) (Lynch and Park 2017). \({T}_{O{C}_{i}}\) is the 4 × 4 transformation matrix from coordinate frame \(\left\{O\right\}\) to coordinate frame \(\left\{{C}_{i}\right\}\).

$$\begin{aligned}& f\left({\overrightarrow{x}}^{\left\{O\right\}}\right)\\&\quad= \sum _{i=1}^{N}{\left|{\overrightarrow{x}}^{\left\{O\right\}}-{\overrightarrow{c}}_{i}^{\left\{O\right\}}-\left\{\left({\overrightarrow{x}}^{\left\{O\right\}}-{\overrightarrow{c}}_{i}^{\left\{O\right\}}\right)\circ {\widehat{v}}_{i}^{\left\{O\right\}}\right\}{\widehat{v}}_{i}^{\left\{O\right\}}\right|}^{2}\\&\quad= \sum _{i=1}^{N}{\left|{M}_{i}\left({\overrightarrow{x}}^{\left\{O\right\}}-{\overrightarrow{c}}_{i}^{\left\{O\right\}}\right)\right|}^{2}\end{aligned}$$
(4)
$${M}_{i}=\left[\begin{array}{ccc}1& 0& 0\\ 0& 1& 0\\ 0& 0& 1\end{array}\right]-{\widehat{v}}_{i}^{\left\{O\right\}}{\widehat{v}}_{i}^{\left\{O\right\}T}$$
(5)
$$\left[\begin{array}{c}{\widehat{v}}_{i}^{\left\{O\right\}}\\ 0\end{array}\right]={T}_{O{C}_{i}}\left[\begin{array}{c}{\widehat{v}}_{i}^{\left\{{C}_{i}\right\}}\\ 0\end{array}\right]$$
(6)

Furthermore, setting the partial derivative of \(f\left({\overrightarrow{x}}^{\left\{O\right\}}\right)\) with respect to \({\overrightarrow{x}}^{\left\{O\right\}}\) to zero and using the relation \({M}_{i}^{2}={M}_{i}={M}_{i}^{T}\), an explicit solution for \(\overrightarrow{r}\) is obtained through the process shown in Eqs. (7) and (8). The number of cameras \(N\) should be at least 2, with non-parallel viewing rays, because each \({M}_{i}\) is singular and \(\sum {M}_{i}\) would otherwise not be invertible, so a unique solution for \({\overrightarrow{r}}^{\left\{O\right\}}\) could not be obtained. This is consistent with the fact that, in general, precise depth perception cannot be achieved through monocular vision.

$$0={\left.\frac{\partial f}{\partial {\overrightarrow{x}}^{\left\{O\right\}}}\right|}_{\overrightarrow{x}=\overrightarrow{r}}=2\left(\sum _{i=1}^{N}{M}_{i}\right){\overrightarrow{r}}^{\left\{O\right\}}-2\left(\sum _{i=1}^{N}{M}_{i}{\overrightarrow{c}}_{i}^{\left\{O\right\}}\right)$$
(7)
$${\overrightarrow{r}}^{\left\{O\right\}}={\left(\sum _{i=1}^{N}{M}_{i}\right)}^{-1}\left(\sum _{i=1}^{N}{M}_{i}{\overrightarrow{c}}_{i}^{\left\{O\right\}}\right)$$
(8)
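
For concreteness, a direct NumPy implementation of Eqs. (5), (6), and (8) is sketched below; it assumes that the camera poses (the transforms \({T}_{O{C}_{i}}\)) and the per-camera ray directions have already been obtained, for example from calibration and from the MediaPipe landmark positions in each image.

```python
# Sketch of the least-squares point estimate of Eqs. (5)-(8). T_oc[i] is the 4x4
# transform of camera i (rotation and position in {O}); v_cam[i] is the unit ray
# toward the tracked point expressed in the camera frame {C_i}.
import numpy as np

def estimate_point(T_oc, v_cam):
    """Return the point in {O} closest, in the least-squares sense, to all rays."""
    M_sum = np.zeros((3, 3))
    b_sum = np.zeros(3)
    for T, v_c in zip(T_oc, v_cam):
        R, c = T[:3, :3], T[:3, 3]          # camera orientation and position in {O}
        v = R @ v_c                          # Eq. (6): ray direction expressed in {O}
        v = v / np.linalg.norm(v)
        M = np.eye(3) - np.outer(v, v)       # Eq. (5): projector orthogonal to the ray
        M_sum += M
        b_sum += M @ c
    return np.linalg.solve(M_sum, b_sum)     # Eq. (8); needs >= 2 non-parallel rays
```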

2.3 Robot control mechanism

2.3.1 Operation mode selection

Clutch and move are the two major operation modes for the manipulators, inspired by the control mechanism of conventional RAMIS platforms. Clutch mode allows the user to move the hands freely while the end-effector of the robot arm remains stationary. Move mode displaces the manipulators according to the user’s hand movements, which are tracked and scaled. In the proposed ULCI, the selection of manipulators and modes is controlled by the foot pedals shown in Fig. 3c. The default mode is PSM clutch, which is active when neither or both foot pedals are pressed. When the PSM pedal is pressed, PSM move mode is activated. The ECM pedal enables ECM control mode, in which ECM clutch mode and ECM move mode are determined from hand gestures as described below.
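
The pedal-to-mode mapping described above can be summarized by the short sketch below; the enum and function names are illustrative, and the gripping-gesture test used during ECM control is detailed in Sect. 2.3.3.

```python
# Sketch of the operation mode selection logic (names are illustrative).
from enum import Enum, auto

class Mode(Enum):
    PSM_CLUTCH = auto()   # default mode
    PSM_MOVE = auto()
    ECM_CLUTCH = auto()
    ECM_MOVE = auto()

def select_mode(psm_pedal: bool, ecm_pedal: bool, both_hands_gripping: bool) -> Mode:
    if psm_pedal == ecm_pedal:
        return Mode.PSM_CLUTCH          # neither or both pedals pressed
    if psm_pedal:
        return Mode.PSM_MOVE            # PSM pedal pressed
    # ECM pedal pressed: clutch vs. move is decided by the hand gesture
    return Mode.ECM_MOVE if both_hands_gripping else Mode.ECM_CLUTCH
```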

2.3.2 PSM control

Fig. 7 illustrates the coordinate frames involved in the ULCI. \(\left\{O\right\}\), \(\left\{P\right\}\), and \(\left\{E\right\}\) in Fig. 7a are the reference frame, hand frame, and eye frame in the real world, respectively, while the coordinate frames shown in Fig. 7b belong to the virtual world. \(\left\{{O}^{{\prime }}\right\}\), \(\left\{{O}_{P}^{{\prime }}\right\}\), and \(\left\{{O}_{E}^{{\prime }}\right\}\) are the reference frames of the virtual world, PSM, and ECM, respectively, while \(\left\{{P}^{{\prime }}\right\}\) and \(\left\{{E}^{{\prime }}\right\}\) are the end-effector coordinate frames of the PSM and ECM, respectively. In addition, \(H\) and \({H}^{{\prime }}\) are the reference points for PSM control, which can be repositioned only during PSM clutch mode. During PSM clutch mode, \(H\) and \({H}^{{\prime }}\) are floating points tracking the 3D locations of the user’s hand and the PSM end-effector, respectively. At the instant PSM move mode is activated, \(H\) and \({H}^{{\prime }}\) are fixed and remain so until the mode returns to PSM clutch mode. \(\left\{{O}_{P}^{{\prime }}\right\}\) and \(\left\{{O}_{E}^{{\prime }}\right\}\) should be predefined alongside \(\left\{{O}^{{\prime }}\right\}\) because the dVRK ROS package provides functions for setting the goal transformation matrices from \(\left\{{O}_{P}^{{\prime }}\right\}\) to \(\left\{{P}^{{\prime }}\right\}\) and from \(\left\{{O}_{E}^{{\prime }}\right\}\) to \(\left\{{E}^{{\prime }}\right\}\), denoted as \({T}_{{O}_{P}^{{\prime }}{P}^{{\prime }}}\) and \({T}_{{O}_{E}^{{\prime }}{E}^{{\prime }}}\).

Fig. 7

(a) Real world coordinate frames used in the study are depicted. {O}, {E}, {P} are the lab frame, eye frame, and hand frame in the real world. (b) Virtual world coordinate frames used in the study are shown. {O'}, {E'}, {P'} of subfigure (b) are the world frame, endoscope frame, and PSM end-effector coordinate frames. {O'P} and {O'E} are the coordinate frames of the base of PSM and ECM, respectively. Points H and H' are the reference points for PSM control

Before controlling the PSM, a relationship between the 3D hand landmarks and the end-effector of the PSM should be established. Most surgical procedures are conducted using the thumb and index finger. Hence, we mapped the second metacarpophalangeal (MCP) joint, M2 for short,  to the wrist-yaw joint axis of the PSM. Also, the tips of the thumb and the index finger, denoted as T1 and T2, were mapped to the respective tips of the end-effector as shown in Fig. 8. The jaw angle was determined as the angle between \(\overrightarrow{M_2T_1}\) and \(\overrightarrow{M_2T_2}\), and the orientation was assessed from M1, M2, T1, T2, and P2, where M1 and P2 refer to the first metacarpophalangeal and second proximal interphalangeal joints, respectively. Unit vector \(\widehat{z}\) of \(\left\{P\right\}\) points from M2 to the midpoint of T1 and T2, and \(\widehat{x}\) of \(\left\{P\right\}\) is calculated as the unit vector of the cross-product of \(\overrightarrow{M_2M_1}\) and \(\overrightarrow{M_2P_2}\). M1 and P2 were selected for the determination of \(\widehat{x}\) because using \(T_1\) and \(T_2\) instead would lead to unstable solutions as the jaw angle approaches zero. Finally, \(\widehat{y}\) of \(\left\{P\right\}\) was determined by the cross-product of \(\widehat{z}\) and \(\widehat{x}\).
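
The construction of the hand frame and jaw angle can be written compactly as below; the function assumes the five landmark positions are given as 3D NumPy vectors in \(\left\{O\right\}\), and the final re-orthogonalization step is an implementation detail added here for numerical safety rather than something stated in the text.

```python
# Sketch of the hand frame {P} and jaw angle computation from the 3D landmarks
# M1, M2, T1, T2, and P2 (each a NumPy array of shape (3,), expressed in {O}).
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def hand_frame_and_jaw(M1, M2, T1, T2, P2):
    z = unit((T1 + T2) / 2.0 - M2)           # from M2 toward the fingertip midpoint
    x = unit(np.cross(M1 - M2, P2 - M2))     # normal to the plane spanned by M2->M1 and M2->P2
    y = unit(np.cross(z, x))                 # completes the right-handed frame
    x = np.cross(y, z)                       # re-orthogonalization (assumption, for a proper rotation)
    R_OP = np.column_stack((x, y, z))        # columns are the axes of {P} expressed in {O}
    jaw = np.arccos(np.clip(np.dot(unit(T1 - M2), unit(T2 - M2)), -1.0, 1.0))
    return R_OP, jaw
```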

Fig. 8

(a) Points and vectors used for determination of the hand coordinate frame {P} are shown. M1 and M2 refer to the first and second metacarpophalangeal joints, T1 and T2 refer to the tips of the thumb and index finger, and P2 is the proximal interphalangeal joint of the index finger. (b) PSM end-effector coordinate frame {P'} is shown. An additional algorithm is implemented so that the orientation of the hand is mimicked by the PSM end-effector

For the PSM to replicate the user’s hand movements as perceived by the user, two relations should be satisfied. First, the orientation of the PSM end-effector seen from the endoscope should be equal to the orientation of the hand seen from the eye. In other words, the rotation matrix from {E'} to {P'}, denoted as \({R}_{{E}^{{\prime }}{P}^{{\prime }}}\), should be equal to the rotation matrix from {E} to {P}, denoted as \({R}_{EP}\), resulting in our first relation given as Eq. (9). Next, given the reference points of the hand and the PSM end-effector, denoted as H and H', we can derive a relation that implements a scaled motion of the PSM end-effector according to the movement of the user’s hand. The displacement vector from H' to P' seen from the endoscope should be parallel to the displacement vector from H to P seen from the eye. After introducing a scaling factor α, empirically set to 0.2 by default, the second relation is derived as Eq. (10).

$${R}_{EP}={R}_{{E}^{{\prime }}{P}^{{\prime }}}$$
(9)
$${\overrightarrow{{H}^{{\prime }}{P}^{{\prime }}}}^{\left\{{E}^{{\prime }}\right\}}=\alpha \times {\overrightarrow{HP}}^{\left\{E\right\}}$$
(10)

Using the relations above, \({T}_{{O}_{P}^{{\prime }}{P}^{{\prime }}}\) is determined uniquely as given in Eqs. (11) and (12). The transformation matrix \({T}_{OE}\) can be calculated by measuring the position and orientation of the HMD. \({T}_{{O}^{{\prime }}{O}_{P}^{{\prime }}}\) and \({T}_{{O}^{{\prime }}{O}_{E}^{{\prime }}}\) are calculated from the relative positions and orientations of the virtual PSM and ECM with respect to the reference coordinate frame of the virtual world. \({\overrightarrow{{O}^{{\prime }}{H}^{{\prime }}}}^{\left\{{O}^{{\prime }}\right\}}\) and \({T}_{{O}_{E}^{{\prime }}{E}^{{\prime }}}\) can be determined from the joint states of the PSM and ECM through forward kinematics. Similarly, \({T}_{OP}\) and \({\overrightarrow{HP}}^{\left\{O\right\}}\) are derived from the tracked 3D coordinates of the hand landmarks.

$${R}_{{O}_{P}^{{\prime }}{P}^{{\prime }}}={R}_{{O}_{P}^{{\prime }}{O}^{{\prime }}}{R}_{{O}^{{\prime }}{E}^{{\prime }}}{R}_{{E}^{{\prime }}{P}^{{\prime }}}={R}_{{O}_{P}^{{\prime }}{O}^{{\prime }}}{R}_{{O}^{{\prime }}{E}^{{\prime }}}{R}_{EP}={R}_{{O}_{P}^{{\prime }}{O}^{{\prime }}}{R}_{{O}^{{\prime }}{E}^{{\prime }}}{R}_{EO}{R}_{OP}$$
(11)
$$\begin{aligned} \left[\begin{array}{c}{\overrightarrow{{O}_{P}^{{\prime }}{P}^{{\prime }}}}^{\left\{{O}_{P}^{{\prime }}\right\}}\\ 1\end{array}\right]&= {T}_{{O}_{P}^{{\prime }}{O}^{{\prime }}}\left[\begin{array}{c}{\overrightarrow{{O}^{{\prime }}{P}^{{\prime }}}}^{\left\{{O}^{{\prime }}\right\}}\\ 1\end{array}\right]\\&= {T}_{{O}_{P}^{{\prime }}{O}^{{\prime }}}\left[\begin{array}{c}{\overrightarrow{{O}^{{\prime }}{H}^{{\prime }}}}^{\left\{{O}^{{\prime }}\right\}}+{\overrightarrow{{H}^{{\prime }}{P}^{{\prime }}}}^{\left\{{O}^{{\prime }}\right\}}\\ 1\end{array}\right] \\&= {T}_{{O}_{P}^{{\prime }}{O}^{{\prime }}}\left[\begin{array}{c}{\overrightarrow{{O}^{{\prime }}{H}^{{\prime }}}}^{\left\{{O}^{{\prime }}\right\}}+{R}_{{O}^{{\prime }}{E}^{{\prime }}}{\overrightarrow{{H}^{{\prime }}{P}^{{\prime }}}}^{\left\{{E}^{{\prime }}\right\}}\\ 1\end{array}\right]\\&= {T}_{{O}_{P}^{{\prime }}{O}^{{\prime }}}\left[\begin{array}{c}{\overrightarrow{{O}^{{\prime }}{H}^{{\prime }}}}^{\left\{{O}^{{\prime }}\right\}}+\alpha {R}_{{O}^{{\prime }}{E}^{{\prime }}}{\overrightarrow{HP}}^{\left\{E\right\}}\\ 1\end{array}\right]\\&= {T}_{{O}_{P}^{{\prime }}{O}^{{\prime }}}\left[\begin{array}{c}{\overrightarrow{{O}^{{\prime }}{H}^{{\prime }}}}^{\left\{{O}^{{\prime }}\right\}}+\alpha {R}_{{O}^{{\prime }}{E}^{{\prime }}}{R}_{EO}{\overrightarrow{HP}}^{\left\{O\right\}}\\ 1\end{array}\right]\end{aligned}$$
(12)
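
A sketch of how Eqs. (11) and (12) might be composed in code is given below; all rotation and transformation matrices are assumed to be available as NumPy arrays with the frame conventions defined above, and the argument names are ours rather than dVRK identifiers.

```python
# Sketch of the PSM goal pose composition of Eqs. (11) and (12).
import numpy as np

def psm_goal_pose(T_OpO, R_OE_virtual, R_EO_real, R_OP_hand,
                  OH_in_Oprime, HP_in_O, alpha=0.2):
    """
    T_OpO        : T_{O'_P O'}, maps coordinates in {O'} to the PSM base frame {O'_P}
    R_OE_virtual : R_{O'E'}, rotation of the endoscope frame {E'} expressed in {O'}
    R_EO_real    : R_{EO}, maps coordinates in the lab frame {O} to the eye frame {E}
    R_OP_hand    : R_{OP}, rotation of the hand frame {P} expressed in {O}
    OH_in_Oprime : position of the reference point H' relative to O', expressed in {O'}
    HP_in_O      : displacement from H to the hand frame origin P, expressed in {O}
    """
    R_OpO = T_OpO[:3, :3]
    # Eq. (11): goal orientation of the PSM end-effector in its base frame {O'_P}
    R_goal = R_OpO @ R_OE_virtual @ R_EO_real @ R_OP_hand
    # Eq. (12): goal position of the PSM end-effector in {O'_P}
    p_in_Oprime = OH_in_Oprime + alpha * (R_OE_virtual @ R_EO_real @ HP_in_O)
    p_goal = (T_OpO @ np.append(p_in_Oprime, 1.0))[:3]
    T_goal = np.eye(4)
    T_goal[:3, :3] = R_goal
    T_goal[:3, 3] = p_goal
    return T_goal
```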

2.3.3 ECM control

During ECM control mode, a user can activate ECM move mode by simultaneously making a gripping gesture with both hands, defined by a jaw angle below a threshold of 15°. At the instant the control mode is switched from ECM clutch mode to ECM move mode, the reference point \(H_E\) is set as the midpoint of the second MCP joints of the two hands. Then, as the user’s hands move, the ULCI tracks the 3D location of the midpoint and the slope of the line connecting the two MCP joints to calculate the displacement vector \({\overrightarrow{m}}^{\left\{E\right\}}\) and rotation angle \(\varphi\), as illustrated in Fig. 9. The \(x\)-, \(y\)-, and \(z\)-axis values of \({\overrightarrow{m}}^{\left\{E\right\}}\) are scaled by factor \(\beta\) (default value of 0.1) to control the yaw, insertion, and pitch of the ECM, respectively, and \(\varphi\) controls the roll. Finally, when the user releases the gripping gesture, the control mode returns to ECM clutch mode and \(H_E\) is reset.
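
The quantities driving ECM move mode can be computed as in the short sketch below; inputs are the left and right second MCP positions expressed in the eye frame {E} together with the reference midpoint and reference slope captured when move mode was activated, and the axis conventions are simplified for illustration.

```python
# Sketch of the ECM move-mode input computation. mcp_l and mcp_r are the current
# 3D positions of the left and right second MCP joints in the eye frame {E};
# H_E and phi_ref are the midpoint and line slope captured at mode activation.
import numpy as np

def ecm_increments(mcp_l, mcp_r, H_E, phi_ref, beta=0.1):
    midpoint = (mcp_l + mcp_r) / 2.0
    m = midpoint - H_E                         # displacement vector in {E}
    d = mcp_r - mcp_l
    phi = np.arctan2(d[1], d[0]) - phi_ref     # change in the slope of the MCP line -> roll
    # scaled x, y, z components drive yaw, insertion, and pitch of the ECM
    yaw, insertion, pitch = beta * m[0], beta * m[1], beta * m[2]
    return yaw, insertion, pitch, phi
```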

Fig. 9

Illustration of ECM control mechanism is given. When both hands make a gripping gesture, the reference point HE is set as the midpoint of the two second-metacarpophalangeal (MCP) joints. Then as the hands move, maintaining the gripping gesture, the displacement vector of the midpoint and the slope of the line connecting the two MCP joints, denoted as \({\overrightarrow{m}}\) and φ respectively, are tracked to provide input signals for ECM control. 

2.4 Usability study

2.4.1 Participants

Fifteen naïve participants with minimal experience in both HMD and RAMIS were enrolled in a usability study that was approved by the institutional review board of Seoul National University Hospital (approval number: H-2107-167-1236). The study consisted of three stages: modified peg transfer task (mPTT), line tracking task (LTT), and UX survey. All procedures involving PSM and ECM manipulation were conducted in a virtual environment constructed in Unity.

2.4.2 mPTT

The peg transfer task is a widely practiced task intended to evaluate and improve the ability to perform laparoscopic procedures (Peters et al. 2004; Ritter and Scott 2007; Sroka et al. 2010; Zendejas et al. 2016). During a peg transfer task, each participant is asked to grasp an object with one hand, pass it to the other hand in mid-air, place it on an empty peg on the opposite side, and repeat this process for six objects. In this study, we designed a virtual environment for the peg transfer task in Unity, as shown in Fig. 10a. Each participant was given 10 minutes to become familiar with the ULCI and 40 minutes to perform the task to the best of their ability. The original task requires participants to move 12 objects within 300 s, but considering their low familiarity with laparoscopic tools and procedures, we required them to move only six objects within 150 s (Badalato et al. 2014; Hong et al. 2019; Miura et al. 2019). The minimum task time during the 40-minute testing period was recorded for each participant.

2.4.3 LTT

The line tracking task is used to assess a user’s proficiency in ECM control, and we designed a virtual LTT environment as shown in Fig. 10b (Cagiltay et al. 2019; Jo et al. 2020). Participants were instructed to move a red sphere of radius 1.5 cm, fixed at a point 5 cm distal to the end-effector of the ECM, along a blue track consisting of six segments of 15 cm each. As with the mPTT, participants were given 10 minutes to practice ECM control and were then asked to move the sphere along the track 10 times while trying their best to stay on the path. Task time and the mean deviation of the center of the red sphere from the track were measured.
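
One way to compute the deviation metric reported for the LTT is to take the distance from the recorded sphere center to the nearest point on the polyline track, as sketched below; this is an illustrative reconstruction rather than the exact script used in the study.

```python
# Illustrative deviation metric for the LTT: mean distance from each recorded
# sphere center to the closest point of the polyline track.
import numpy as np

def point_to_segment(p, a, b):
    """Distance from point p to the segment a-b (all shape-(3,) arrays)."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def mean_deviation(sphere_centers, track_vertices):
    segments = list(zip(track_vertices[:-1], track_vertices[1:]))
    distances = [min(point_to_segment(p, a, b) for a, b in segments)
                 for p in sphere_centers]
    return float(np.mean(distances))
```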

Fig. 10

(a) Virtual environment for the modified peg transfer task (mPTT) is shown. Users should move the six objects on the left to the pegs on the right as fast as they can without dropping the objects. (b) Virtual environment for the line tracking task (LTT) is demonstrated. A red sphere is fixed relative to the ECM end-effector coordinate frame, and users should move the sphere along the blue track by controlling the ECM

2.4.4 UX survey

After the mPTT and LTT, participants were asked to fill out a questionnaire based on their experience with the ULCI. We studied scores obtained from the system usability scale (SUS), Van der Laan’s technology acceptance scale, and overall fatigue ratings. The SUS score is computed from the answers to 10 questions on a 100-point scale and reflects how easily users can use the ULCI (Brooke 1996; Bangor et al. 2008; Lewis 2018). Van der Laan’s technology acceptance score comprises nine items that form usefulness and satisfying scales with values ranging from − 2 to 2, where a higher score indicates a higher affinity for a proposed system (Laan et al. 1997). The participants were then asked to score their fatigue levels for the neck, back, shoulder, elbow, wrist, fingers, and eyes, as well as dizziness, on a 10-point scale. Participants were instructed to answer the survey according to the following standards: scores of 3 or less indicate negligible discomfort, scores between 4 and 6 indicate perceivable yet tolerable discomfort, and scores of 7 or greater indicate intolerable fatigue. Finally, they were asked to write their subjective opinions about the ULCI in open-ended questions. The questionnaires and scoring guidelines are provided in the Appendix.
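
For reference, the standard SUS scoring rule (Brooke 1996) maps the ten 1–5 responses to a 0–100 score as in the snippet below; this is the conventional formula, not a ULCI-specific computation.

```python
# Standard SUS scoring: odd-numbered items contribute (answer - 1), even-numbered
# items contribute (5 - answer), and the sum is multiplied by 2.5 to give 0-100.
def sus_score(responses):
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 0 else (5 - r)   # i = 0 corresponds to item 1
                for i, r in enumerate(responses))
    return total * 2.5

# Example: an all-neutral questionnaire (all answers 3) yields a score of 50.
print(sus_score([3] * 10))  # -> 50.0
```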

3 Results

3.1 Intrinsic ULCI characteristics

The accuracy of PSM control was qualitatively investigated from the trajectories of the hands and PSMs recorded during an mPTT trial, as shown in Fig. 11. Figure 11a, d show that the hand movements were replicated and scaled to the corresponding PSMs. The hand positions estimated by MediaPipe contained noise, but it was eliminated through the internal control mechanism available in the open-source dVRK ROS package, resulting in virtually no discomfort when manipulating the virtual robots.

Fig. 11

A qualitative evaluation of PSM trajectories from one mPTT trial is shown. (a) Scaled 3D trajectories of the left hand and PSM1 are plotted. (b) x, y, and z coordinates of the left hand and PSM1 are plotted with respect to time. (c) Jaw angles of the left hand and PSM1 are plotted with respect to time. (d) Scaled 3D trajectories of the right hand and PSM2 are plotted. (e) x, y, and z coordinates of the right hand and PSM2 are plotted with respect to time. (f) Jaw angles of the right hand and PSM2 are plotted with respect to time

The end-to-end latency of the ULCI was determined from the sum of the delays caused by the following processes: image processing through MediaPipe, robot control using dVRK-ROS, TCP communication, and rendering for robot visualization. The time consumed by image processing ranged from 3.5 ms to 12 ms, with its distribution peaking at 4.0 ms. The latency caused by robot control was measured by comparing the time difference between the hand and PSM coordinate values, which were scaled to the interval between 0 and 1, as shown in Fig. 11 and Fig. 12. This latency was measured to be 130 ± 10 ms and 230 ± 10 ms for the PSM and ECM, respectively. As we used wired communication, networking time was negligible, and rendering time did not exceed 38 ms. Hence, the end-to-end latencies of the PSM and ECM were measured to be 170 ± 10 ms and 270 ± 10 ms, respectively, as listed in Table 1.
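
The text does not state how the time difference between the two scaled signals was extracted; one common approach, shown below purely as an assumption, is to take the lag that maximizes their cross-correlation, with the 100 Hz loop rate giving a 10 ms sample period.

```python
# Illustrative lag estimation between two equally sampled, 0-1 scaled signals
# (hand vs. PSM coordinate) via cross-correlation; not necessarily the method
# used in the study.
import numpy as np

def estimate_lag_ms(hand_signal, robot_signal, dt_ms=10.0):
    hand = hand_signal - np.mean(hand_signal)
    robot = robot_signal - np.mean(robot_signal)
    corr = np.correlate(robot, hand, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(hand) - 1)
    return lag_samples * dt_ms   # positive value: the robot lags behind the hand
```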

Fig. 12

A qualitative evaluation of ECM trajectory from one LTT trial is shown. (a) Scaled 3D trajectories of the midpoint between two hands and joint states of the ECM are plotted. (b) x, y, and z coordinates of the midpoint between two hands and their corresponding joint states of the ECM are plotted with respect to time

Table 1 Latency values of the ULCI for the PSM and ECM are shown in milliseconds. Maximum latencies are listed for image processing time, rendering in Unity, and rendering to the HMD; mean values are given otherwise

3.2 Usability study

3.2.1 mPTT

Best task times for the mPTT during a testing period of 40 minutes per participant are listed in Table 2. Of the 15 participants, 13 (86%) achieved a task time below 150 s (P = .0015), with a mean task time of 95 ± 47 s.

Table 2 Minimum task times for modified peg transfer task (mPTT) are demonstrated

3.2.2 LTT

Task times for the LTT decreased with repetition, from 204 ± 91 s to 113 ± 30 s after 10 trials, while the mean deviation of the red sphere from the track remained relatively constant at 1.1 ± 0.5 cm, as shown in Fig. 13a and b, respectively.

Fig. 13

(a) Task time of the LTT for each participant, together with the mean task time per trial, is shown. (b) Mean deviation from the track for each participant, together with the mean deviation, is plotted with respect to the trial number

3.2.3 UX survey

As shown in Fig. 14a, the median SUS score was 75.0, indicating that most participants felt good about the proposed system (Bangor et al. 2009). In addition, both scales for Van der Laan’s technology acceptance score were distributed above zero (P < .001 for both scales), as shown in Fig. 14b.

Fig. 14

(a) A box plot for system usability scale (SUS) score is shown. (b) Box plots for the usefulness and satisfying scale for Van der Laan’s technology acceptance score are shown

Fatigue levels of different body parts were measured on a 10-point scale. Joint discomfort in the neck and shoulders was perceivable yet tolerable, while that in other joints was mostly imperceptible. Eye fatigue and the level of dizziness were comparatively high but tolerable throughout the study.

4 Discussion

4.1 Feasibility study

We examined accuracy and latency to assess the feasibility of the proposed ULCI. These factors substantially affect the surgeon’s performance and the patient’s safety (Anvari et al. 2005; Lum et al. 2009; Xu et al. 2014). Toward the end of each task, the participants were able to easily manipulate the virtual robots as intended. Latency requires thorough investigation because it is associated with various performance indicators in telesurgery, such as task time and error rate (Anvari et al. 2005; Lum et al. 2009; Xu et al. 2014). The end-to-end latency measured for the PSM and ECM was around 200 ms, which is suitable for performing telesurgery (Xu et al. 2014). The additional latency caused by image processing with MediaPipe was minimal compared with the total delay, although it may vary according to the GPU specifications. None of the participants reported discomfort from inaccuracy or time delay, suggesting the successful development of a real-time robot manipulation interface based on contactless hand tracking.

Results from the mPTT trials show that a large proportion of the naïve participants could reach a qualifying level of PSM manipulation after 50 minutes of practice (Peters et al. 2004; Ritter and Scott 2007). The LTT results in Fig. 13 show that the mean deviation from the track remained similar throughout a series of trials, while the mean task time decreased. Hence, the participants were able to learn from their experience to achieve faster execution with a similar amount of effort. In addition, SUS scores distributed between 60 and 80 (Fig. 14a) indicate that the participants easily familiarized themselves with the ULCI. Moreover, Van der Laan’s technology acceptance scores above zero indicate that most of the participants felt positive about the proposed ULCI as an emerging technology.

4.2 Ergonomic evaluation

Ergonomic evaluation is crucial to the development of surgical robot control systems. According to Table 3, fatigue levels of the elbow, wrist, and finger joints were minimal, implying that the ULCI may resolve existing issues of contemporary RAMIS systems related to pain in the distal upper extremities, though a comparative study should be conducted in future work. On the other hand, most users reported shoulder discomfort after completing the tasks. This originated from the prolonged shoulder abduction illustrated in Fig. 15a, where the deltoid muscle was strained by holding the arms up for most of the task time. This problem can be addressed by adding arm rests, which have been proven to be effective for the ergonomics of current RAMIS platforms (Lee et al. 2017). When adding arm rests to the ULCI, we suggest that they be long and wide so that users can freely slide their elbows on them, thus minimizing the fatigue of the shoulder joints while maintaining the freedom of motion of the arms. In addition, arm rest heights should be adjustable to allow optimal use for each surgeon.

Table 3 Fatigue levels caused by the modified peg transfer task (mPTT) and line tracking task (LTT) are listed for each body part. Fatigue levels of 3 or less imply virtually imperceptible discomfort, 4 through 6 imply perceivable yet tolerable discomfort, and 7 or greater imply intolerable discomfort

Although preliminary studies have shown that an HMD is more ergonomic than a stereo viewer in terms of posture and joint fatigue, the participants in this study reported a moderate level of fatigue in the neck. Most participants pointed out the weight of the HMD (550 g) as a major cause of neck pain. Fig. 15b shows that the anterior position of the center of mass of the HMD exerts a considerable torque on the cervical spine, resulting in continuous recruitment of the posterior neck muscles to compensate for the weight of the HMD (Knight and Baber 2004, 2007; Chihara and Seo 2018; Ito et al. 2021). Toward the end of the usability study, participants tended to extend their necks to reduce the torque exerted by the HMD. This behavior was consistent with the findings reported in (Knight and Baber 2007). Neck fatigue can be alleviated either by reducing the weight of the HMD through further development of lightweight hardware or by adding a counterweight to the back so that the center of mass moves posteriorly (Chihara and Seo 2018; Ito et al. 2021). In fact, the HMD used in our study weighs approximately 550 grams, and most of the mass is concentrated in the display at the front. Newly released HMDs are much lighter; according to our market analysis, the HTC VIVE Flow weighs 189 grams (HTC VIVE 2022). In addition, many contemporary products, such as the Microsoft HoloLens 2 or HTC VIVE XR Elite, use the battery as a counterweight to the display, and they tend to cause less neck pain (Palumbo 2022). Therefore, we believe that reduced fatigue of the cervical spine can be achieved even with devices available today.

Dizziness is another aspect to consider when using an HMD. Dizziness levels reported in the usability study were relatively high compared with other fatigue levels such as joint discomfort. Based on the subjective opinions of the participants, motion sickness may have been induced by the discrepancy between head rotation and the movement of the field of vision (Munafo et al. 2017). In this study, foot pedals were used to switch modes between the PSM and ECM and to suitably reproduce the control environment of commercial RAMIS platforms. This method may have aggravated motion sickness because the field of view cannot be changed intuitively through head motion but is instead altered via predefined hand gestures. Dizziness caused by the HMD is expected to decrease with improved hardware, and it can be further reduced by implementing a more user-centered robot control mechanism, such as using a gyroscope sensor to detect head orientation for mapping to the ECM (Jo et al. 2020). In addition, aiding hand-eye coordination by displaying the user’s hands on the HMD would contribute to reducing dizziness. Implementing smooth movement of the ECM and modulating its speed and acceleration is also suggested (Lareyre et al. 2021; Sun et al. 2019).

Fig. 15

Muscles involved in joint discomfort when using ULCI are illustrated. (a) Throughout the tasks, deltoid muscle should bear the weight of the arm, resulting in shoulder fatigue. (b) Posterior neck muscles should compensate for the torque induced by the weight of the HMD, resulting in neck fatigue

4.3 Hardware cost and volume

In this study, a robot control interface based on the MediaPipe framework with tolerable accuracy and latency was developed using commercial devices such as USB cameras and an HMD. The total budget for developing the full system was around USD 10,000. The input systems of contemporary RAMIS platforms contain delicate input devices such as MTMs, which are wire driven robot manipulators with eight degrees of freedom. They are bulky and prone to collisions, making their maintenance challenging. In contrast, since the proposed interface implements a computer vision driven input mechanism, the ULCI is more compact and volume-efficient. In addition, the hardware components of the ULCI are portable, meaning that the surgeon console need not be confined to an operating room, enabling more efficient use of the operating room. Although further studies should be conducted, the ULCI may become a promising alternative to the surgeon console used today, hopefully contributing to improved space utilization and reduced medical expenses associated with robotic surgery, as well as providing comfort to the surgeon (Barbash and Glied 2010; Turchetti et al. 2012).

4.4 Contribution to Extended Reality (XR) Community

As mentioned previously, hand gesture recognition is an emerging input modality for users to actively interact with extended reality (XR) environments. Prior to the development of computer vision based hand tracking modules, various mechanisms for 3D hand gesture recognition were developed, such as infrared based depth perception modules and motion capture gloves (Yasen and Jusoh 2019; Suarez and Murphy 2012; Moreira et al. 2014). Recently, computer vision based hand tracking technology has been actively studied owing to the wide availability of RGB cameras and source code. In this paper, we explicitly detailed the underlying mechanism of 3D hand tracking based on a set of RGB images and established a proof of concept through UX studies. Given the wide accessibility of RGB cameras and computer vision algorithms, we expect our work to be further studied and developed by scholars interested in developing interactive XR environments, not confined to RAMIS systems (Tavakoli et al. 2006; Martin-Barrio et al. 2020; Hong et al. 2021).

In addition, our work is significant in that we conducted extensive UX studies based on widely accepted surveys and questionnaires, such as the SUS and Van der Laan’s technology acceptance scale. Ergonomic evaluation was also thoroughly carried out by quantitatively assessing discomfort in various body parts. Therefore, we hope that our work will provide a new standard for measuring the properties of input modalities for XR environments from a UX perspective.

5 Conclusion

The ULCI is a surgical robot control interface based on real-time contactless hand tracking. We devised the interface to overcome the ergonomic issues of existing RAMIS platforms. We investigated the mathematical background of the interface and performed feasibility and UX studies as a proof of concept. The accuracy and latency of the ULCI did not significantly affect participants’ performance in conducting various tasks. SUS and Van der Laan’s technology acceptance scores showed that naïve users were able to easily learn and improve their task performance within a short period of 100 minutes, suggesting that the ULCI is intuitive and user-friendly. In addition, fatigue levels in the distal upper extremities experienced by the participants were low, indicating that the ULCI may contribute to surgical environments with improved ergonomics. Nevertheless, the interface should be further improved to alleviate joint pain in the neck and shoulders. Consisting of portable devices such as an HMD and USB cameras, the proposed system can reduce the overall hardware volume of conventional RAMIS platforms and increase space utilization efficiency.

In conclusion, we have successfully developed a surgical robot control system using the MediaPipe framework and an HMD and confirmed its feasibility by investigating its intrinsic features and UX, including an ergonomic evaluation based on a group of widely accepted surveys and questionnaires. The proposed system allows dexterous robot manipulation without constraining the hands to delicate devices such as MTMs, resulting in minimal strain on the upper extremities. As real-time 3D hand tracking technology is not confined to surgical robot control, it can be applied to any field requiring delicate hand gestures as input, including XR environments and remote-controlled robotic systems, hopefully contributing to improved human–machine interaction.