Camera scopes provide surgeons with extensive visualization of internal organs during minimally invasive surgeries. Traditionally, the operating surgeon relies on a human assistant to move the camera for optimal views. The assistant must hold the scope steadily so that views of the operating field are not shaky. Long operating times lead to interrupted visualization due to fatigue, tremors, miscommunication, and an increased need for cleaning when the lens accidentally touches nearby organs. Poor maneuvering of camera scopes by human assistants can complicate procedures [1].

Camera assistant roles are often assigned to junior surgical residents. Handling the scope requires complex psychomotor skills such as visual-spatial processing, hand–eye coordination, and knowledge of the surgical procedure. Camera navigation skills, such as target centering and smooth movements, are assessed using structured tools or simulators designed to differentiate between experienced and inexperienced assistants [2]. The types of skills required vary with the procedure. For example, assistants require more advanced navigation skills for colorectal resections than for cholecystectomies. As surgeons are fully dependent on camera views during laparoscopic surgeries, any unstable views, smudges on the lens, or collisions with instruments caused by the human assistant can prolong operating time. This may compromise patient safety [3]. Inexperienced assistants may unintentionally rotate the camera scope, thereby affecting the surgeon’s visual perception. This can cause misidentification of anatomic structures and lead to intraoperative injuries [4].

Issues with human camera assistance can be addressed by using scope holders, which replace the human assistant and provide images free of the effects of hand tremors. Passive scope holders are maneuvered manually between fixed camera positions. Although they provide clear, tremor-free views, smooth movement of the scope can be challenging [5, 6]. To overcome this, robotic scope holders that allow visual stability and full control by the operating surgeon have become commonplace. Compared to a human camera assistant, an active robotic scope holder provides the operating surgeon with a flexible and steady view, in addition to reducing operating time and cost [5]. Optimal views in human-assisted laparoscopy depend on the training and experience of the assistant, while a robot-assisted procedure depends far less on these factors [7]. Using robotic scope holders also improves ergonomics for surgeons [8]. While musculoskeletal disorders are prevalent among laparoscopic surgeons due to posture and repetitive movements, reports of physical discomfort, such as wrist, shoulder, back, and neck pain, are much lower in robotic surgeries [9].

In robot-assisted surgical procedures, the surgeon controls the slave robot using a master interface. Robotic systems utilize a variety of user interfaces, which include control by foot, hand, voice, head, eyes, and image-based tracking of surgical tools. (Detailed descriptions of each user interface type are presented in the first part of the Results section.) To reduce cognitive load on the surgeon, natural and direct mapping of interface movement with the robotic actuator is required. An ideal interface is intuitive, ergonomic, and user-friendly [10, 11]. Intuitive interfaces help decrease the time required for endoscope tip positioning, which is imperative while performing advanced surgical interventions [12].

Surgical robotic systems (and hence the user interfaces to control them) vary with the intervention site. Surgical sites close to an entry port may only require rigid or semi-rigid scopes for visualization. However, complex procedures in the gastrointestinal tract, such as endoscopic submucosal dissection (ESD), require robotically actuated flexible scopes for manipulation and optimal positioning [13]. Biopsies of peripheral pulmonary lesions benefit from robotic bronchoscopy, which allows scope navigation for direct visualization through bronchi that branch at different angles and become progressively smaller deeper in the lungs [14]. Improved surgical precision that allows fine dissection makes robot assistance favorable for urological and colorectal surgeries.

To our knowledge, the current literature does not provide a detailed review of the different scope user interfaces in robotic surgery. This review aims to provide an overview of user interfaces for robotically actuated camera scopes. The Results section describes the common user interfaces used by robotic systems for visualization during surgery. It also covers the different robotic surgical systems that actuate scopes, and maps user interfaces to the robotic systems as well as to the surgeries performed across different specialties. The Discussion section describes the evolution of user interfaces over time. A comparison of key features of the different user interfaces is also presented.

Methods

The review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines [15]. An extensive search of the scientific literature was conducted using the PubMed and IEEE Xplore databases to identify articles describing user interfaces for robotic scope control in surgery. The search strategy for PubMed is given in Supplementary Content 1. Additional records were identified through thorough citation searches, websites, and patents. A total of 720 records were screened. Articles related to surgical systems using actuated scopes with user interfaces, published between 1995 and 2022, were included. The records were screened using the Rayyan app (https://www.rayyan.ai/). Duplicate reports, non-robotic passive systems, soft robots, systems not related to endoscopic or laparoscopic visualization, and papers not in English were excluded. Data extracted from the records were categorized into user interfaces and types of robotic systems. Additional citations (such as company websites) were used to provide references for the technical specifications of the robotic systems. In addition, papers comparing different user interfaces were also identified.

Results

A total of 127 articles describing 67 different robot-assisted surgical platforms were included in the review after identifying and screening (Fig. 1). The platforms were grouped into: (a) 6 unique user interfaces to provide scope maneuvering commands (Fig. 2) and (b) 6 different categories based on the scope actuation mechanism (Fig. 3). Various characteristics of each robotic system, including (a) visualization type (stereo vision, high-definition, camera size, resolution), (b) degree(s) of freedom (DOF), (c) manipulation type (insertion, retraction, pan, tilt, rotate), (d) actuation method (motor, pneumatically driven), (e) control type (teleoperated, cooperative), (f) control interface, (g) development stage (commercial, research), (h) year, and (i) clinical application were also extracted.

Fig. 1
figure 1

Record identification and screening flowchart

Fig. 2
figure 2

Examples of interfaces to control scopes used in robot-assisted surgeries

Fig. 3
figure 3

Categories of robotic systems for visualization during surgery

Primary findings of the searches conducted are presented in the three subsequent sections. The first section describes the user interfaces for actuated scope control. The second section presents robot-assisted surgical platforms based on scope manipulation. A more detailed account of user interfaces used with different robot-assisted surgical platforms and in different surgeries is presented in the third section.

User interfaces to provide scope maneuvering commands

Robotic systems improve the performance of camera scopes by filtering tremors and executing precise movements. Intuitive user interfaces have been developed for control of these robotic systems. They can be categorized by mode of input: control by foot, hand, voice, head, eyes, and image-based tracking of surgical tools, as illustrated in Fig. 2.

Foot control

Foot pedals are often used as a clutch to activate scope control via handles such as finger loops or joysticks [16]. The camera position is fixed unless the clutch is engaged. Foot pedals may also act as an independent control, as in the consoles developed by Yang et al. [17] and Huang et al. [16], where a novel foot interface controls the scope in four degrees of freedom (DOF). Foot control frees the hands for controlling surgical instruments. However, the buttons pressed by the foot may distract the surgeon, who must look down to differentiate the correct pedal from those used for operating an electric knife or other instruments [18].

Hand control

The types of hand control devices adopted by commercially available systems include joysticks, buttons, finger loops, touch pads, and trackballs. These allow operating surgeons to have independent control over visualization without relying on human assistance. The application of this type of interface is limited because surgeons cannot simultaneously operate the scope and their instruments [16]. Surgical flow is interrupted as the operating surgeon switches between control of the surgical instrument and the camera scope. Additionally, pain in the fingers and thumb during prolonged use is commonly reported in robotic surgeries [9].

Voice control

In systems controlled by voice, the surgeon speaks commands such as “up”, “down”, “in”, and “out” to move the camera scope. Manipulating camera scopes using voice control mimics the default communication method between operating surgeon and assistant, and involves no physical fatigue [19]. Background noise, however, can potentially affect voice recognition accuracy. The need to repeat voice commands, which causes considerable delays in scope movement, makes this interface unfavorable for surgeons [20]. The typical task time for voice control is 2 s [21].
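As an illustration of how such a command vocabulary might map to scope motion, the sketch below pairs each spoken command with a motion increment and rejects low-confidence recognitions, modeling the repetition problem noted above. All names, the step size, and the confidence threshold are illustrative assumptions, not taken from any cited system.

```python
# Hypothetical mapping of spoken commands to unit motion directions
# in a scope-centric frame: (pan, tilt, insertion).
COMMANDS = {
    "up":    (0.0,  1.0, 0.0),
    "down":  (0.0, -1.0, 0.0),
    "left":  (-1.0, 0.0, 0.0),
    "right": (1.0,  0.0, 0.0),
    "in":    (0.0,  0.0, 1.0),
    "out":   (0.0,  0.0, -1.0),
}

def interpret(command, confidence, step_mm=5.0, threshold=0.8):
    """Return a motion increment, or None to request the command be repeated.

    The confidence threshold models the recognition failures reported in the
    literature: low-confidence results are rejected rather than risking a
    wrong scope movement.
    """
    if confidence < threshold or command not in COMMANDS:
        return None  # ask the surgeon to repeat the command
    dx, dy, dz = COMMANDS[command]
    return (dx * step_mm, dy * step_mm, dz * step_mm)
```

A rejected command would trigger a repeat prompt, which is precisely the source of the delays that make voice control unfavorable in noisy operating rooms.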

Head control

Head motion tracking provides a non-verbal, intuitive control method using the surgeon’s head position as input. Recognition of facial gestures [22] and use of head-mounted displays [23] allow smooth scope control without discontinuing surgical tasks. However, it can be challenging to intuitively control the depth of the endoscope using head movements [24].

Eye tracking

Eye tracking involves navigating the scope using eye gaze, measured from reflections in the cornea. Although eye tracking frees the hands for surgical instruments, it can be considered distracting. In a study [25] reporting surgeons’ opinions on interfaces, 3 out of 5 surgeons rated eye tracking unfavorably.

Tool tracking

Tool tracking uses image analysis to continuously detect the surgical instruments when activated and adjust the scope position accordingly. Automatic view centering and zoom adaptation are possible with computer-based instrument tip tracking. However, surgeons might have different priorities in terms of what they want to see while using instrument tracking [26]. This control can also be challenging for tasks without surgical tools.
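The automatic view centering described here can be illustrated with a minimal proportional controller: the tracked instrument tip is compared against the image center, and pan/tilt velocity commands are issued only when the error leaves a deadband. The function, gain, and deadband values below are hypothetical, for illustration only.

```python
def centering_command(tip_px, image_size, gain=0.002, deadband_px=40):
    """Proportional pan/tilt velocity command to re-center a tracked tool tip.

    tip_px      -- (x, y) pixel position of the detected instrument tip
    image_size  -- (width, height) of the camera image in pixels
    gain        -- illustrative proportional gain (rad/s per pixel of error)
    deadband_px -- radius around the image center with no commanded motion
    """
    cx, cy = image_size[0] / 2, image_size[1] / 2
    ex, ey = tip_px[0] - cx, tip_px[1] - cy
    # The deadband keeps the scope still during small tip movements,
    # avoiding distracting jitter while the surgeon works near the center.
    if (ex ** 2 + ey ** 2) ** 0.5 < deadband_px:
        return (0.0, 0.0)
    # Pan follows horizontal error, tilt follows vertical error.
    return (gain * ex, gain * ey)
```

A real system would feed this from an instrument detector and add zoom adaptation; the deadband also hints at why surgeons' differing view preferences matter, since the "correct" target is not always the instrument tip.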

Robot-assisted surgical platforms based on scope manipulation

This section presents the robot-assisted surgical platforms that utilize the aforementioned user interfaces to visualize the operative field during surgery. As depicted in Fig. 3, two main categories were used: (i) robotic surgical systems (grouped by access to the surgical site: multiple port, single port, and natural orifice), and (ii) robotic scope holders (grouped by flexibility of the scope used: rigid, articulated, and flexible endoscopes).

Robotic surgical systems for multiple-port surgeries

As opposed to conventional laparoscopic surgery, robotic surgery provides enhanced visualization, dexterity, and ergonomics. Systems made for multiple-port surgeries utilize several incisions to gain access to the target area [27]. A surgeon console, either closed or open, with controllers is employed to teleoperate the robotic arm holding the camera scope. The surgeon may also switch ports over the course of the procedure. Robotic systems for multiple-port surgeries (Table 1), such as the da Vinci Xi (Intuitive Surgical Inc., USA) and Senhance (Asensus Surgical, USA), are utilized for a wide variety of clinical applications such as colorectal, general, gynecological, thoracic, and urological surgeries [28,29,30].

Table 1 Robotic surgical systems for visualization in multiple-port surgeries, by year

Robotic surgical systems for single-port surgeries

Compared to multiple-port procedures, single-port surgeries reduce invasiveness and significantly benefit patients through less scarring, shorter recovery times, and reduced postoperative pain [56]. Robotic systems developed for single-incision laparoscopic surgeries, as detailed in Table 2, usually have a single arm with multiple instruments and a scope for visualization that extends outwards. The incision may be of different sizes depending on the system used and the procedure. Single-port surgery may prove challenging for the surgeon due to poor ergonomics. To avoid collision, distally actuated arms that achieve triangulation of the instruments around the target organ are often required [57]. Much like those for multiple-port surgeries, these systems utilize either a closed or open surgeon console with controllers to manipulate the robotic arm. The da Vinci SP (Intuitive Surgical Inc., USA) has US Food and Drug Administration (FDA) approval for urologic and transoral otolaryngology procedures. Other platforms under development target gynecological and general surgery applications.

Table 2 Robotic surgical systems for visualization in single-port surgeries, by year

Robotic surgical systems for natural orifice procedures

Further minimizing surgical aggressiveness, robotic systems for natural orifice procedures approach the site of interest through the natural openings of the body, such as the mouth or anus [67]. This is especially beneficial when the patient has a compromised immune system. The robot consists of a highly flexible and dexterous arm that can be steered towards intricate structures. An open surgeon console or a bed-side controller is used to manipulate the arm, and correspondingly the camera. Table 3 describes robotic systems used for transoral applications such as vocal cord lesion resection and bronchoscopy, as well as colorectal surgeries. Systems aimed at endoscopic submucosal dissection (ESD) in the gastrointestinal tract and at ear, nose, and throat (ENT) surgeries are under development.

Table 3 Robotic surgical systems for visualization in natural orifice procedures, by year

Robotic scope holders for rigid scopes

Minimally invasive surgeries employ rigid scopes for visualization, which are either zero-degree (forward-viewing) or angled to provide a wider range of view. Robotically actuated scope holders, used to hold and maneuver rigid scopes, provide a tremor-free, stable view directly controlled by the operating surgeon. This eliminates the need to communicate desired scope position changes to an assistant [84]. Several holders have been developed for rigid scopes, with AESOP (Computer Motion, USA) being one of the earliest robotic scope holders, using hand, foot, and voice control. As described in Table 4, they are used extensively in general, urology, gynecology, and colorectal surgeries. SOLOASSIST II (AKTORmed, Germany) has applications in transoral thyroid surgeries as well.

Table 4 Robotic scope holders for rigid scopes, by year

Robotic scope holders for articulated scopes

Articulated scopes have a flexible distal end that improves visualization around complex anatomy. Such scopes reduce the chance of interference with surgical instruments inserted through the same port. Research prototypes of scope holders described by Li et al. [121] and Huang et al. [26] aim towards thoracic surgery applications (Table 5). These research prototypes tend to use a variety of different control interfaces for scope manipulation.

Table 5 Robotic scope holders for articulated scopes, by year

Robotic scope holders for flexible endoscopes

Flexible endoscopes are highly dexterous and heavily used in gastroscopy and colonoscopy procedures. They require complex movements compared to rigid scopes [127]. Few robotic scope holders have been developed for forward-viewing flexible endoscopes (Table 6). Certain motions, such as rotation, are still controlled manually in some of these systems. The majority of the scope holders are used exclusively for colonoscopy and gastroscopy. The Avicenna Roboflex (ELMED Medical Systems, Türkiye) has applications in urology as well.

Table 6 Robotic scope holders used for flexible endoscopes, by year

User interfaces used in robot-assisted surgical platforms

Robot-assisted surgical platforms presented above utilize different user interfaces for scope manipulation. Overall, the results presented in Fig. 4a and Table 7 suggest that robotic surgical systems predominantly use hand control interfaces, whereas robotic scope holders tend to utilize and experiment with a variety of different interfaces, including tool tracking. In robotic surgical systems for multiple-port, single-port, and natural orifice procedures, the design of closed consoles requires the surgeon to place their head on the stereo viewer. This limits the surgeon’s range of movement, making hand controllers appropriate for scope control. Most commercially available robotic scope holders offer a hand control interface due to its familiarity and intuitiveness, which are necessary while performing surgical procedures. Advantages such as user-friendliness, easy hand–eye coordination, and lower cognitive load make hand control popular.

Fig. 4
figure 4

Mapping of user interfaces with robotic systems and surgeries

Table 7 Mapping of actuated scopes with common user interfaces used

As shown in Fig. 4b and Table 8, all categories of interfaces are used in general, urology, and gynecology surgeries. Otolaryngology, which focuses on ears, nose, and throat, predominantly utilizes hand control, and has the least variety of interfaces applied. Figure 5 illustrates the key surgical applications of the robotic systems, and the entry port sites. About 85% of prostatectomies in the USA are performed using robot assistance [148]. Complexity of the procedure and surgeon’s prior experience with related technology both affect the learning curve in robotic surgery [25].

Table 8 Common areas of surgical specialties and the interfaces used for robotic scope control
Fig. 5
figure 5

Surgical applications and entry port sites of various robotic systems

Discussion

Use of robot assistance in surgeries has increased in the past decade. Early appearances of user interfaces in research and commercial robotic systems are illustrated in Fig. 6. In the period 1990–2010, commercial systems were chiefly controlled using foot, hand, voice, and head interfaces, while the period 2010–2020 witnessed the emergence of eye-gaze and tool tracking scope control interfaces. The AESOP and ZEUS systems (Computer Motion Inc., USA), developed during the mid-to-late 1990s, both utilized voice commands as input [32], mimicking the default communication between surgeon and assistant. Computer Motion Inc. was acquired by Intuitive Surgical, which uses hand interfaces for its da Vinci systems. Intuitive Surgical has been the market leader since the early 2000s [149]. Head motion for rigid scope control was first used in EndoSista (Armstrong Healthcare, UK) during the mid-1990s [150]. It was later commercialized by FreeHand Surgical (UK) in 2008. Tool tracking, as implemented in the AutoLap system (MST Medical Surgery Technologies, Israel) in 2016, has received more attention recently.

Fig. 6
figure 6

Early appearances of different user interfaces in research and commercial robotic systems

A limited number of studies have compared different user interfaces; they focus on robotic scope holders for rigid scopes. A summary of these studies is presented in Table 9, which illustrates that surgeons increasingly prefer scope control interfaces that free their hands to control surgical instruments and do not interrupt surgical tasks. Voice control was favored for its reduced operating time and improved concentration [151]. However, foot control was preferred in multiple studies. In studies [19,20,21] comparing foot and voice controls, both of which keep the surgeon’s hands free, foot control was preferred, as voice commands had a higher chance of misinterpretation. In addition to task completion time, Allaf, Jackman [19] measured operator-interface failures, defined as occasions where the surgeon had to focus attention on the interface rather than the surgical field. The protocol was also repeated to assess the percentage of improvement retained after two weeks, where foot control was found easier to learn. A comparison of the AESOP and ViKY systems [21] found that voice commands had to be repeated due to speech recognition failures. Voice control was found to be affected by pronunciation in an evaluation of the RoboLens [20]. That system was assessed on procedure completion time, need for cleaning, image stability, and procedure field centering during several laparoscopic cholecystectomies; a significant lag between voice command and scope movement was observed. Although foot control is preferred over voice, eye–foot coordination might not be ideal, and surgeons often looked down to choose the right pedal from multiple ones [151]. Tool tracking is increasingly preferred as there is no interruption to surgery to control the scope. In a study by Avellino et al. [120] comparing a hand-controlled joystick, body posture tracking, and tool tracking, surgeons evaluated the interfaces on a defined set of tasks. The joystick received good ratings and was ranked behind tool tracking, while posture tracking was found suitable for tasks requiring short-distance movements. Despite concerns about tasks that do not involve surgical instruments, tool tracking was well regarded.

Table 9 Comparison of different interfaces for scope control, by year

Overall, actuated scopes utilize a variety of user interfaces, including foot, hand, voice, head, eye, and tool tracking, to provide stable views and smooth control during minimally invasive surgeries. Hand control is the most popular interface across all categories of surgical systems as it is familiar, intuitive, and imposes a lower mental load. However, various other interfaces are being investigated to address the interruption to surgical workflow caused by hand control. Head tracking interfaces are being explored in research prototypes such as the multiple-port system by Jo et al. [48]. This helps address the interruption to the surgical procedure caused by hand interfaces when switching control between surgical instrument and scope. Breaks in surgical workflow can result in longer operating times and increased risk of patient injury [48]. Having an easy-to-use and intuitive single-person interface is considered important for scope control by surgeons and gastroenterologists [152]. In teleoperated systems, where the surgeon is away from the patient, there is a preference for an open surgeon console. In an open console design, the surgeon views the video feedback through a head-up display, as opposed to an enclosed stereo viewer. Compared to a closed console, an open platform offers increased situational awareness, enables the expert surgeon to effectively mentor interns, and improves team communication [153, 154]. Preference for working position, either sitting or standing, varies among surgeons [152].

The majority of the systems utilizing hand controllers (such as da Vinci—Intuitive Surgical, Revo-i—Revo Surgical Solutions, and Enos—Titan Medical) or head-motion-based controllers (such as the FreeHand system and MTG-H100–HIWIN) require a foot pedal to activate the scope control mechanism. In these multimodal user interfaces, the foot pedal has two functionalities. First, it acts as an on–off switch that triggers the motion of the scope. In the case of hand controllers, it enables the operator to switch control from surgical instrument motion (to operate on the tissue) to scope maneuvering (to navigate the operative field). In the case of head-motion-based controllers, it activates scope motion only while the pedal is pressed, allowing the surgeon to move the head freely during the rest of the procedure [155, 156]. Second, the foot pedal acts as a clutch and facilitates ergonomic repositioning of the hand controllers or head position [157]. Another example of a multimodal user interface for scope control is the head-mounted display (HMD). HMDs have been used in the operating room for surgical navigation and planning [158, 159]. In the case of actuated scope maneuvering, the operative field view is rendered by the HMD in a virtual or mixed reality environment, while head motions detected by the device’s sensors are used to maneuver the scope [160,161,162]. In contrast to visualizing the operative field on a physical screen, HMD devices offer the surgeon the flexibility to ergonomically place the virtual view of the operative field in the operating room [5, 163, 164]. This decreases the surgeon’s shift of focus from the screen to the operating site [165, 166] and may thus reduce the prolonged strain (in the neck and lower back) caused by poor monitor positioning [167, 168]. Further end-user clinical studies would be required to assess the potential of HMD devices as a multimodal user interface (i.e., to immerse the operator in the information pertaining to the operative field and to evaluate control of the robotic system [169, 170]).
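The two foot-pedal roles described above, on–off activation and clutching, can be sketched as a small state machine. The class below is a hypothetical one-axis simplification, not any vendor's implementation: controller motion moves the scope only while the pedal is held, and releasing the pedal lets the operator reposition the controller without moving the scope.

```python
class PedalClutch:
    """Illustrative foot-pedal clutch logic for a single motion axis."""

    def __init__(self):
        self.pedal_down = False
        self.controller_ref = 0.0  # controller position captured on engage
        self.scope_pos = 0.0

    def press(self, controller_pos):
        # Engaging the pedal captures the current controller position, so
        # subsequent motion is applied incrementally to the scope (the
        # "on-off switch" role described in the text).
        self.pedal_down = True
        self.controller_ref = controller_pos

    def release(self):
        # With the pedal up, the operator can move the controller (or their
        # head) freely without any scope motion (the "clutch" role).
        self.pedal_down = False

    def update(self, controller_pos):
        # Controller motion drives the scope only while the pedal is engaged.
        if self.pedal_down:
            self.scope_pos += controller_pos - self.controller_ref
            self.controller_ref = controller_pos
        return self.scope_pos
```

Releasing the pedal, moving the controller back to a comfortable pose, and pressing again repositions the hand (or head) ergonomically with no net scope motion, which is exactly the clutching behavior the multimodal interfaces rely on.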

Limitations of this review include the exclusion of non-English literature, which may have limited the breadth of representation and insight. The methodological quality of the included studies was also not assessed. Additionally, no studies compare all the different user interfaces on the same surgical task and scenario, which would have enabled an equal assessment.

In conclusion, the observations in this review indicate that integration of multiple control interfaces for camera control would be ideal, especially for scope holders used in bed-side procedures. As each interface has its own benefits, merging different control types enables the surgeon to benefit from each interface in different surgical steps [120]. The surgeon would be free to choose the appropriate control type throughout the stages of the surgical procedure. Integration of head tracking, which is efficient for 3D navigation, or tool tracking, which lowers cognitive load, would be advantageous. Nevertheless, merging several controls may introduce limitations such as redundancy, and may pose a challenge for the surgeon in achieving seamless transitions when switching interfaces. It would be helpful to further explore the impact of different user interfaces on surgical outcomes in future studies.