1 Introduction

In teleoperation involving high-value or delicate structures, it is imperative that the operator perceive as much information about the remote environment as possible without being distracted. Feedback modalities beyond simple monocular vision can assist the operator by overlaying sensory information onto data captured from the remote scene using virtual fixtures [42]. This is particularly useful when the teleoperated task is known in advance. With an effective operator interface, the user can make the best decisions to complete the task efficiently with minimal physical and mental effort.

Operator performance can be improved through haptic virtual fixtures and new vision modalities. However, haptic virtual fixtures for teleoperation are predominantly employed to increase awareness and understanding, not to directly assist in the teleoperated task [34]. For example, a novel augmented reality interface coupled with virtual fixtures reduced overall positioning error in a maintenance task; however, the fixtures primarily served as operator notifications [29]. Implementations of task-specific, directly assistive fixtures need to be evaluated before they can be widely deployed, e.g., in telemanipulation. Therefore, user studies are required to gauge the relative impact of haptic and visual augmentations for telemanipulation.

Ni et al. investigated operator performance and task load when servoing a robot manipulator to reach a point target using haptic fixtures within a virtual reality environment [36]. While promising, this research involved unconstrained tasks: providing arbitrary motion commands and navigating to a point location. Many robotic tasks require trajectory following or finer, more constrained commands, especially in the presence of sensitive structures. Learning new interfaces has also been shown to benefit from virtual fixtures, as in [14], where a user study validated performance gains in an sEMG-controlled 2D game. Real-world tasks, however, require manipulation in three spatial dimensions. In another study, haptic virtual fixtures were implemented to execute multiple grasps of various objects [26]; the only metric evaluated was the number of objects grasped, and the task was limited to positioning the grasper. More general performance metrics provide broader insight into the benefits and drawbacks of 3D visualization and task-specific haptic virtual fixtures.

This study explores the effects of user-placed haptic guidance virtual fixtures and 3D-mapping methods for a telemanipulation interface. In particular, the following teleoperation user interface feedback types were examined:

  1. Visual feedback

     • 2D monocular RGB

     • 3D voxel representation

  2. Task-specific haptic guidance

Contributions

This work investigates the aforementioned feedback modes in a bilateral teleoperation task through extensive user-study experiments. These experiments are used to:

  1. Quantify the effects of feedback mode on several performance metrics.

  2. Provide baseline insight for designing feedback for user interfaces in telemanipulation.

  3. Provide concrete directions and comparisons for future experiments.

2 Background

2.1 Visual feedback in teleoperation

Binocular vision has been shown to improve teleoperator speed and precision [11], but it requires bulky and expensive 3D-capable displays. In contrast, inexpensive commodity RGB-D cameras provide a non-contact means of collecting geometric information and have grown in popularity in teleoperation applications [10, 33, 37, 51]. Stereo vision is another widely used alternative, and in robot-assisted minimally invasive surgery, stereo endoscopes provide real-time 3D geometries often represented as point clouds [24, 46,47,48]. It has been shown that providing the teleoperator with depth information in the form of a real-time point cloud can improve performance in certain navigation tasks compared to monocular RGB streams [32]. Another efficient method of displaying 3D surface geometries is through voxel occupancy grids [15, 53], which can preserve previously observed, now-occluded geometries.

However, with depth information, several factors can degrade the quality of feedback and confuse the viewer. For example, sensor resolution may be insufficient to resolve smaller manipulation targets, so the visual data may provide only a coarse localization of the object. Furthermore, depending on material properties, depth information may be noisy or arbitrary (e.g., transparent materials, glancing angles or light-absorbing materials) [30, 35]. Other variables such as measurement distance have also been shown to affect measurement noise and density [22]. Because of this variability, it is not certain that 3D-mapping techniques can improve operator performance in telemanipulation tasks.

Mast et al. [32] determined that the usefulness of voxel-based 3D-mapping in navigation varied between environments. In this work, the utility of such 3D representations when applied to a manipulation task is explored. In such a task, 3D information may not be conveyed in a useful way and could in fact be detrimental and confusing when dealing with movement and small objects such as a valve handle. Additionally, occlusions could result in lack of information and heightened interpretation effort, while a monocular RGB stream is intuitive and familiar to most users.

2.2 Haptic feedback in teleoperation

Forbidden-region haptic virtual fixtures help prevent the operator from entering an undesired configuration. In [17, 18], forbidden-region virtual fixtures combined with pre-touch sensing were used to help prevent unwanted contacts during exploration of an unknown, potentially delicate object. Similar types of forbidden-region fixtures have found use in surgical contexts [27, 39]. In contrast to forbidden-region schemes, haptic guidance virtual fixtures push, prod or otherwise guide the operator’s hand along a desired direction or trajectory [1, 3, 31]. This is useful for maintaining a predefined trajectory [2, 38, 43, 44] as well as for adaptive constraints [40]. When depth perception is difficult, such virtual fixtures can assist in avoiding contacts and maintaining a safe, desired path. Vision and haptic virtual fixtures have also been used in tandem for novel clinical applications [7, 8].

Moreover, since the virtual environment and force feedback are calculated in software, the actual robot end effector can be prevented from deviating from the desired path while a guiding force is applied to the user. In [19], a flexible guidance fixture was demonstrated in which computer vision identified obstacles obstructing a predefined 2D virtual guidance trajectory, and a modified trajectory was calculated to avoid them. The above types of guidance fixtures deal with fixed, predefined paths. For telemanipulation in an unknown environment, the task may be predefined, but its ideal configuration in the remote location is difficult to determine. However, once enough information about the physical task space is obtained, it is feasible and desirable for the teleoperator to place the desired trajectory [52].

2.3 Comparative studies

Goodrich et al. [12] outlined several different schemes and levels for providing autonomous assistance in teleoperation. Peon et al. [41] explored the effect of different haptic modalities, in combination with audio feedback, on subject response time to violating a spatiotemporal constraint. Wang et al. [52] showed improved user execution time and accuracy from combining the visual and haptic display of a guidance virtual fixture in a computer simulation, i.e., the user could see the desired trajectory as well as feel guidance forces. The guidance virtual fixture was generated and placed by the user with a computer mouse on a virtual surface [52]. While this method limited the user-defined virtual guidance fixture to the face of an object, it is extendable to trajectories in three-dimensional space. In a similar study, Kuiper et al. [25] examined haptic and visual feedback modes and their effects on nonholonomic steering, whereby a user controlled a nonholonomic vehicle in a simulated steering task with different levels of constraints. The virtual fixtures were used to guide users along predicted or suggested vehicle paths, and it was found that visual feedback is needed for improvements when providing only predicted trajectories. In [5], various haptic feedback modes for visuo-manual tracking of predefined trajectories, namely writing different Arabic and Japanese characters, were evaluated; the results suggested that haptic feedback can assist in learning to write 2D characters. In a related study, it was shown that haptic information from handwriting can be compared and classified based on the users’ kinematic variations [54]. Different haptic assistance levels were assessed in completing a 2DOF maze navigation task in [40].

It has been established that haptic virtual fixtures can be useful for object avoidance and for following a predefined path based on a priori information [3, 5], and several works explore 2D effects [5, 40]. In this work, the user is given the ability to manually place a virtual fixture of predefined geometry along a useful path for completing a 3D telemanipulation task; however, this path is not required to complete the task. To place this fixture properly and efficiently, it is imperative that the user be provided with a real-time 3D-mapping representation; the negative effects of inaccuracies in shared haptic guidance feedback are described in [6]. Given several spatial, temporal and sensor-related factors (e.g., the user may have difficulty placing the virtual fixture), it is not clear a priori whether such a feedback option is beneficial.

3 Experimental setup

3.1 System description

Fig. 1 Teleoperation master console station, including a 3DOF haptic device and an LCD monitor to display visual feedback

The setup for this project is a bilateral teleoperation arrangement. At the master console station, the teleoperator manipulates a haptic device, the Sensable PHANToM Omni. This device sends 3DOF position commands to control the end effector location of the youBot, and it also receives and displays 3DOF haptic force feedback commands from the virtual fixture software. In addition, the user is presented with visual feedback on an LCD monitor. The teleoperator’s goal is to manipulate the robot arm to turn a gas valve. The master console setup is illustrated in Fig. 1.

The remote robot proxy is a Kuka youBot robot with a Primesense Carmine RGB-D camera. The youBot has an omnidirectional base and a 5 degree-of-freedom (DOF) manipulator. In this implementation, the joints are controlled by a National Instruments CompactRIO real-time controller, and commands to the master console are transmitted via an Asus AC router. These features are shown in Fig. 2.

To evaluate the effectiveness of the described feedback modes and user-placed guidance fixtures, teleoperator performance during the valve turn task is compared under the following user feedback conditions:

  1. Visual only, monocular RGB stream (R)

  2. Visual only, 3D-mapping voxel method (V)

  3. Visual and haptic guidance virtual fixture (VF)

Fig. 2 Teleoperated platform based on Kuka youBot

Scenario 1 (R) represents a particularly simple baseline case, RGB streaming video, which is still employed in teleoperated tasks.

Scenario 2 (V) provides a baseline for 3D visual representation. The user is able to rotate and translate his or her view within the 3D representation. This representation includes a 3D voxel map of a volume enclosing the youBot’s task space, which is updated with RGB-D sensor information based on a Bayesian statistical model to determine a binary occupancy state. Similar methods were explored in [15, 35]; however, in this study, the voxel allocation and update are hardware accelerated to ensure real-time acquisition and fast response to motion.

Scenario 3 (VF) provides the operator with a guidance virtual fixture of an appropriate shape for the task, in addition to the visualization from Scenario 2. This fixture prevents the operator from deviating from a path known to successfully complete the valve turn and avoid undesired contacts, and it mitigates confusion caused by occlusion of the valve by the manipulator. The trajectory is a series of finely sampled, ordered points. Because the environment is rendered as a 3D voxel grid, it is simple and quick for the operator to place the visualized desired trajectory properly. The three feedback modes are described in detail in Sect. 4.5.

It is of interest to determine whether, in this telemanipulation task, 3D-mapping techniques improve operator performance even when displayed to the user on a 2D display. Furthermore, the efficacy of a user-placed guidance trajectory is explored. A comparison is sought between the 2D monocular RGB stream (R), 3D voxel mapping (V), and combined visual and haptic feedback (VF). Three questions are explored:

  1. Does the addition of 3D-mapping techniques improve user performance, decrease workload or increase awareness?

  2. Do manually placed haptic virtual fixtures provide additional improvements?

  3. Which comparisons warrant immediate further study?

Because of the variability of the above factors, the nature of this experiment is exploratory and investigates a broad range of metrics with a simple, generalizable task. The results of this work will provide insight into the suitability of 3D-mapping methods and manually placed virtual fixtures for feedback in telemanipulation tasks, and inform metrics for future studies.

4 Methods

4.1 Experimental task

The operator is asked to complete a valve turn task. Such a task is motivated by disaster recovery: in the case of a gas leak during a natural disaster, teleoperation is attractive because it reduces risk to human responders. Moreover, a teleoperated device may be better suited than a human being to reach physically constrained locations. In this study, the valve to be turned is a ball valve with a \(90^\circ \) dynamic range. The task can be broken down into two subtasks:

Task A: turning the ball valve from the 12 o’clock position to the 3 o’clock position.

Task B: turning the ball valve from the 3 o’clock position to the 12 o’clock position.

The task is depicted in Fig. 3, and it is with this setup that the user study for the project was conducted.

Fig. 3 Teleoperation valve turn task. Task A (red) consists of turning the ball valve from the 12 o’clock position to the 3 o’clock position, while Task B (green) consists of turning the ball valve from the 3 o’clock position to the 12 o’clock position (color figure online)

The slave robotic device, as described in Sect. 3.1, consists of a modified Kuka youBot platform and is assumed to have already reached the task location. The user is not required to navigate the robot base.

4.2 Subject recruitment

In this study, recruitment was performed on campus, and subjects consisted solely of undergraduate and graduate students. As described previously, there are three test conditions. A between-subjects study was employed, in which 21 male subjects participated (seven in each test group). Their ages ranged from 18 to 35 years (mean age, R: 25.143; V: 23.000; VF: 26.143). Participants were chosen to be male to avoid any effects due to possible differences between males and females in spatial problem solving, as described in [20].

Each of the participants used computers at least 10 h per week. In each group of seven participants, six played video games for less than 2 h per week, while exactly one played more than 10 h per week (mean video game usage per week, R: 2.214; V: 1.857; VF: 2.000). None of the participants had prior experience using the Sensable PHANToM Omni.

4.3 Metrics

In this project, both objective and subjective metrics were employed for comparison. In particular, objective performance metrics included:

  • Time to complete the valve turn task (s)

  • Path length of the end effector (mm)

  • Number of undesired collisions

  • Jerk of the end effector \(\left( \frac{{\hbox {m}}}{{\hbox {s}}^3}\right) \)

Each trial was video recorded and manually labeled post-experiment for the undesired collision count. After completion of the task, subjective measures were assessed via post-task questionnaires evaluating:

  • Perceived workload

  • Situational awareness

Perceived workload was measured using the unweighted NASA Task Load Index (TLX) [13], and situational awareness was measured with the three-dimensional Situation Awareness Rating Technique (SART) [50], scaled to (0, 120). Situational awareness is critical for producing effectual robot behavior [12].
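
For reference, the sketch below illustrates how these questionnaire scores are commonly computed: the unweighted (raw) TLX as the mean of the six subscale ratings, and the three-dimensional SART as Understanding minus (Demand minus Supply). The function names and the linear rescaling to (0, 120) are illustrative assumptions; the paper does not specify its exact scaling.

```python
def raw_tlx(mental, physical, temporal, performance, effort, frustration):
    """Unweighted (raw) NASA TLX: mean of the six subscale ratings (each 0-100)."""
    return (mental + physical + temporal + performance + effort + frustration) / 6.0


def sart_3d(demand, supply, understanding, rescale=True):
    """Three-dimensional SART: SA = Understanding - (Demand - Supply), each rated 1-7.

    The raw score spans -5 to 13; the linear map onto (0, 120) below is an
    assumption for illustration, not necessarily the scaling used in the study.
    """
    sa = understanding - (demand - supply)
    return (sa + 5) * 120.0 / 18.0 if rescale else sa
```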

4.4 Procedure

The experiments were conducted in an office and the hallway corridor outside. The participant teleoperated from the master console within the office, while the simulated remote environment was in the hallway out of view from the subject. In the hallway, the youBot and the valve structure were placed in the same location for each experiment. Prior to the experiment, the users were allowed to see the valve and robot position, and were further allowed to turn the valve manually to obtain a sense of the range of motion as well as the torque needed to turn the valve.

After viewing the valve structure and youBot, the subjects underwent a training period that lasted 20 min or until the user was satisfied, whichever came first. (In all cases in this study, the user was satisfied with the training before the 20 min elapsed.) The training session occurred with the youBot in the office space, within view of the operator. During the training session, the user was allowed to teleoperate only with their assigned feedback mode. For the monocular RGB mode (R), the user was presented with a \(640\times 480\) video stream of the manipulator in well-lit conditions; for (V), a voxelized representation; and for (VF), the operator received the voxel map visual feedback and could furthermore place a haptic guidance fixture. Data was acquired at 50 Hz, and visual feedback was updated at 30 Hz, the data acquisition rate of the Primesense Carmine camera. The haptic update rate was set to 1200 Hz to maintain realistic force feedback. (A minimum rate of 1 kHz is needed for realistic interaction [28].)

Once the experiment began, noise-isolating ear protection was placed over the participant’s ears, and the trial was videotaped for post-processing of unwanted collisions. Each subject was asked to perform ten trials, in order: Task A five times followed by Task B five times. Between trials, the robot was homed to a fixed starting configuration. The user was timed from the first movement away from this home position until the valve was turned completely. In mode VF, the user placed the guidance virtual fixture during each trial, i.e., ten times per subject.

4.5 Visual and haptic feedback design

4.5.1 Monocular RGB, (R)

Fig. 4 Monocular RGB visual feedback. The user is unable to change viewing angles

Monocular RGB feedback (R) is a simple and widely available baseline case. The user was presented only with streaming RGB video displayed on the LCD monitor. Figure 4 shows a typical screenshot of the visual feedback from this mode, which was rendered in OpenGL using components found in RViz.

4.5.2 Voxel-based 3D-mapping, (V)

In the 3D-mapping mode (V), a voxelized cube with side length of one meter and resolution of 5 mm was graphically rendered in front of the youBot and is depicted in Fig. 5.

A simple Bayesian update method using heuristically tuned update parameters was used to determine voxel occupancy. The voxel update scheme can be summarized as follows.

Fig. 5 Voxel-based 3D-mapping feedback. The user can change views in the 3D visualization to view otherwise occluded objects and surfaces

In each RGB-D frame, for each voxel, project the voxel onto the depth image. The voxel is represented in the depth image at pixel location p with depth value v. At p, the measured depth image also has a camera-measured depth representing real-world data, call it s. The two depths, v and s, represent the voxel depth and the sensed surface depth, respectively. If \(v\le s\) (i.e., the voxel is at or closer than the surface sensed by the camera), determine the voxel occupancy state, O, via a Bayesian update rule.

In this way, the occupancy was updated every RGB-D frame while preserving the occupancy states of now-occluded voxels. The update parameters \(i,j,k,\tau \) were all heuristically tuned. The algorithm is highly parallelizable and was hardware accelerated to ensure real-time acquisition and fast response to motion. In mode V, the user could view the occupancy grid from various angles using a computer mouse. The RGB-D data was captured at an acquisition rate of 30 Hz, and again the visual feedback was rendered in OpenGL using components found in RViz. The same methods can be used to generate voxel occupancy grids from stereo-captured point clouds.
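
As a concrete illustration of the per-frame update just described, the following is a minimal sketch of a log-odds Bayesian voxel update. The parameter names, the hit/miss increments, the 1 cm surface-association threshold and the projection details are assumptions for illustration; they are not the authors' heuristically tuned values (\(i,j,k,\tau \)) or their hardware-accelerated implementation.

```python
import numpy as np

# Illustrative update constants (assumed; the paper's parameters were tuned heuristically).
LOGODDS_HIT = 0.85      # evidence increment when a voxel coincides with the sensed surface
LOGODDS_MISS = -0.40    # evidence decrement when a voxel lies in observed free space
OCC_THRESHOLD = 0.0     # log-odds threshold for the binary occupancy state O
SURFACE_TOL = 0.01      # assumed tolerance [m] for associating a voxel with the surface


def update_voxels(log_odds, voxel_centers, depth_image, K, T_cam_world):
    """One Bayesian log-odds update over all voxels for a single depth frame.

    log_odds      : (N,) per-voxel log-odds occupancy (updated in place)
    voxel_centers : (N, 3) voxel centers in world coordinates [m]
    depth_image   : (H, W) metric depth image [m]
    K             : (3, 3) camera intrinsic matrix
    T_cam_world   : (4, 4) world-to-camera homogeneous transform
    Returns the binary occupancy state O for every voxel.
    """
    h, w = depth_image.shape
    # Project every voxel center onto the depth image.
    pts_h = np.c_[voxel_centers, np.ones(len(voxel_centers))]
    pts_cam = (T_cam_world @ pts_h.T)[:3].T
    v = pts_cam[:, 2]                                   # voxel depth v
    uv = (K @ pts_cam.T).T
    px = np.round(uv[:, :2] / uv[:, 2:3]).astype(int)   # pixel location p

    # Keep voxels that project in front of the camera and inside the image.
    valid = (v > 0) & (px[:, 0] >= 0) & (px[:, 0] < w) & (px[:, 1] >= 0) & (px[:, 1] < h)
    s = np.full_like(v, np.inf)
    s[valid] = depth_image[px[valid, 1], px[valid, 0]]  # sensed surface depth s at p

    # Only voxels not hidden behind the sensed surface (v <= s) are updated,
    # which preserves the state of currently occluded voxels.
    visible = valid & np.isfinite(s) & (v <= s)
    hit = visible & (np.abs(v - s) < SURFACE_TOL)       # voxel lies on the sensed surface
    log_odds[hit] += LOGODDS_HIT
    log_odds[visible & ~hit] += LOGODDS_MISS            # free space between camera and surface

    return log_odds > OCC_THRESHOLD                     # binary occupancy O
```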

4.5.3 Guidance virtual fixture, (VF)

In the haptic virtual guidance fixture feedback mode (VF), the user was provided with the same 3D visual feedback described for mode V. In addition to this, the user was able to place a visualized path (a green colored arc) on the ball valve structure, as shown in Fig. 6.

This path provided haptic feedback once placed and ideally passed through occupied voxels representing objects of interaction. For the valve turn in particular, it was desired that the path pass through the voxels representing the handle of the ball valve. The guidance path consisted of a \(90^{\circ }\) circular arc whose radius equals the handle length. This path lies in a plane normal to the ball valve’s axis of rotation and ensures sufficient torque is applied to the ball valve by the end effector, while allowing acceptable deviation from the ideal circular arc.
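
For illustration, a minimal sketch of generating such a sampled arc is shown below; the function name, the sampling density and the way the handle direction is specified are assumptions, not the authors' implementation.

```python
import numpy as np


def valve_arc(center, axis, handle_dir, radius, n_samples=200):
    """Sample a 90-degree circular arc of given radius in the plane normal to the valve axis.

    center     : (3,) point on the valve's axis of rotation (arc center) [m]
    axis       : (3,) valve axis of rotation (normal of the arc's plane)
    handle_dir : (3,) direction from the center toward the handle at the start of the turn
    radius     : handle length [m]
    Returns an (n_samples, 3) array of finely sampled, spatially ordered trajectory points.
    """
    axis = np.asarray(axis, dtype=float)
    axis /= np.linalg.norm(axis)
    u = np.asarray(handle_dir, dtype=float)
    u = u - np.dot(u, axis) * axis                      # project handle direction into the arc plane
    u /= np.linalg.norm(u)
    v = np.cross(axis, u)                               # second in-plane basis vector
    angles = np.linspace(0.0, np.pi / 2, n_samples)     # 90-degree sweep
    return np.asarray(center) + radius * (np.outer(np.cos(angles), u) + np.outer(np.sin(angles), v))
```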

Fig. 6 Manually placed haptic guidance virtual fixture feedback. The red denotes entry and exit points of the trajectory, while the green signifies the valve-turning portion of the path (color figure online)

In order to render the haptic feedback, the path is first sampled as a set of spatially ordered points. As the operator approaches the guidance path to within X of any sampled point, an attractive haptic well is generated around that point. The force profile of this haptic well is defined by a simple piecewise cubic polynomial, as described by Eq. 1.

$$\begin{aligned} f(x) = {\left\{ \begin{array}{ll} a_2x^2+a_3x^3 &{}0\le x<\frac{X}{2}\\ a_2(X-x)^2+a_3(X-x)^3&{}\frac{X}{2}\le x\le X\\ 0&{}\text {else} \end{array}\right. } \end{aligned}$$
(1)

This cubic polynomial results in a haptic well force profile whose shape is shown in Fig. 7.

The effect of this force profile is twofold. Firstly, the user can remove themselves from the guidance fixture by moving beyond X of the guidance point. Secondly, the user is strongly encouraged to stay within \(\frac{X}{2}\) of the guidance point while receiving force feedback. In this work, \(X =\) 1 cm, and the peak guidance force is scaled to 2.5 N.

To move along the path, the current closest point and directly adjacent ordered points are considered. When the user moves and an adjacent point is now closer, the haptic well around the current point is attenuated, while a new haptic well is enforced at the new closest point. This process is repeated on the subsequent points, guiding the user along the ordered points. If the user leaves the guidance fixture, the entire procedure is repeated.
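
To make the haptic rendering concrete, the sketch below implements the attractive well of Eq. (1) and the closest-point hand-off described above. The coefficients \(a_2\) and \(a_3\) are chosen here so that the force rises smoothly from zero to the 2.5 N peak at \(x = X/2\) with zero slope at both ends of the rising segment; the paper does not state how its coefficients were selected, so this choice is an assumption.

```python
import numpy as np

X = 0.01         # haptic well capture distance [m] (1 cm)
F_PEAK = 2.5     # peak guidance force [N]

# Assumed coefficients: f(0) = 0, f'(0) = 0, f(X/2) = F_PEAK, f'(X/2) = 0.
A2 = 3.0 * F_PEAK / (X / 2.0) ** 2
A3 = -2.0 * F_PEAK / (X / 2.0) ** 3


def well_force_magnitude(x):
    """Piecewise cubic force profile of Eq. (1) as a function of distance x to the active point."""
    if 0.0 <= x < X / 2.0:
        return A2 * x ** 2 + A3 * x ** 3
    if X / 2.0 <= x <= X:
        return A2 * (X - x) ** 2 + A3 * (X - x) ** 3
    return 0.0


def guidance_force(tool_pos, path_points, active_idx):
    """Attractive force toward the active path point, handing off to an adjacent point when it becomes closer.

    tool_pos    : (3,) haptic device proxy position [m]
    path_points : (N, 3) finely sampled, spatially ordered trajectory points [m]
    active_idx  : index of the currently enforced haptic well
    Returns (force_vector, new_active_idx).
    """
    # Consider the current closest point and its directly adjacent ordered points.
    candidates = [i for i in (active_idx - 1, active_idx, active_idx + 1)
                  if 0 <= i < len(path_points)]
    dists = [np.linalg.norm(path_points[i] - tool_pos) for i in candidates]
    new_idx = candidates[int(np.argmin(dists))]     # attenuate the old well, enforce the closer one

    d = np.linalg.norm(path_points[new_idx] - tool_pos)
    if d == 0.0 or d > X:                           # outside the well (or exactly at its center)
        return np.zeros(3), new_idx
    direction = (path_points[new_idx] - tool_pos) / d
    return well_force_magnitude(d) * direction, new_idx
```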

Fig. 7 Haptic well force profile shape

5 Results

5.1 Quantitative metrics

Time to completion was measured from the initial movement away from the pre-calibrated home position until the valve turn was completed. Unwanted collisions were manually labeled post-experiment; physical contact of the robot manipulator with any object other than the valve handle was considered unwanted, and distinct contacts required lift-off between them, i.e., a dragged contact was counted only once. Path length was calculated from the end effector trajectory obtained through forward kinematics of the recorded joint angles. Finally, finite differences of the sampled position were used to approximate jerk; a low-pass filter removed the high-frequency components introduced by discrete differentiation. These metrics measure operator performance in the valve-turning task.
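
As an illustration of how the kinematic metrics can be computed from logged end-effector positions, the sketch below assumes uniformly sampled 3D positions at the 50 Hz acquisition rate; the filter order and cutoff frequency are assumptions, since the paper does not report them.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def path_length(positions_mm):
    """Total end-effector path length [mm] from an (N, 3) array of sampled positions in mm."""
    return float(np.sum(np.linalg.norm(np.diff(positions_mm, axis=0), axis=1)))


def mean_jerk(positions_m, fs=50.0, cutoff_hz=5.0):
    """Approximate mean jerk magnitude [m/s^3] from an (N, 3) array of positions in meters.

    Jerk is estimated with three successive finite differences; a low-pass filter
    then removes the high-frequency components introduced by discrete differentiation.
    """
    dt = 1.0 / fs
    jerk = np.diff(positions_m, n=3, axis=0) / dt ** 3       # third-order finite difference
    b, a = butter(2, cutoff_hz / (fs / 2.0))                 # 2nd-order low-pass (cutoff assumed)
    jerk = filtfilt(b, a, jerk, axis=0)                      # zero-phase filtering
    return float(np.mean(np.linalg.norm(jerk, axis=1)))
```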

Fig. 8 Boxplots comparing quantitative metrics

Figure 8 shows graphically the results across the various feedback modes (R: monocular video stream, V: 3D-voxel mapping, VF: 3D-voxel mapping and guidance virtual fixture) along the four different quantitative metrics. The boxplots show mean values with standard deviation as error bars.

The four quantitative metrics, completion time, number of unwanted collisions, path length and jerk, were measured for each trial using repeatable and consistent methods. The results were grouped by feedback mode, and then, the mean for each category was calculated. The results are shown in Table 1.

5.2 Qualitative metrics

Two post-experiment questionnaires were administered, the NASA TLX and SART, to evaluate task load and situational awareness respectively. Figure 9 shows these results. The user responses were compiled, and the scores were grouped by feedback mode. The mean results are shown in Table 2.

5.3 Analysis

Several statistical approaches were employed to analyze the experiments, including a multivariate comparison and two different post hoc comparisons. Multivariate analysis of variance (MANOVA) was used as an omnibus measure to determine how many dimensions of the dependent variables (i.e., time to completion, number of collisions, path length, jerk, TLX and SART) may significantly differentiate the three groups (i.e., operating modes R, V and VF), while simultaneously controlling for multiple comparisons.
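
A minimal sketch of such an omnibus test, assuming the per-trial metrics have been collected into a data frame with hypothetical column names (mode, time, collisions, path_length, jerk, tlx, sart), could look as follows using statsmodels; this is illustrative, not the authors' analysis code.

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical per-trial records; the file name and columns are illustrative assumptions.
df = pd.read_csv("trial_metrics.csv")

# Omnibus MANOVA: do the six dependent measures jointly differ across the three feedback modes?
fit = MANOVA.from_formula(
    "time + collisions + path_length + jerk + tlx + sart ~ mode", data=df
)
print(fit.mv_test())   # Wilks' lambda, Pillai's trace, etc. for the 'mode' effect
```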

Table 1 Mean raw values of quantitative metrics
Fig. 9 Boxplots comparing qualitative metrics

The MANOVA resulted in at least two significant degrees of freedom between groups, so further statistical analyses are warranted. The entire data set is separated along the two identified degrees of freedom, as shown in Fig. 10.

Table 2 Mean raw values of qualitative metrics
Fig. 10 Data plotted by top two eigenvectors

Figure 10 indicates that the second-ranked feature deviates greatly from the top-ranked feature. Post hoc approaches investigate pairwise comparisons to identify precisely which measures differentiate the three feedback modes and which comparisons merit future study. First, consider pairwise two-sample t tests between the three feedback groups for the quantitative metrics, as shown in Table 3. Next, consider the pairwise two-sample t tests between the three feedback groups for the qualitative metrics, as shown in Table 4. The statistical significance measures are explained in detail in the following text.

Table 3 Statistical p values of quantitative metrics
Table 4 Statistical p values of qualitative metrics

5.4 Statistical corrections

Recall that three experimental groups and six different metrics were examined. Pairwise comparisons of the three groups across the six metrics yield a total of 18 different hypotheses from the data set. Thus, a multiplicity problem arises, and the statistical analysis must account for this in order to avoid Type I errors, i.e., falsely rejecting a null hypothesis. Two different post hoc measures were employed:

  1. family-wise error rate (FWER)

  2. false discovery rate (FDR)

The former bounds the probability of making at least one false discovery, whereas the latter controls the expected proportion of false positives among the rejected hypotheses.

5.4.1 FWER

The multiple comparisons problem is first addressed by controlling the FWER with a conservative measure, the Holm–Bonferroni correction. To begin, consider the \(i = 18\) p values from the two-sample t tests analyzing the 18 null hypotheses (three pairwise group comparisons, six metrics). Sort these p values in ascending order, together with their corresponding null hypotheses:

$$\begin{aligned} p_1 \le p_2 \le \cdots \le p_{i},\qquad n_1, n_2, \ldots , n_{i} \end{aligned}$$

Now take the typical significance level, \(\alpha = 0.05\). Via the Holm–Bonferroni method, let \(j \in [1,i]\) be the smallest index such that the inequality below is satisfied:

$$\begin{aligned} p_j > \frac{\alpha }{i-j+1} = \frac{0.05}{19-j} \end{aligned}$$

Then, the null hypotheses \(\{n_1, n_2, \ldots , n_{j-1}\}\) are rejected, while the remaining are not (if no such j exists, all 18 are rejected). Using this analysis method yields the statistical results in Table 5:

Table 5 Holm–Bonferroni correction
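
A minimal sketch of this step-down procedure is shown below; the function name is an assumption, and the placeholder p values stand in for the 18 pairwise two-sample t test results (e.g., from scipy.stats.ttest_ind), not the study's data.

```python
import numpy as np


def holm_bonferroni(p_values, alpha=0.05):
    """Return a boolean array that is True where the corresponding null hypothesis is rejected."""
    p = np.asarray(p_values, dtype=float)
    i = len(p)
    order = np.argsort(p)                      # indices of p values in ascending order
    reject = np.zeros(i, dtype=bool)
    for rank, idx in enumerate(order):         # rank = j - 1 (0-based)
        if p[idx] > alpha / (i - rank):        # alpha / (i - j + 1) in the 1-based notation above
            break                              # first failure: stop and keep earlier rejections
        reject[idx] = True
    return reject


# Placeholder p values for the 18 hypotheses (three group comparisons x six metrics).
raw_p = np.random.default_rng(0).uniform(0.001, 0.2, size=18)
print(holm_bonferroni(raw_p, alpha=0.05))
```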

5.4.2 FDR

The multiple comparisons problem is next addressed by controlling the FDR via the Benjamini–Hochberg correction. As with the Holm–Bonferroni correction, label the \(i = 18\) p values from the two-sample t tests sorted in ascending order. Considering a false discovery rate appropriate for exploratory experiments, take \(Q = 0.1\). Then, following the Benjamini–Hochberg procedure, let \(k \in [1,i]\) be the largest index such that the inequality below is satisfied:

$$\begin{aligned} p_k \le \frac{Qk}{i} = \frac{0.1k}{18} \end{aligned}$$

Then, the null hypotheses \(\{n_1, n_2, \ldots , n_k\}\) are rejected, while the remaining are not (if no such k exists, none are rejected). Using this analysis method yields the statistical results in Table 6.

Table 6 Benjamini–Hochberg correction
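
A corresponding sketch of the step-up procedure is shown below; again, the function name is an assumption. Both corrections are also available off the shelf via statsmodels.stats.multitest.multipletests with method='holm' or method='fdr_bh'.

```python
import numpy as np


def benjamini_hochberg(p_values, q=0.1):
    """Return a boolean array that is True where the corresponding null hypothesis is rejected."""
    p = np.asarray(p_values, dtype=float)
    i = len(p)
    order = np.argsort(p)                           # ascending p values
    thresholds = q * np.arange(1, i + 1) / i        # Q * k / i for k = 1..i
    below = p[order] <= thresholds
    if not below.any():
        return np.zeros(i, dtype=bool)              # no hypothesis is rejected
    k = int(np.max(np.nonzero(below)[0]))           # largest k (0-based) with p_k <= Q*k/i
    reject = np.zeros(i, dtype=bool)
    reject[order[:k + 1]] = True                    # reject the k+1 smallest p values
    return reject
```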

5.5 Post hoc summary

With Holm–Bonferroni corrections, five pairwise comparisons are shown to be significant despite the exploratory nature of this work. The significance of these comparisons under FWER control is shown in Table 7.

Table 7 Holm–Bonferroni summary

Holm–Bonferroni is a conservative measure of significance. The exploratory nature and motivation of this work lend themselves to a more forgiving correction that instead controls the FDR, i.e., the proportion of discoveries that are false, and can indicate promising comparisons for further in-depth study. Thus, Benjamini–Hochberg corrections are more consistent with the stated contributions and goals of this work and result in ten pairwise comparisons of interest. These are shown in Table 8.

Table 8 Benjamini–Hochberg summary

6 Discussion

In all quantitative metrics, the direction of the comparisons is encouraging and matches expectations: VF yielded better performance than V, which in turn performed better than R. Most of the qualitative metrics lacked statistical power, with the exception of TLX. The two post hoc correction methods provide insight from separate vantage points.

The comparisons corrected for FWER were treated conservatively and seek to establish strong concluding statistical significance. These comparisons, five of which are statistically significant, are shown in Table 7. Of particular interest is that adding a user-placed haptic guidance fixture resulted in better collision avoidance compared to 3D voxel visualization alone. To better suit the exploratory aspect and the numerous metrics of interest, a statistical analysis focusing on FDR was also conducted, the results of which are depicted in Table 8. Significance in this light conveys exploratory indicators for future in-depth work. The particular p values from these comparisons are shown in Tables 3 and 4, with * indicating significance under FWER control and \(^\dagger \) significance under FDR control.

6.1 Design and real-world implications

The data suggest that it is beneficial to use RGB-D data over an RGB video stream alone; teleoperator performance was shown only to improve with this modification. Improvements in collision avoidance, overall path length, jerk and task load can expand access to, and the safe operation of, teleoperated tasks that are currently limited to highly trained operators; the novice users in this study realized appreciable gains in these metrics from 3D visualization. One highly skilled teleoperated task is underwater telemanipulation, in which users operate tools, manipulate valves or mate cables underwater. This is accomplished with macro-scale imaging [16, 23, 49] and is challenging and expensive due to the high skill level required [45]. Designing systems with small-scale depth imaging and voxel representation, as demonstrated in this work, can enhance the telemanipulation performance of less trained individuals, thus reducing the skill level and operating costs required to execute such operations.

Furthermore, when delicate or critical structures are involved, a collision could be disastrous. For such scenarios, this study shows that the addition of haptic virtual fixtures can further enhance collision avoidance in telemanipulation. Search and rescue (SAR) robotics is one such application area, a case where safe control is essential in a hazardous, unstructured environment. Robots in this application, also known as response robots, can be used for incident prevention and support, and can be a useful tool for saving human lives and accelerating the search and rescue process. From data collected from end users by the search and rescue initiative ICARUS, the general consensus was that in practical SAR applications, “the robots will always need to be teleoperated [sic] for safety and legal reasons” [9]. These types of robots have been used in trials by fire, including responses to the Chernobyl and Fukushima Daiichi meltdowns, the Sago mine disaster, the terrorist attacks of September 11th, Hurricane Katrina and the La Conchita mudslide [4, 21].

Current response robots for SAR applications, however, operate with minimal autonomy and few assistance modes. This may be due to the delicate nature of incorporating novel technology into high-risk operations, where safety and legal issues may arise. The complexity and unstructured context of many crisis and SAR missions also make such missions extremely technology-unfriendly [9]. In terms of current utilization, remote robot responders are used merely to obtain geographical information and to assist human responders, who remotely navigate the robots and access their sensor data [9]. In this way, most current practical applications of rescue robots have been passively assisting human responders in situation assessment.

This user study helps to expand the scope and utility of telerobots in rescue situations. The data provide encouraging results toward integrating 3D visualization and haptic assistance in telemanipulation. Of particular note is the significant improvement in collision avoidance gained by adding a guidance virtual fixture. Further gains in jerk indicate that these telemanipulation interface improvements offer safer, more direct and smoother operation, all of which is encouraging for designing SAR telerobots. This contribution provides results, limitations and future implications of the described user interface features. Such information is needed to intelligently investigate performance effects in focused studies, with the ultimate goal of adopting technologies that reduce the need for, and risk to, in-the-field human responders by replacing them with intelligently controlled telerobots capable of safely and efficiently executing sensitive tasks.

7 Conclusion

In this work, feedback modalities were examined in a basic telemanipulation task. In addition to monocular visual feedback, depth information (a voxel occupancy grid) and manually placed haptic guidance fixtures were tested. We explored the effects of these modifications on task performance using quantitative and qualitative metrics. While 3D voxelization techniques have been shown to improve performance in navigation, their effect on telemanipulation had not yet been quantified. Likewise, while predefined fixtures have been shown to improve performance, we evaluated the effect of manually set guidance fixtures in a real-time telemanipulation task. User studies evaluated these methods.

The results of the user study show that even in simple telemanipulation tasks:

  1. Guidance virtual fixtures significantly improve collision avoidance.

  2. 3D visualization significantly reduces the number of collisions compared to 2D.

  3. 3D visualization significantly improves path smoothness compared to 2D.

Furthermore, this study showed that varying the user feedback mode did not affect situational awareness. There were no detrimental effects using 3D-mapping methods over RGB streams in any of the metrics.

7.1 Future work

This user study provides a baseline assessment of the effects of 3D voxel representations and user-placed haptic guidance virtual fixtures on operator performance in telemanipulation. The results provide a comparative baseline for evaluating the effects of additional augmentations, while the exploratory nature of this work involved numerous performance evaluation metrics. Encouragingly, the quantitative metrics resulted in comparison directions consistent with the feedback augmentations, i.e., VF performed better than V, which was superior to R. While testing many metrics reduced statistical power, this experiment gives direction to focused studies of feedback modes for telemanipulation. In particular, in light of the false discovery rate corrections (compare Table 7 with Table 8), it would be of interest to investigate the effects of guidance haptic fixtures and visual feedback on:

  • completion time

  • path length

  • task load

Furthermore, comparing solely VF and V with more complicated telemanipulation tasks (e.g., increased clutter, nonlinear subtasks) and fewer metrics may increase statistical power. This is consistent with the quantitative comparison directions between VF and V that this study revealed, despite a relatively simple task and multiple performance metrics.