Introduction

Robot-assisted surgery (RAS) has shown the potential to lead to preferable clinical outcomes for patients1 while reducing the surgeon’s cognitive2 and physical workload3. Robot assistance has contributed to improving surgical interventions, especially in the field of minimally invasive surgery. Consequently, robotic surgery systems have seen broad adoption. For example, the majority of prostatectomies performed in the United States in 2010 utilized robotic assistance4.

An obvious drawback of RAS is the loss of haptic sensation, which deprives surgeons of a natural source of information about interaction forces5. This shortcoming has been linked to diminished surgical performance6. Several current research and development programs try to rectify this drawback by artificially restoring haptic sensation to the surgeon using master–slave robotic systems7,8,9. The proposed solutions vary greatly in their methods, in how haptic feedback is provided to the surgeon, and in their intended fields of application10,11,12. A general distinction can be made between direct haptic feedback and haptic sensory substitution. The latter provides information on forces via auditory13 or visual14 cues, whereas direct haptic feedback, which is the subject of this study, aims to recreate haptic impressions as naturally as possible. The most common approach is kinesthetic haptic feedback, wherein the impedance encountered on the slave side of the robotic system is mirrored back to the master side. For example, Talasaz, Trejos, and Patel15 used a motorized master controller to mirror the impedance forces picked up by sensors in the robot’s end effectors during a robot-assisted suturing task. This reduced the forces the subjects applied and thereby lowered the risk of suture breakage. Alternatively, haptic feedback is provided via vibrations of the master device16 or via actuators stimulating the user’s fingertips17. The latter approach creates the illusion of holding an object in hand.

RAS is most studied in the context of laparoscopic surgery18, but other paradigms also increasingly employ robotic assistance. In retinal surgery, for example, robotic assistance is utilized for the high accuracy and tremor-free stability a robotic end effector provides. The feedback force can be upscaled to allow the surgeon to sense interaction forces below the human perception threshold19. Another motivation for using robots in the operating room is to shield the surgeon from harm. In venous catheterization procedures, for example, X-rays are commonly used for live imaging of the patient’s vascular system. Haptic feedback allows the surgeon to stay outside the operating room while retaining the haptic sensation crucial for preventing vascular puncturing12.

While the benefits of haptic feedback technology seem obvious, they must be set against the costs incurred, not only monetary but also in terms of the required additional space and maintenance procedures20. Some researchers also suggest that with adequate expertise, surgeons can extract sufficient information about interaction forces during surgery from visual feedback alone. The advanced stereoscopic view of modern robotic surgery systems might allow robotic specialists to gauge interaction forces from visible tissue deformation. This view can even create a faux haptic sensation, potentially rendering true haptic feedback redundant21. Conversely, other researchers found benefits of haptic feedback even for very experienced surgeons8.

It is, therefore, pivotal to provide clinics with information about the expected benefits of haptic feedback systems to allow for informed decisions and to provide researchers with information on which approaches to haptic feedback are the most promising. Existing reviews are primarily qualitative in nature22,23, focused on a single system24 or no longer reflect the current state of development25. Given the field's broad and rapidly evolving nature, an expansive systematic analysis of recent research is needed. In this meta-analysis, key performance metrics were identified, and the overall effects of haptic feedback on these metrics and potential moderating factors were determined. Hence, the present study aims not just to quantify the general benefits of haptic feedback but also their extent under specific conditions, namely, given a particular surgical task, level of users’ expertise, and type of feedback. By identifying conditions under which haptic interaction technology is particularly useful, this study provides a basis for decision-making for practitioners and researchers.

Methods

Report compilation

This meta-analysis was performed following the guidelines laid out in the PRISMA statement26. Literature was compiled from three online databases: PubMed, IEEEXplore, and Scopus. This selection was chosen to cover both key disciplines relevant to the subject, namely medicine (PubMed) and engineering (IEEEXplore), with Scopus as a more general database to capture applicable papers outside these fields. Given the enormous diversity of paradigms and measures in the field, the search aimed to generate a large number of results rather than to minimize the number of irrelevant results. Hence, a broad search string was chosen to compile as complete a body of studies as possible.

$$ \left( \text{telerobotics} \;\text{OR}\; \text{robot assisted} \;\text{OR}\; \text{telesurgery} \right) \;\text{AND}\; \left( \text{haptics} \;\text{OR}\; \text{force feedback} \;\text{OR}\; \text{tactile} \right) $$

The reports were compiled in September and October 2022. Papers published since 2013 were eligible, as 2013 is the last point in time covered by the most recent meta-analysis on haptic feedback in general RAS25. Additional criteria were defined to exclude non-peer-reviewed publications and to reduce the amount of overlap. This process yielded 1637 papers on PubMed, 1783 on IEEEXplore, and 1630 on Scopus. Of these 5050 results, 996 were removed as duplicates. The remaining 4054 papers were screened and filtered by the primary author (Fig. 1).

Figure 1

Flowchart visualizing the process of sample compilation.

The following inclusion criteria were defined. The study needs to (1) be peer-reviewed, (2) report quantitative data, (3) feature a direct comparison of subjects’ performance with and without haptic feedback, (4) use a master–slave robot system with direct control, (5) be applicable in surgical contexts, (6) make use of direct haptic feedback, (7) be written in English or German. Notably, this excluded papers investigating sensory substitution27, haptic feedback in non-surgical contexts28, or the benefits of haptic feedback for training29. Filtering by these criteria eliminated most search results, leaving N = 115 papers, all of which could be retrieved in full and were consequently subjected to further scrutiny.

The following quality criteria were defined: (1) at least n = 2 subjects per study group, (2) potential learning effects had to be accounted for by counterbalancing or randomizing the order of trials, and (3) no authors participated as test subjects. Nine papers had to be excluded as they failed to meet these criteria. Additionally, 14 papers were excluded because, upon closer inspection, their compliance with the inclusion criteria was doubtful. For example, technologically immature systems that would require substantial modification before surgical application were excluded30. A further 36 papers had to be excluded because of insufficient data reporting. The final sample consisted of 56 primary studies with n = 768 individual subjects producing k = 174 observations. A complete list of included primary studies can be found in Table 1.
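The selection counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch that only restates the numbers given in the text; the variable names are illustrative and not part of the original study.

```python
# Records returned per database, as reported in the text.
hits = {"PubMed": 1637, "IEEEXplore": 1783, "Scopus": 1630}
total_hits = sum(hits.values())

# Duplicates removed before screening.
duplicates = 996
screened = total_hits - duplicates

# Papers remaining after applying the inclusion criteria.
retrieved_in_full = 115

# Exclusions during full-text assessment, as reported in the text.
excluded = {
    "failed quality criteria": 9,
    "doubtful compliance with inclusion criteria": 14,
    "insufficient data reporting": 36,
}
final_sample = retrieved_in_full - sum(excluded.values())

print(f"search results: {total_hits}")
print(f"screened after duplicate removal: {screened}")
print(f"included primary studies: {final_sample}")
```

The three totals agree with the 5050 search results, 4054 screened papers, and 56 included studies stated in the text.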

Table 1 Summary list of studies included in the meta-analysis.

Data extraction

To calculate effect sizes, means and standard deviations had to be extracted. Where available, these metrics were taken directly from the publications. Otherwise, they were retrieved by the primary author by measuring the graphs and figures provided in the primary studies using open-source image manipulation software (GIMP 2.10). This method allowed values to be read with single-pixel accuracy, that is, to the resolution of the numerical value represented by one pixel. Where this was not possible either, effect sizes were estimated based on sample size and p-values31. When papers failed to report precise p-values, the authors of the primary studies were contacted with a request for the data in question. If the necessary data could not be retrieved this way either, the study had to be excluded.

Statistical methods

All statistical analyses were performed in “Comprehensive Meta-Analysis Version 4” (CMA, Biostat Inc, Englewood, New Jersey, USA). The following performance metrics were extracted from the primary studies: (1) Force applied (further divided into average force and peak force), with smaller forces being considered favorable as high forces are associated with increased tissue damage32; (2) Time required, with shorter times being considered desirable; (3) Accuracy, for which several sub-measures were combined to gauge precision: deviation from a target angle, deviation from a target point, and deviation from a target path, with lower deviation being desirable; (4) Success rates, which in the present data set chiefly refer to correct tissue identification in palpation tasks.

Since it can be expected that true effect sizes vary between primary studies, a random effects model was chosen. All measurements were valence-coded such that higher values indicate more desirable outcomes (i.e., shorter times, lower applied forces, less deviation, and higher success rates). The cutoff for statistical significance was set to p = 0.05. Hedges’ g was chosen to calculate effect sizes rather than the more common Cohen’s d since the latter tends to overestimate the effect size, especially for small sample sizes31. Hedges’ g can thus be seen as more conservative, as it multiplies Cohen’s d by the correction factor J, where df denotes the degrees of freedom.

$$ J = 1 - \frac{3}{4df - 1} $$
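As an illustration of the correction factor J, the following minimal sketch computes a valence-coded Hedges’ g for two independent groups. The independent-groups design, the function name, and the variance approximation are assumptions made for this example; the analysis itself was run in CMA, and within-subject designs require a different standardization.

```python
import math

def hedges_g(mean_with, sd_with, n_with,
             mean_without, sd_without, n_without,
             lower_is_better=True):
    """Return (g, variance of g) for two independent groups.

    Assumption: independent groups with a pooled standard deviation.
    """
    df = n_with + n_without - 2
    pooled_sd = math.sqrt(
        ((n_with - 1) * sd_with ** 2 + (n_without - 1) * sd_without ** 2) / df
    )
    diff = mean_with - mean_without
    if lower_is_better:
        # Valence coding: a reduction (e.g., of force or time) counts as positive.
        diff = -diff
    d = diff / pooled_sd                      # Cohen's d
    j = 1 - 3 / (4 * df - 1)                  # small-sample correction factor J
    var_d = (n_with + n_without) / (n_with * n_without) \
        + d ** 2 / (2 * (n_with + n_without))
    return j * d, j ** 2 * var_d

# Hypothetical data: mean force of 8 N with feedback vs. 10 N without.
g, var_g = hedges_g(8.0, 2.0, 10, 10.0, 2.0, 10)
print(f"g = {g:.3f}, variance = {var_g:.3f}")
```

With ten subjects per group, J is 68/71, so a Cohen’s d of 1.0 shrinks to a g of roughly 0.96. The correction matters most for the small samples typical of the primary studies.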

The size of effects can be interpreted analogously to Cohen’s d. That is, effect sizes larger than 0.8 are considered large. They are immediately evident to observers and practitioners and make a meaningful difference in clinical practice. On the other hand, effect sizes below 0.2 are considered irrelevant even if they are statistically significant31.

Heterogeneity was assessed with the Q-statistic, the I2-statistic, and the prediction interval. Following recommendations by Medina et al.33, it is essential to differentiate these measures. The Q-statistic functions analogously to the F-statistic in primary studies in that it tests the null hypothesis that all effects in the sample are of equal size. A significant Q-value indicates that not all observed variation between effect sizes results from random error, meaning the effect sizes are heterogeneous. Furthermore, the Q-statistic can be used to test whether the effect sizes observed in different subgroups differ significantly. I2 quantifies which percentage of the observed variation reflects heterogeneity of the true effect sizes. Lastly, the prediction interval (PI) gives the range into which 95% of individual true effects in a comparable population are expected to fall. It is similar to, but has to be differentiated from, the confidence interval (CI), which estimates the range into which the mean effect falls. The confidence interval is reported as well. In this study, the conventional 95% confidence interval was chosen.
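The relationship between these measures can be sketched in a few lines. The example below assumes the DerSimonian-Laird estimator for the between-study variance, which the text does not name, and takes the critical t-value for the prediction interval (k − 2 degrees of freedom) as an input from a table. Function and key names are illustrative.

```python
import math
from statistics import NormalDist

def random_effects(effects, variances, t_crit=None, level=0.95):
    """Random effects summary with Q, I2, confidence and prediction interval.

    Assumption: DerSimonian-Laird estimate of the between-study variance.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                 # fixed effect weights
    sum_w = sum(w)
    fixed_mean = sum(wi * y for wi, y in zip(w, effects)) / sum_w
    q = sum(wi * (y - fixed_mean) ** 2 for wi, y in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                    # variance of true effects
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0    # share of true variance
    w_star = [1.0 / (v + tau2) for v in variances]   # random effects weights
    mean = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    result = {"q": q, "df": df, "tau2": tau2, "i2": i2, "mean": mean,
              "ci": (mean - z * se, mean + z * se)}
    if t_crit is not None:
        # The PI adds the spread of true effects to the error of the mean.
        half = t_crit * math.sqrt(tau2 + se ** 2)
        result["pi"] = (mean - half, mean + half)
    return result

# Hypothetical effect sizes and variances of three study groups;
# 12.706 is the two-sided 95% critical t-value for one degree of freedom.
summary = random_effects([0.2, 0.8, 1.4], [0.04, 0.04, 0.04], t_crit=12.706)
print(summary)
```

The sketch shows why the PI is always at least as wide as the CI: the CI reflects only the uncertainty of the mean effect, whereas the PI additionally contains the between-study variance.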

Subgroup analysis

Based on the reviewed literature, the following categorical variables have been determined as potential moderators and extracted: Level of subjects’ expertise, experimental task, type of haptic feedback, and the distinction between virtual fixtures and contact forces. These potential moderators will be briefly described in the following:

Level of expertise

(1) “Inexperienced” denotes subjects with no prior surgical training. (2) “Novices” denotes subjects with medical training but no or limited (i.e., no more than a year of) practical experience in the tested procedure. (3) “Experts” denotes subjects with extensive multiyear expertise in the specific procedure tested in the experiment.

Task

Since surgeries often consist of various tasks that partly overlap between procedures, the investigated surgeries were broken down into more elemental tasks for this meta-analysis. The following tasks were extracted from the reviewed body of literature: (1) Catheterization: insertion of venous catheters into the vascular system, (2) Grasping: tasks consisting of grasping, holding, or pulling, primarily of tissue, (3) Insertion: puncturing of tissue with needles and/or trocars, (4) Laser: cutting and/or sectioning of tissue involving surgical lasers, (5) Palpation: physical investigation of tissue with the robot’s end effector, (6) Scan: investigation of tissue via additional means such as ultrasound scans, (7) Section: cutting and/or carving of tissue, (8) Suturing: knot-tying and stitching of tissue, and (9) Tracing: following a line on the surface of a tissue sample.

Feedback type

The different types of feedback employed by researchers to restore haptic sensation have been categorized as follows: (1) Kinesthetic feedback: application of a force vector to the master side of the robot mirroring the impedance offered by an object in contact with the slave side. (2) Vibrotactile feedback: provision of information to the user via the vibration of either a master controller or an externally worn device. (3) Cutaneous feedback: provision of haptics via the motion of actuators against the operator’s fingertips, creating an illusion of holding an object. (4) Combined: any combination of the above.

Contact vs. virtual fixtures

(1) Contact forces: Feedback is given about contact with physical objects at the robotic site. (2) Virtual fixture: Feedback is given in response to virtual points, which have been defined pre-operatively in relation to either the patient’s body (e.g., a virtual point close to sensitive tissue) or the posture of the robot (e.g., a certain angle of the slave robot’s joints).

In addition to the moderators, we list the relevant surgical fields in the table below. While several studies are potentially applicable to multiple fields, an intended field of application can be noted for each paper. The following surgical fields were considered in the primary studies: (1) Cardiovascular surgery, (2) Laparoscopic surgery, (3) Neurosurgery, (4) Oncology, (5) Ophthalmology, and (6) Oncology.

Results

Figures 2, 3 and 4 show the effect sizes, their variances, and overall prediction and confidence intervals. Forest plots were created in CMA. Note that, due to software limitations, effect sizes in the plots are shown in their native coding rather than valence-coded.

Figure 2

Effects of haptic feedback on applied force.

Figure 3

Effects of haptic feedback on completion time.

Figure 4

Effects of haptic feedback on accuracy.

Effects on applied force

Main effects

Average forces

Based on the analysis of k = 52 study groups comprising a total of n = 249 individuals, a strong effect of haptic feedback on the average forces was found (Hedges’ g = 0.83, CI 0.63–1.03). The Q-test indicates that effect sizes varied substantially between studies (Q(51) = 135.95, p < 0.001). 62% of the observed variation in effect sizes is accounted for by the heterogeneity of the true effect rather than random error (I2 = 0.62).

Peak force

A total of k = 37 study groups with n = 172 individual subjects were analyzed. A moderate effect on peak forces was found (g = 0.69, CI 0.51–0.87). True effect sizes differed in the population (Q(36) = 51.7, p = 0.043). I2 = 30% of the observed variance can be attributed to the difference in true effect size.

Combined force

The effect size for combined average and maximum applied forces was also calculated. For study groups that reported both outcomes, combined means were used in line with the recommendations of Borenstein et al.31. In total, k = 61 study groups were analyzed. The overall effect on force was large (g = 0.83, CI 0.66–1.01). True effect sizes in the population varied (Q(60) = 147.20, p < 0.001), and this variance accounts for I2 = 59% of the observed variance.
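Combining two outcomes from the same study group can be sketched as follows. The composite is the mean of the two effect sizes, and its variance depends on the correlation r between the outcomes. The text does not report r, so the value used here is a placeholder assumption, as are the function name and the example numbers.

```python
import math

def combine_outcomes(g1, v1, g2, v2, r):
    """Mean of two correlated effect sizes from one group and its variance.

    r is the assumed correlation between the two outcomes.
    """
    g = (g1 + g2) / 2
    v = 0.25 * (v1 + v2 + 2 * r * math.sqrt(v1 * v2))
    return g, v

# Hypothetical average-force and peak-force effects of one study group,
# combined under an assumed correlation of r = 0.5.
g, v = combine_outcomes(0.8, 0.04, 0.6, 0.09, r=0.5)
print(f"combined g = {g:.2f}, variance = {v:.4f}")
```

A higher assumed correlation yields a larger variance of the composite, so assuming r = 1 is the conservative choice when the true correlation is unknown.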

Moderation

Only combined force was considered for moderation effects to achieve a reasonable number of observations per cell. Only 46 study groups could be analyzed for expertise level because expertise was not reported in the remaining primary studies. Significant differences (Q(2) = 13.05, p < 0.001) between levels of subjects’ expertise were found. Significant effects were found for inexperienced (g = 0.98) and novice subjects (g = 0.84) but not for experts.

Force data was reported for seven different tasks, and significant effect size differences (Q(6) = 16.9, p < 0.001) emerged, with effect sizes being the largest for catheterization tasks (g = 1.72). Only for section tasks, no significant effect of haptic feedback was found.

No study using vibrotactile feedback reported data for applied forces. Effect sizes were different between subgroups (Q(2) = 17.92, p < 0.001), being largest for combined (g = 1.39) and smallest for cutaneous feedback (g = 0.36). As Virtual Fixtures aim to provide haptic feedback before contact occurs, no comparison between Virtual Fixtures and Contact Forces can be made. For a full list of effects, refer to Table 2.

Table 2 Effect size of haptic feedback on applied forces overall and across subgroups.

Effects on completion time

Main effects

An analysis of n = 372 subjects across k = 49 study groups found a large effect (g = 0.83, CI 0.57–1.09). As for forces, the effects on time showed a significant heterogeneity (Q(48) = 229.10, p < 0.001), accounting for I2 = 79% of the observed variance.

Moderation

36 observations reported subjects’ expertise. Significant effect size differences (Q(2) = 37.46, p < 0.001) between levels of expertise were found, with time reduction only manifesting for inexperienced subjects (g = 1.21).

For the eight different tasks, effect sizes differed significantly (Q(7) = 10.98, p < 0.001). Significant effects were found for catheterization (g = 2.07), grasping (g = 1.30), insertion (g = 0.53) and palpation (g = 0.69).

No significant effect size difference between feedback types was found. Similarly, contact force rendering and virtual fixtures provided significant time savings but did not differ significantly. Full results are reported in Table 3.

Table 3 Effect size of haptic feedback on completion time across subgroups.

Effects on accuracy

Main effects

A total of k = 25 study groups with n = 192 individuals were analyzed. Of these 25 observations, 13 had natively reported point-deviation, 7 path-deviation, 4 degree-deviation, and 1 had already natively aggregated these measures (see Fig. 4). An overall effect of g = 1.50 (CI 1.07–1.92) was found. The true effect sizes varied in the population (Q(24) = 145.94, p < 0.001). The variance of the true effect accounted for I2 = 0.83 of the observed variance.

For 20 study groups, expertise was reported, but no effect size difference between levels of expertise was found. Interestingly, the observed effect was highest for experts (g = 2.78), but the sample was very small, with only k = 2 expert groups that reported accuracy measures.

Accuracy measures for seven tasks were reported. A significant effect for haptic feedback was found for every task except for suturing. However, the size of the effects did not differ significantly from each other.

Comparing feedback types, significant effects were found for every type except combined feedback. Again, the number of groups in this category was rather small (k = 2). No significant effect size difference between groups was found.

A comparison of the effect sizes between Virtual Fixtures and Contact Forces showed significant effects of haptic feedback for both groups. There was no significant difference in effect sizes. A full list of results can be found in Table 4.

Table 4 Effect size of haptic feedback on accuracy across subgroups.

Success rates

Lastly, based on the analysis of k = 15 study groups with n = 181 subjects, success rates showed a large effect of g = 0.80 (CI 0.24–1.35). Substantial heterogeneity was present (Q(14) = 71.33, p < 0.001), which accounted for I2 = 80.37% of the observed variance. The prediction interval (PI) ranges from −1.36 to 2.96. Due to the small number of observations for this measure, most of which fall under the same paradigm (i.e., palpation), this measure had to be excluded from subgroup analysis.
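The reported I2 values follow directly from Q and its degrees of freedom via I2 = (Q − df) / Q. The sketch below recomputes this for several of the results reported in this section; the function name and the dictionary layout are illustrative.

```python
def i_squared(q, df):
    """Share of the observed variance attributed to true heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)

# Q-values and degrees of freedom as reported in the Results section.
reported = {
    "average force": (135.95, 51),
    "peak force": (51.7, 36),
    "combined force": (147.20, 60),
    "completion time": (229.10, 48),
    "success rates": (71.33, 14),
}
for measure, (q, df) in reported.items():
    print(f"{measure}: I2 = {100 * i_squared(q, df):.2f}%")
```

For success rates, for example, (71.33 − 14) / 71.33 gives 80.37%.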

Publication bias

To assess a potential publication bias, the classic Fail-Safe-N was calculated for force, time, and accuracy measures. It indicates how many studies with an effect size of zero would have to be found for the meta-analysis to no longer be significant at a level of p < 0.05 (ref. 80). The test returned a value of N = 9627. This means that 9627 hypothetical unpublished or otherwise missed papers showing no effect of haptic feedback would have to exist to invalidate the findings of this analysis via publication bias. Although it is important to acknowledge the limitations of this measure, given the large number, potential bias can be deemed too weak to fundamentally alter results. A visual inspection of the funnel plot (see Appendix 1) indicates no strong bias either. Even if some bias is present in the sample, the results can be expected to be robust to it.
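The logic of the classic Fail-Safe-N can be sketched with Rosenthal’s formula, which asks how many additional studies with Z = 0 would pull the combined (Stouffer) Z-score down to the critical value. The one-tailed criterion and the example effect sizes are assumptions for this illustration; the settings used in CMA are not stated in the text.

```python
import math
from statistics import NormalDist

def z_from_effect(g, variance):
    """Z-score of a single effect size."""
    return g / math.sqrt(variance)

def fail_safe_n(z_scores, alpha=0.05):
    """Rosenthal's fail-safe N.

    Assumption: one-tailed significance criterion at the given alpha.
    """
    k = len(z_scores)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return sum(z_scores) ** 2 / z_crit ** 2 - k

# Hypothetical effect sizes and variances of four study groups.
example = [(0.8, 0.16), (0.5, 0.25), (1.2, 0.36), (0.3, 0.09)]
z_scores = [z_from_effect(g, v) for g, v in example]
print(f"fail-safe N = {fail_safe_n(z_scores):.1f}")
```

Because the numerator grows with the square of the summed Z-scores, a large number of consistently positive observations quickly produces a fail-safe N in the thousands.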

Discussion

To quantify the effect of haptic feedback on surgical performance, the results of 56 primary studies from 2013 onwards were aggregated using meta-analytical methods. Significant overall effects were found for all investigated measures: applied forces, completion time, accuracy, and success rates. However, the present study goes beyond a simple analysis of the main effect. To identify the specific conditions under which the benefits of haptic feedback are largest, the moderating effects of expertise, task, and feedback type were investigated via subgroup analyses. The subject’s experience and the specific nature of the task were identified to have the greatest influence on how beneficial haptic feedback is.

Large overall effects were found across paradigms and measures. The effects found in the present study were larger than those described by Weber and Eichberger25 in an earlier meta-analysis. In particular, the present study's effects on completion time (g = 0.83 vs. g = 0.22) and accuracy (g = 1.5 vs. g = 0.69) were substantially larger. This is likely to reflect both technological progress and an application of haptic feedback to new surgical procedures (e.g., catheterization tasks). This underscores haptic feedback's continued and expanding role in medical practice.

A strong effect on applied forces was found overall and in most subgroups, with a notable exception being experts. This finding is of critical importance as some research suggests higher rates of patient tissue damage during RAS6, which has been linked to higher interaction forces79. This may be the most considerable benefit of haptic feedback for robot-assisted procedures. Furthermore, the effects were largest when combined feedback was employed, suggesting it is a fruitful avenue of development.

Effects on completion times were mixed, with no significant effects of haptic feedback for tracing, suturing, and laser surgery tasks. This is also in line with inconsistent findings of previous meta-analyses: Nitsch and Färber80, e.g., found a significant effect for completion time (g = 0.75), whereas Weber and Schneider81 did not report such an effect. A possible explanation is that, for some tasks, the added guidance gives users the confidence necessary to complete the task quickly, whereas for other tasks this benefit is canceled out because haptic feedback prompts more cautious behavior. This is corroborated by the higher accuracy and reduced force application found in the present study.

Effects on accuracy were the largest overall and, crucially, accuracy was the only metric on which significant effects for experts were found. Due to the very small number of studies in this subgroup (k = 2), this finding should be interpreted cautiously.

Overall, the level of expertise emerged as an important moderator, with effects generally smaller for experts. This is consistent with expectations among practitioners and researchers suggesting that long-term practitioners of RAS develop techniques to partially compensate for the loss of haptic sensation through their practice82. However, whereas, e.g., Hagen et al.21 suggest that even beginning surgeons can compensate for missing haptic feedback by visual cues alone, this meta-analysis found substantial effects for novices. Combined with results suggesting that haptic feedback enhances surgeons' training6, it can be concluded that especially less experienced surgeons benefit from haptic feedback systems.

This study found that the benefits of haptic feedback depend on the task demands, with the strongest effects for catheterization tasks and the weakest effects for sectioning tasks. Catheterization is a relatively new field of application for haptic feedback. During vascular catheterization, the surgeon has to rely on low-fidelity vision provided by MRI or X-ray scans while performing a task that requires extreme caution to prevent injuring the patient. Based on the strong positive effects on completion time and force application found in this review, it can be predicted that haptic feedback will play a crucial role in this intervention. This holds especially given that the use of X-ray imaging is a strong motivator for reducing patient exposure times and that instrument contact forces well below one newton can lead to vascular puncturing83.

Benefits on all measures were found for grasping and palpation tasks. This is an important finding considering the high prevalence of laparoscopic interventions among RAS, where these tasks play a crucial role22. For suturing, another common laparoscopic task, only force regulation strongly benefitted from haptic feedback. This is, however, crucial to prevent the common problem of suture breakage. Many researchers have argued that the widespread adoption of haptic feedback systems would greatly benefit robot-assisted laparoscopy4,20. The present study substantiates this argument.

Insertion tasks showed substantially reduced forces and completion time and improved accuracy when haptic feedback was available. In non-robot-assisted surgery, the shear forces incurred by different layers of tissue are the primary means for surgeons to guide the correct insertion of instruments10. Artificial haptic feedback restores this method for RAS.

Tracing, a task common in retinal surgery, showed strong effects for force and accuracy but not for time. Given that in retina surgery, time is usually a less critical metric due to the absence of time-constraining circumstances like heavy blood loss, these results are nonetheless encouraging.

For the other tasks (sectioning, laser surgery, and scanning), results were comparatively mixed but still overall positive (except for sectioning tasks). For these tasks, advancements in stereoscopic 3D vision and visual feedback may already provide enough guidance for surgeons, as suggested by the findings of Weber and Schneider80.

That no form of haptic feedback emerged clearly superior contrasts with the findings of Weber and Eichberger25, who found larger effect sizes for kinesthetic feedback than for vibrotactile feedback. The studies of the present meta-analysis skewed heavily away from the latter modality, with only k = 11 observations for pure vibrotactile feedback. The analysis by Weber and Eichberger25 included a total of k = 55 observations for this feedback type. This discrepancy in data may explain the different findings. Furthermore, this indicates a trend away from pure vibrotactile feedback in favor of kinesthetic and combined feedback. Future studies could integrate the data of several meta-analyses on the topic to further investigate the differences.

No significant effect size difference was found between contact force feedback and virtual fixtures. An interaction may be masking an effect here, as a look into the studies revealed that virtual fixtures were more commonly combined with vibrotactile feedback and force rendering more commonly with kinesthetic feedback (see Table 1).

In fact, a deeper three-way interaction between measure, task, and feedback can be conceived. For example, vibrotactile feedback might be more effective in reducing completion time than kinesthetic feedback, specifically for palpation tasks31. The present study, however, lacks the power to test for such interactions.

Another potential limitation of this analysis is that several related sub-measures had to be aggregated as each sub-measure by itself had too few observations per factor level to allow for subgroup analysis. Combining similar outcomes to organize and investigate large data sets is a core purpose of meta-analyses, but in doing so, the comparability of data must be considered31. Since peak and average forces just constitute two different data points of the same measurement, comparability can be readily assumed. In the case of deviation, it can be noted that the different sub-measures are inherently linked. During a needle insertion task, the surgeon may have to guide the instrument’s tip to an exact location to administer a substance precisely and avoid unnecessary tissue damage36. In this example, deviation from this target point and deviation from the ideal angle of insertion are intrinsically linked, and combining them can, therefore, be considered valid.

Lastly, the image-based retrieval mentioned in the Methods section, made necessary by missing direct reporting in the primary studies, may be less accurate than direct numerical retrieval. A total of 33 observations in the final study were retrieved this way. To test whether this retrieval method has skewed the data, the effect size of all image-retrieved outcomes was compared to the effect size of all numerically retrieved outcomes. If image-based retrieval is as reliable as direct numerical retrieval, no difference in effect sizes should exist. Indeed, no significant difference in effect sizes was found (Q(1) = 0.11, p = 0.74). It was thus concluded that the retrieval method yielded sufficient reliability, and image-retrieved data was included in the analysis.
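A comparison of two subgroups of this kind can be sketched as a between-groups Q-test with one degree of freedom. For two groups, Q reduces to the squared difference of the subgroup means divided by the sum of their squared standard errors, and the p-value follows from the chi-square distribution. The function names and the example subgroup values are illustrative assumptions; only Q(1) = 0.11 is taken from the text.

```python
import math
from statistics import NormalDist

def p_value_chi2_df1(q):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return 2 * (1 - NormalDist().cdf(math.sqrt(q)))

def q_between(mean_a, se_a, mean_b, se_b):
    """Between-groups Q and p-value for two subgroup summary effects."""
    q = (mean_a - mean_b) ** 2 / (se_a ** 2 + se_b ** 2)
    return q, p_value_chi2_df1(q)

# p-value for the reported test statistic Q(1) = 0.11.
print(round(p_value_chi2_df1(0.11), 2))

# Hypothetical subgroup summaries (mean effect and standard error).
q, p = q_between(1.0, 0.3, 0.4, 0.4)
print(f"Q(1) = {q:.2f}, p = {p:.2f}")
```

The first call reproduces the reported p = 0.74 from the reported Q-value.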

Conclusion

With the benefits of haptic feedback firmly established and the application fields where its impact is especially strong identified, research can focus on these fields to investigate further factor interactions and compare different forms of haptic feedback. In particular, a comparison between direct haptic feedback and haptic substitution to identify in which scenarios one is preferable over the other appears to be a fruitful next step (Table 5).

Table 5 Summary of most important results.

In summary, providing haptic feedback was found to have desirable effects across task and feedback paradigms. The effects are diminished for practitioners with high levels of expertise but remain descriptively present. Reduced interaction forces and completion times have great potential to limit tissue damage and blood loss during the operation and thereby vastly improve patient safety. These findings can serve as a basis to inform equipment acquisition and future research directions.