Robotic surgery represents a fundamental innovation in health care that is designed to enhance the quality of care for patients. Puma 560 was the first surgical robot applied in a clinical setting to obtain neurosurgical biopsies in 1985 [1]. The authors concluded that the robot contributed to improved accuracy. Since then, increasingly advanced surgical robots have been developed to assist in a rapidly expanding range of operative procedures and anatomical targets (Fig. 1). The drivers for continuous innovation stem from the potential to offer greater operative precision that may translate into enhanced clinical outcomes and the accompanying background of corporate revenues within the healthcare technology sector.

Fig. 1 Timeline demonstrating selected events in the history and development of surgical robots

To achieve these goals, current robotic platforms are designed to incorporate advanced features, such as (i) dexterous capability with accompanying instrumentation, (ii) augmented visualisation, (iii) improved stability, (iv) natural coordination, (v) accurate cutting capacity, (vi) reliable execution, and (vii) enhanced surgeon ergonomics. These features can theoretically increase surgical precision by rendering difficult operative tasks easier to perform safely. Moreover, surgical robots have retained the capacity to enable surgery through smaller incisions. Collectively, these characteristics aim to enhance outcomes beyond those achievable through conventional operative methods.

The adoption and diffusion of robotic surgery demonstrate a positive trend in some geographical areas, particularly in advanced economies. This can be illustrated by the prominent application of the da Vinci® Surgical System (dVSS; Intuitive Surgical Inc., Sunnyvale, California, USA), which has US Food and Drug Administration (FDA) clearance across a multitude of specialties [2] and is used most widely for urological and gynaecological procedures [3]. For example, more than half of radical prostatectomies and about a third of benign hysterectomies are already performed robotically in the USA [3, 4].

Despite offering some elements of innovative technology, the necessary evidence to justify the expanding investment in robotic surgery remains ambiguous. Whilst the concept of robotic surgery is almost universally favoured, its widespread promotion across all healthcare sectors requires robust justification, not least because it can be very costly [5]. Studies comparing outcomes of robotic surgery with conventional approaches for specific robots and procedures are certainly not scarce. However, the systematic assessment of robotic surgery collectively as a single entity has not been performed. As we approach the end of the third decade following the pioneering use of the first surgical robot, an overview of this innovation may be useful for understanding the adoption of innovations in health care.

The aim of this comprehensive systematic review and meta-analysis was to draw evidence from comparative studies in robotic surgery, regardless of specialty and procedure type, and irrespective of patient age and sex. We avoided the biases of retrospective studies that dominate the literature by focussing only on randomised controlled trials (RCTs) and non-randomised prospective studies. In comparing potentially very heterogeneous studies, we emphasised a methodology that identified the proportional benefit of robotic surgical outcomes compared with controls in each study. This offered internal consistency from each study. We were then able to calculate a pooled proportional benefit for specific robotic surgical outcomes for all studies.

In this review, we evaluated core perioperative variables as our primary outcomes. These were (i) blood loss, (ii) blood transfusion rate, (iii) operative time, (iv) length of hospital stay, and (v) 30-day overall complication rate. In robotic surgical studies, these perioperative variables were most commonly addressed. Analyses were performed separately for robotic versus open surgery (OS) and robotic versus minimally invasive surgery (MIS). As a secondary outcome, we calculated the proportion of studies that demonstrated adequate statistical power for the evaluation of these clinical outcomes.

Materials and methods

This review was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement [6].

Inclusion and exclusion criteria

We defined surgery as any interventional procedure that alters anatomy and requires either a skin (or mucosal) incision or a puncture. Patients requiring surgery for which a robotic approach was a feasible alternative to OS or MIS were included. There was no age or sex restriction. Controls were eligible only if patients underwent surgery and no robotic assistance was provided. RCTs and prospective studies that addressed one or more core perioperative surgical outcomes (blood loss, blood transfusion rate, operative time, length of hospital stay, and 30-day overall complication rate) were included. For operative time, we included studies that explicitly defined it as starting from skin incision to skin closure (for intravascular procedures, we used procedure time, which was generally defined as time from first venous puncture to sheath withdrawal at the end of the procedure). Whilst this measure does not represent the total theatre occupation time, it was selected to improve comparability because operative time was variably defined in the literature.

We excluded studies where surgical robots were used for stereotactic, endoscopic, or single-incision laparoscopic surgery. Robotic instrument positioners without concurrent use of other robotic instrumentation tools were also excluded, as were innovations that are generally not considered robotic technology, such as remote magnetic catheter navigation and pure computer navigation systems. We also discounted studies with historical controls that preceded the robotic arm considerably (that is, greater than a year) as well as those that retrospectively reviewed and analysed prospective databases. Laboratory studies involving synthetic models, animals, or cadavers were not considered.

Search methodology

Using the OvidSP search engine, the MEDLINE, EMBASE, and PsycINFO databases were searched on 2 September 2013 with the terms: robot* (tw) AND [intervention* (tw) OR surg* (tw)]. The same search terms were used to search the ClinicalTrials.gov registry to identify potentially relevant trials. On 26 May 2014, these trials were reviewed to identify any relevant published data. To avoid losing generally older papers which had used the term computer-assisted instead of robot, we also performed a search on 7 October 2013 with the terms: [surgery, computer-assisted (MESH, exp) OR computer-assisted surg* (tw) OR computer-aided surg* (tw)] AND [intervention* (tw) OR surg* (tw)]. Studies from 1990 to the search dates were included. There was no language restriction. Relevant review articles, including health technology assessments, found through our search strategy were also hand-searched to identify any remaining studies.

Data collection and analysis

Study selection

Articles were screened from titles and abstracts by three authors independently (AT, SM, and AS). Potentially relevant articles that appeared to fit the inclusion and exclusion criteria were obtained in full text. These were independently assessed for eligibility by the same authors. Articles were excluded if they had duplicate or incomplete data, or if they were only available in abstract form. Any disagreement was resolved through discussion with a senior author (HA).

Dealing with duplicate publications

If several articles reported outcomes from a single study, the article with the most comprehensive results (greatest number of patients and/or most recent publication) was included. If this article failed to report outcomes that were otherwise available in the duplicate article, then the additional data from the duplicate article were included.

Data extraction

One author (AT) extracted data into an Excel 2011 database (Microsoft Corp., Redmond, Washington, USA), which were then reviewed independently by three authors (SM, AS, and HA). For each article, the year of publication, study design, total number of patients, number of patients in each arm, robot and control type, baseline characteristics, and results of outcome measures of interest were extracted. For continuous outcomes, we extracted the mean and standard deviation (or if unavailable, the median and standard error, range, or interquartile range). For categorical outcomes, we recorded the number of events.

Risk of bias assessment

Three authors (AT, SM, and AS) independently assessed the risk of bias of eligible articles. Quality of articles with more than one study was assessed on their overall methodology. The Cochrane risk of bias tool [7] was applied to RCTs. Seven key domains were assessed: method of random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessors, completeness of outcome data, selective reporting, and other potential sources of bias. Based on a set of listed criteria, each domain was judged to have either a low, high, or unclear risk of bias. If a study had unclear or high risk of bias for one or more key domains, then it was classified as having, respectively, an unclear or high risk of bias overall. If instead all the key domains had low bias risk, then the study was judged to have a low risk of bias overall [7].
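The overall classification rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the domain keys and the function name `overall_risk_of_bias` are hypothetical, and the sketch assumes that a "high" judgement takes precedence when a trial has both high and unclear domains, which the text does not state explicitly.

```python
from typing import Dict

# The seven key domains listed in the text (the key names are illustrative).
KEY_DOMAINS = (
    "random_sequence_generation",
    "allocation_concealment",
    "blinding_participants_personnel",
    "blinding_outcome_assessors",
    "incomplete_outcome_data",
    "selective_reporting",
    "other_bias",
)

VALID_JUDGEMENTS = ("low", "unclear", "high")


def overall_risk_of_bias(judgements: Dict[str, str]) -> str:
    """Return the overall risk of bias for one trial from its domain judgements."""
    missing = [d for d in KEY_DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing domain judgements: {missing}")
    values = [judgements[d] for d in KEY_DOMAINS]
    invalid = [v for v in values if v not in VALID_JUDGEMENTS]
    if invalid:
        raise ValueError(f"invalid judgements: {invalid}")
    # Assumption: "high" outranks "unclear" when both occur in one trial.
    if "high" in values:
        return "high"
    if "unclear" in values:
        return "unclear"
    return "low"


# Hypothetical trial: every domain low except blinding of participants and personnel.
example = {d: "low" for d in KEY_DOMAINS}
example["blinding_participants_personnel"] = "high"
print(overall_risk_of_bias(example))
```

Under this rule, a single high-risk domain, such as unblinded surgeons, is enough to classify a trial as high risk overall.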

For prospective studies, the Newcastle–Ottawa scale (NOS) [8] was used for quality scoring. The NOS judges studies on three categories: the selection of the study groups (comprising four numbered items: representativeness of exposed cohort, selection of non-exposed cohort, ascertainment of exposure, demonstration that outcomes were not present at start of study), the comparability of the groups (comprising one numbered item: comparability of cohorts on basis of study design or analysis), and outcomes (comprising three numbered items: assessment of outcome, appropriateness of length of follow-up, adequacy of follow-up of cohorts). From a set of listed criteria, a maximum of one star can be awarded for each numbered item, except for comparability where a maximum of two stars can be awarded. The possible NOS score ranges from 0 to 9 stars. We classified studies with ≥7 stars as “higher” quality and <7 stars as “lower” quality.
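As a minimal sketch of the scoring scheme just described, the star tally and the quality threshold could be expressed as follows. The function name `nos_quality` and the dictionary layout are illustrative assumptions; only the category maxima (4, 2, and 3 stars) and the ≥7 threshold come from the text.

```python
# Maximum stars per NOS category, as described in the text.
NOS_MAX_STARS = {
    "selection": 4,      # four numbered items, one star each
    "comparability": 2,  # one numbered item, up to two stars
    "outcome": 3,        # three numbered items, one star each
}


def nos_quality(stars: dict) -> tuple:
    """Return (total stars, quality label) for one prospective study."""
    total = 0
    for category, maximum in NOS_MAX_STARS.items():
        awarded = stars[category]
        if not 0 <= awarded <= maximum:
            raise ValueError(f"{category}: {awarded} stars outside 0..{maximum}")
        total += awarded
    # Threshold used in this review: >=7 stars is "higher" quality.
    label = "higher" if total >= 7 else "lower"
    return total, label


# Hypothetical study that loses both comparability stars.
print(nos_quality({"selection": 4, "comparability": 0, "outcome": 3}))
```

A study with full marks for selection and outcome but no comparability stars therefore sits exactly on the threshold.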

Risk of bias assessment was made at the level of outcomes. We assessed perioperative outcomes together as a class [7, 9]. If a study addressed several perioperative outcomes, the risk of bias for a particular domain was judged based on the outcome that was most affected by the study methodology. Any disagreement with risk of bias assessment was resolved through discussion with a senior author (HA).

Data synthesis and statistical methods

Meta-analysis was based on control type, that is, either robotic versus OS or robotic versus MIS. Wherever possible, we used results from intention-to-treat analyses. Continuous outcomes were analysed by calculating the ratio of means (RoM) for each study, with expression of uncertainty of each result represented by the 95 % confidence intervals (CI) [10]. We substituted median for mean in studies where only the median was reported. When the calculated RoM was 1, computation was not possible. Consequently, these results were excluded. Categorical outcomes were analysed using risk ratio (RR) with 95 % CI [7]. Studies reporting categorical outcomes with no events in both the robotic and control groups were excluded, as their effect sizes were not computable. We performed meta-analysis if two or more separate studies were available. The inverse-variance, random-effects model of DerSimonian and Laird [11] was used for both continuous and categorical outcomes. This was accomplished using Stata 13 (StataCorp., College Station, Texas, USA). Sensitivity analysis on RCTs was also performed. The I² statistic was used to estimate the degree of heterogeneity between studies, where larger values indicate increasing heterogeneity [12].
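The synthesis steps described above can be illustrated with a short sketch. It is not the Stata routine used in this review. It assumes the standard textbook forms: a delta-method variance for the log RoM, the usual variance for the log RR, the DerSimonian and Laird moment estimator for the between-study variance, and pooling on the log scale (the scale used for pooling is not stated in the text). All function names and the example data are hypothetical, and the sketch does not handle studies with zero events in a single arm.

```python
import math

Z95 = 1.959963984540054  # two-sided 95 % standard normal quantile


def log_rom(mean_r, sd_r, n_r, mean_c, sd_c, n_c):
    """Log ratio of means (robotic / control) with delta-method variance."""
    y = math.log(mean_r / mean_c)
    var = (sd_r / mean_r) ** 2 / n_r + (sd_c / mean_c) ** 2 / n_c
    return y, var


def log_rr(events_r, n_r, events_c, n_c):
    """Log risk ratio (robotic / control) with its usual variance."""
    if events_r == 0 or events_c == 0:
        # Zero-event arms are outside the scope of this sketch.
        raise ValueError("risk ratio not computable with a zero-event arm")
    y = math.log((events_r / n_r) / (events_c / n_c))
    var = 1 / events_r - 1 / n_r + 1 / events_c - 1 / n_c
    return y, var


def dersimonian_laird(effects):
    """Pool (log effect, variance) pairs with a random-effects model."""
    k = len(effects)
    if k < 2:
        raise ValueError("at least two studies are required")
    w = [1 / v for _, v in effects]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * y for wi, (y, _) in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, (y, _) in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                        # between-study variance
    w_re = [1 / (v + tau2) for _, v in effects]          # random-effects weights
    pooled = sum(wi * y for wi, (y, _) in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # heterogeneity, %
    return {
        "ratio": math.exp(pooled),
        "ci_low": math.exp(pooled - Z95 * se),
        "ci_high": math.exp(pooled + Z95 * se),
        "tau2": tau2,
        "i2_percent": i2,
    }


# Hypothetical blood loss data (mean, SD, n) for two studies, robotic vs control.
studies = [
    log_rom(200, 80, 30, 400, 150, 30),
    log_rom(150, 60, 25, 250, 100, 25),
]
print(dersimonian_laird(studies))
```

Working with ratios rather than absolute differences is what allows outcomes measured on very different scales, such as blood loss across unrelated procedures, to be combined in one pooled estimate.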

Post hoc power analysis (significant at the 5 % level, two-tailed t test) was conducted for all eligible studies using the G*Power 3.1 programme [13]. Power was calculated for large (d = 0.8), medium (d = 0.5), and small (d = 0.2) effect sizes. We defined adequate statistical power as >80 %. We also identified studies with clearly specified primary outcomes and where power analysis was performed to determine the required sample size for adequate assessment of these outcomes.
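To make the power calculation concrete, the following sketch approximates the power of a two-tailed, two-sample t test for a given standardised effect size d. It uses a normal approximation, whereas G*Power uses the noncentral t distribution, so values will differ slightly for small samples. The function names and the example sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(n1, n2, d, alpha=0.05):
    """Approximate power of a two-tailed two-sample t test (normal approximation)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Noncentrality parameter for standardised effect size d.
    ncp = d * math.sqrt(n1 * n2 / (n1 + n2))
    # Probability of rejecting in either tail.
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)


def power_profile(n1, n2):
    """Power for the large, medium, and small effect sizes used in this review."""
    sizes = {"large": 0.8, "medium": 0.5, "small": 0.2}
    return {label: approx_power(n1, n2, d) for label, d in sizes.items()}


# Hypothetical study with 26 patients per arm.
for label, power in power_profile(26, 26).items():
    status = "adequate" if power > 0.80 else "inadequate"
    print(f"{label}: power = {power:.2f} ({status})")
```

With 26 patients per arm, such a study would be adequately powered only for a large effect, which illustrates how quickly power falls away for medium and small effects at typical sample sizes.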

Results

Search results

A total of 43,132 articles were identified from the databases. This included 104 trials from the ClinicalTrials.gov registry, of which one [14] was subsequently found to contain relevant published data. After removing duplicates, 28,574 articles were screened based on their titles and abstracts. Of these, 1702 potentially relevant full-text articles were retrieved for further evaluation. We found 97 articles that met the inclusion criteria. Two additional articles were identified through hand-searching. In total, 99 articles, involving 14,448 patients overall, were included in this review (Fig. 2).

Fig. 2 Flow chart of included studies. *Some articles contained more than one comparison or study (see text). OS open surgery, MIS minimally invasive surgery, RCT randomised controlled trial

Description of included studies

Of the included articles, all but one [15] investigated outcomes in adult patients. Overall, there were 31 and 68 articles, respectively, that were based on RCT and non-randomised prospective comparative designs. They encompassed a wide range of specialties and procedures (Tables 1, 2). Some articles comprised more than one comparison or study [16–23].

Table 1 Studies comparing robotic versus open surgery
Table 2 Studies comparing robotic versus minimally invasive surgery

Robotic versus OS

For robotic versus OS, there were 50 studies (11 RCTs and 39 prospective studies) (Table 1). The year of publication ranged from 1998 to 2013. In total, there were 5910 and 4237 patients in the robotic and OS groups, respectively. The smallest and largest sample sizes were 14 and 1738, respectively. The surgical robots used in these studies were the dVSS, Zeus® Robotic Surgical System (ZRSS; Computer Motion Inc., Santa Barbara, California, USA), ROBODOC® Surgical System (Curexo Technology Corp., Fremont, California, USA), Acrobot® Surgical System (The Acrobot Co. Ltd., London, UK), CASPAR system (OrtoMaquet, Rastatt, Germany), and SpineAssist® (Mazor Robotics Ltd., Caesarea, Israel).

Robotic versus MIS

For robotic versus MIS, there were 58 studies (21 RCTs and 37 prospective studies), which were published between 2001 and 2014 (Table 2). Taking into account all studies, the robotic and MIS groups consisted of 1991 and 2310 patients, respectively. Sample sizes ranged from 12 to 390. The surgical robots used were the dVSS, ZRSS, Mona (Intuitive Surgical), and Sensei® Robotic Catheter System (Hansen Medical Inc., Mountain View, California, USA).

Risk of bias assessment

All included articles were assessed for the quality of their methodology. Of note, all 31 RCT articles suffered from a high risk of bias because they all showed a high risk of bias in the performance bias domain (Fig. 3). This was primarily due to the lack of surgeon blinding, which is unlikely to be possible in clinical trials of robotic surgery. As perioperative outcomes are especially vulnerable to performance bias, this risk was judged to be high. The subject of patient blinding, which is difficult in surgical trials but potentially feasible [24], was frequently unaddressed or unreported by authors. Most RCTs showed low risk of attrition bias, with complete perioperative outcome data. In many trials, however, the risk of bias related to sequence generation, allocation concealment, blinding of outcome assessor, and selective reporting was unclear, as sufficient information was not available due to poor reporting.

Fig. 3 Risk of bias graphs of randomised controlled trials comparing robotic versus open surgery (above) and robotic versus minimally invasive surgery (below)

Of 68 articles of non-randomised prospective design, 55 (80.9 %) were of “higher” quality (Tables 3, 4). All prospective studies met the criteria for ascertainment of exposure, absence of outcome at the start of study, outcome assessment, and duration of follow-up. Most prospective studies also selected their control cohort from the same community as the robotic cohort and showed adequate follow-up. Many of them suffered from poor comparability, as expected given the lack of randomisation, where selection bias is a caveat. In some cases, the representativeness of the robotic cohort in the community was felt not to be adequate.

Table 3 Risk of bias of non-randomised prospective comparative cohort studies comparing robotic versus open surgery based on the Newcastle–Ottawa scale
Table 4 Risk of bias of non-randomised prospective comparative cohort studies comparing robotic versus minimally invasive surgery based on the Newcastle–Ottawa scale

Meta-analyses of perioperative surgical outcomes

(i) Blood loss

Robotic versus OS

There were six RCT [25–30] and 23 prospective [16, 17, 20, 31–49] studies reporting on blood loss, giving a total of 29 studies overall. Meta-analysis demonstrated blood loss in the robotic arm to be 50.5 % of that in the OS arm (Fig. 4). This reduction was significant (95 % CI 0.408–0.602). There was high heterogeneity in the results (I² = 98.0 %). Sensitivity analysis on RCTs showed reduction in blood loss, but this was no longer significant (pooled RoM: 0.807, 95 % CI 0.563–1.051, I² = 96.3 %).

Fig. 4 Forest plots of blood loss; robotic versus open surgery (above), robotic versus minimally invasive surgery (below)

Robotic versus MIS

Twenty-two studies reported blood loss as an outcome measure. Of these, six were RCT studies [14, 50–54] and 16 were prospective studies [16, 17, 20, 55–67]. Meta-analysis of these studies confirmed a significant reduction in blood loss in favour of robotic surgery, which was 85.3 % of that experienced by patients in the MIS arm (95 % CI 0.736–0.969) (Fig. 4). The heterogeneity was high (I² = 98.2 %). Sensitivity analysis performed on RCTs, however, revealed a non-robust result (pooled RoM: 0.830, 95 % CI 0.653–1.008, I² = 95.9 %).

(ii) Blood transfusion rate

Robotic versus OS

Blood transfusion rate was investigated in two RCT [26, 27] and 16 prospective [17, 19, 31–36, 38, 42, 45, 49, 68–71] studies. Forty-two of 2127 patients (2.0 %) in the robotic group needed blood transfusion compared with 249 of 1869 patients (13.3 %) in the open group. One study [27] was excluded from quantitative synthesis, as its effect size was not computable. Meta-analysis of the remaining 17 studies demonstrated the risk of blood transfusion with robotic surgery to be 27.2 % of that of OS. This reduction in favour of robotic surgery was significant (95 % CI 0.165–0.449). The results showed moderate heterogeneity (I² = 55.2 %). Sensitivity analysis on RCTs was not done, as only one study was available. In this RCT, no significant difference in blood transfusion requirement was demonstrated (RR 0.800, 95 % CI 0.400–1.600) [26].

Robotic versus MIS

Six RCT [21, 51, 72–74] and ten prospective [17, 55, 59–63, 67, 75] studies reported blood transfusion requirement. Taking all these studies together, 4.2 % (33/789) of patients who underwent robotic intervention compared with 6.5 % (56/856) of MIS patients received blood transfusion. Computation of valid RR was not possible in three studies [21, 51, 63], hence their exclusion from meta-analysis. From the remaining 13 studies, we demonstrated a significant reduction in the requirement for blood transfusion in patients who underwent robotic surgery compared with MIS (pooled RR 0.621, 95 % CI 0.390–0.988). The heterogeneity was low (I² = 0.0 %). Nevertheless, the result of sensitivity analysis on RCTs was inconsistent (pooled RR 1.329, 95 % CI 0.325–5.438, I² = 0.0 %).

(iii) Operative time (skin-to-skin)

Robotic versus OS

Sixteen studies assessed operative time. These comprised three RCT [26, 76, 77] and 13 prospective [17, 19, 20, 23, 34, 36, 41, 42, 45–47, 49] studies. Meta-analysis showed robotic surgery to increase operative time by 7.3 %, which was significant (95 % CI 1.022–1.124). High heterogeneity was found (I² = 91.8 %). Sensitivity analysis on RCTs showed a consistent result (pooled RoM: 1.162, 95 % CI 1.016–1.308, I² = 86.8 %).

Robotic versus MIS

Operative time was investigated by 12 RCT [21, 50, 53, 54, 73, 78–83] and 18 prospective [15, 17, 19, 20, 22, 58, 75, 84–93] studies. There was a significant prolongation of operative time by 13.5 % over MIS when surgical robots were utilised (95 % CI 1.096–1.173). Heterogeneity was high (I² = 92.3 %). When only RCTs were considered in a sensitivity analysis, the result remained robust (pooled RoM: 1.202, 95 % CI 1.119–1.286, I² = 87.1 %).

(iv) Length of hospital stay

Robotic versus OS

Thirty studies compared length of hospital stay between robotic and open interventions. There were 4 RCT [25, 26, 28, 77] and 26 prospective [16–20, 32, 34–37, 39–43, 45–47, 49, 68, 69, 71, 94–96] studies. The result for one study [26] was not computable. Meta-analysis of the remaining 29 studies revealed length of stay for patients who underwent robotic surgery to be 69.5 % of that for patients who underwent OS. This decrease was significant (95 % CI 0.615–0.774). Heterogeneity was high (I² = 98.5 %). In contrast, when only RCTs were considered, the improvement in length of stay was lost (pooled RoM: 1.038, 95 % CI 0.878–1.197, I² = 89.4 %).

Robotic versus MIS

Length of hospital stay was addressed by 40 studies, of which 13 were RCT [50–54, 74, 78, 80, 81, 97–100] and 27 were prospective [16–20, 55–57, 59–66, 75, 85, 89, 91, 101–105] studies. Ten studies [16, 20, 50, 52, 57, 91, 97, 100, 104, 105] were excluded from meta-analysis, as their effect sizes were not computable. Meta-analysis of the remaining 30 studies showed no significant difference in duration of stay (pooled RoM: 0.982, 95 % CI 0.936–1.027). High heterogeneity was noted (I² = 93.4 %). Sensitivity analysis on RCTs remained robust (pooled RoM: 1.001, 95 % CI 0.955–1.047, I² = 80.2 %).

(v) Overall complication rate (30 day)

Robotic versus OS

Overall complications were compared in nine RCT [25, 26, 28–30, 76, 106–108] and 28 prospective [16, 17, 19, 20, 32, 34–37, 39, 41, 43, 45–47, 49, 68–70, 94–96, 109–113] studies. From these studies, the overall complication rate was 11.6 % (515/4453) in the robotic arm compared with 21.4 % (693/3245) in the open arm. Results from three studies [29, 96, 109] did not allow for computable RRs. From the remaining 34 studies, meta-analysis demonstrated a significant decrease in overall complication rate in favour of robotic surgery, which was 63.7 % of that with OS (95 % CI 0.483–0.838). High heterogeneity was present (I² = 81.9 %). Sensitivity analysis on RCTs was, however, inconsistent with the primary analysis (pooled RR 1.090, 95 % CI 0.631–1.881, I² = 59.9 %).

Robotic versus MIS

Forty-eight studies investigated complications. There were 18 RCT [14, 21, 50–54, 72, 74, 78, 80, 81, 83, 97–100] and 30 prospective [15–17, 19, 20, 55–66, 75, 84–87, 89, 91, 92, 101, 104, 105] studies. Taking all these studies into consideration, the overall complication rate in the robotic arm was 16.1 % (288/1789) compared with 15.7 % (317/2025) in the MIS arm. Valid effect sizes in the form of RR were not producible from results of nine studies [15, 52, 66, 78, 83, 84, 87, 104, 105]. Meta-analysis involving the remaining 39 studies demonstrated no significant difference in overall complication rate between robotic surgery and MIS (pooled RR 0.988, 95 % CI 0.822–1.188). Heterogeneity was low (I² = 23.0 %). When sensitivity analysis was performed on RCTs, the result remained robust (pooled RR 1.187, 95 % CI 0.851–1.654, I² = 15.4 %).

Results of our meta-analyses are summarised in Fig. 5.

Fig. 5 Pooled proportional change in perioperative outcomes for robotic versus open surgery and robotic versus minimally invasive surgery, with 95 % confidence interval. RoM ratio of means, RR risk ratio, OS open surgery, MIS minimally invasive surgery. *Significant effect
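The pooled estimates reported in the text can be restated as percentage changes relative to the control arm, with an effect counted as significant when its 95 % CI excludes 1. The sketch below does this for the primary analyses only. The estimates and intervals are taken from the results above, with the operative time ratios (1.073 and 1.135) inferred from the reported 7.3 % and 13.5 % increases. The function name `summarise` is illustrative.

```python
# (outcome, control, pooled estimate, CI lower, CI upper), from the text above.
POOLED = [
    ("Blood loss (RoM)", "OS", 0.505, 0.408, 0.602),
    ("Blood loss (RoM)", "MIS", 0.853, 0.736, 0.969),
    ("Blood transfusion (RR)", "OS", 0.272, 0.165, 0.449),
    ("Blood transfusion (RR)", "MIS", 0.621, 0.390, 0.988),
    ("Operative time (RoM)", "OS", 1.073, 1.022, 1.124),
    ("Operative time (RoM)", "MIS", 1.135, 1.096, 1.173),
    ("Length of stay (RoM)", "OS", 0.695, 0.615, 0.774),
    ("Length of stay (RoM)", "MIS", 0.982, 0.936, 1.027),
    ("Complications (RR)", "OS", 0.637, 0.483, 0.838),
    ("Complications (RR)", "MIS", 0.988, 0.822, 1.188),
]


def summarise(estimate, ci_low, ci_high):
    """Return (percentage change vs control, whether the 95 % CI excludes 1)."""
    change = round((estimate - 1.0) * 100, 1)
    significant = ci_high < 1.0 or ci_low > 1.0
    return change, significant


for outcome, control, est, lo, hi in POOLED:
    change, significant = summarise(est, lo, hi)
    flag = "*" if significant else ""
    print(f"{outcome} vs {control}: {change:+.1f} %{flag}")
```

Read this way, eight of the ten primary comparisons are significant, and the two that are not (length of stay and complications against MIS) are the ones discussed as unexpected below.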

Post hoc power analyses

With respect to RCT studies, for large effect sizes, just 17 [14, 27–30, 51, 53, 54, 72–74, 76, 77, 80, 82, 107, 108] of 32 studies (53.1 %) had adequate statistical power (that is, power >80 %). This fell to four studies [27, 28, 76, 107] (12.5 %) for medium effect sizes. For small effect sizes, no RCT study had adequate power.

Analysis of the 76 prospective studies revealed that just 47 [16–20, 23, 31–37, 39, 41, 43, 47, 49, 55, 57, 59, 61, 64, 65, 67–71, 75, 85, 86, 88, 89, 91, 94–96, 101, 111–113] of them (61.8 %) had adequate power for outcome evaluation, assuming large effect sizes. For medium effect sizes, 20 studies [31–33, 35–37, 41, 43, 59, 61, 67–70, 75, 85, 86, 95, 111, 112] (26.3 %) were sufficiently powered. Only three studies [33, 95, 111] (3.9 %) had adequate power for small effect sizes.

The lack of statistical power in many studies is not surprising given that in only 16 RCT (50 %) and six prospective (7.9 %) studies were primary outcomes clearly defined and a priori power analysis performed (Table 5). Furthermore, only a handful of these studies [51, 54, 73, 80, 82, 85] were powered to the outcomes investigated in this review.

Table 5 Studies with clearly defined primary outcomes and where power analysis was undertaken a priori

Results of post hoc power analyses for individual studies are presented in Tables 1 and 2.

Discussion

The term “disruptive innovation” represents a process where a product establishes itself at the bottom of a market and climbs through this sector to displace competitors [114]. Initial characteristics of a disruptive innovation model include: (i) simpler products and services, (ii) smaller target markets, and (iii) lower gross margins. As a result, these innovations can “create space” at the bottom of the market to allow new disruptive competitors to emerge. Currently in the field of robotic surgery, the promise of simplicity has yet to be translated into daily practice. Furthermore, the evidence regarding cost efficacy and gross margins has been poorly documented so that decisions regarding the adoption of robotic surgery remain controversial.

However, to disregard robotic surgery completely as an unfulfilled promise in its 30 years of existence may be unbalanced. Our meta-analyses of all RCTs and prospective studies to date, regardless of specialty and procedure type, revealed a decrease in blood loss and blood transfusion rate with robotic surgery when compared with both OS and MIS. Additionally, comparison against OS demonstrated a reduction in length of hospital stay and overall complication rate in favour of robotic surgery.

The ability of robotic surgery to reduce blood loss and need for blood transfusion may be attributed to its advanced features, which could improve surgical precision. This would be important in avoiding injury to vessels and other structures that can cause unintended bleeding. The additional benefits of robotic surgery over OS, in the form of shorter length of hospital stay and fewer complications, may partly be due to its capacity for minimal access. These benefits have been demonstrated in conventional minimally invasive surgical procedures [115–118], where the positive effect of reduced tissue trauma has been implicated [118]. Given its added features, the inability of robotic surgery to achieve improved length of stay and complication rate over MIS can be considered surprising. This may reflect the fact that surgical robots have not yet improved on conventional minimally invasive platforms for these outcomes. Alternatively, these outcomes may be inadequate markers for accurately capturing the increased precision of robotic surgery. More sensitive assessment tools of precision are advocated in future trials, which might include video appraisal of intraoperative tissue handling, errors, and efficiency [52, 105].

When RCTs were analysed separately, the proportional benefits of robotic surgery were lost. Given their higher level of evidence, these RCTs may be considered as more representative of the true population effect, although they are limited by a profound lack of numbers. We identified only 31 clinical RCTs on robotic surgery, which is a fraction (0.1 %) of the 28,574 potentially relevant articles. Many RCTs failed to clearly define primary outcomes and perform a priori power analysis, which led to inadequate sample sizes and hence, statistical power necessary for outcome evaluation. Through post hoc analyses, we showed that just over half of all RCTs were adequately powered to detect a true difference in outcomes for large effect sizes. For smaller effect sizes, this deficiency, inevitably, was further amplified. These findings are probably related to common barriers in undertaking successful surgical RCTs, including ethical issues, challenging patient recruitment and randomisation due partly to lack of equipoise, learning curve, inexperience in designing trials, inadequate medical statistical knowledge, problematic long-term follow-up, and insufficient funding and resources [24, 119]. Furthermore, difficulty in blinding is a major methodological barrier [120, 121]. Consequently, all included RCTs were considered to suffer from a high risk of performance bias, and accordingly, a high risk of bias overall [7]. Together, these factors could explain the non-robust results.

The demonstration of longer operative time with robotic surgery contradicts its proposed aims of facilitating operative tasks that would otherwise be difficult to perform efficiently with conventional tools. One possible explanation is the requirement for additional steps in their deployment. For example, docking is needed for surgical robots such as the dVSS [73, 80]. Hardware issues could also explain the longer operative time, as surgical robotic instruments may be cumbersome to place or switch efficiently, or may be insufficiently adapted for the specific purpose [78, 80, 81, 97].

The surgical learning curve has implications on our findings. Before study commencement, individual surgeons have typically performed far fewer robotic cases than conventional ones [51, 53, 54, 72, 73, 107]. This disparity could disadvantage robotic surgery due to relatively less familiarity. This could further explain the prolonged operative time of robotic surgery. Nevertheless, our demonstration of at least equivalent outcomes for other perioperative variables may be regarded as a favourable effect of robotic surgery. By allowing achievement of similar or better outcomes despite the relative lack of user experience, surgical robots may be important in facilitating training and attainment of competences. Furthermore, many surgeons may view surgical robots as an “enabling technology”, without which it would not be possible for them to perform certain complex minimally invasive procedures [122]. Pure laparoscopic radical prostatectomy, which demonstrates significant technical challenges, is an example of a procedure where robotic assistance in suturing and other laparoscopic tasks is important [123]. Although robotic surgery needs to demonstrate more than just equivalent patient outcomes to be cost-effective due to its substantial costs, its potential positive effects on surgeon ability must also be considered.

This systematic review has some limitations. Our focus on blood loss, blood transfusion rate, operative time, length of hospital stay, and complications was based primarily on the fact that these were the most commonly reported outcomes in the robotic surgery literature. However, these standard parameters may not fully demonstrate the true value of robotic surgery, especially when the overall benefits are not always clearly perceptible in the short term. Utilisation of dedicated research parameters should be encouraged [124]. Already, there is an increasing inclination towards such parameters that are probably more relevant, including functional, oncological, and quality of life outcomes, specific anatomical–pathological endpoints (such as nerve damage control), and ergonomics. With continuing improvement in outcome parameter selection by clinical research teams, future evidence synthesis centred on these parameters may better reflect the added value of robotic surgery.

Our appraisal of robotic surgery through an exclusively clinical viewpoint has also meant that other elements of innovation evaluation could not be incorporated into our conclusions. These include the impact of surgical robotics on intellectual property and patent generation, resource management, healthcare leadership, mentorship, training, cost efficacy, marketing strategy, business strategy, and stakeholder value generation.

When meta-analyses were possible, the heterogeneity was frequently high. However, this is not unexpected given the wide variability in patient cohorts and interventions. There was additional variability within specific procedures. For instance, Nissen [50, 78, 84, 97–99], Toupet [84], Dor [101], and Thal [15] fundoplications were variant techniques performed in different studies. Furthermore, the extent of robotic assistance varied from its utilisation in anastomotic suturing only [103] to totally robotic procedures [21, 22, 82, 83, 85–88, 92, 100]. Methodological diversity in the form of different study designs and risks of bias also contributed to the heterogeneity.
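
As a rough illustration of how between-study heterogeneity is commonly quantified, the sketch below computes Cochran's Q and the derived I² statistic from per-study effect estimates and their variances. This section does not state which heterogeneity measure was used, so the choice of I² is an assumption, and the function name `i_squared` and the example numbers are hypothetical rather than data from the included studies.

```python
def i_squared(effects, variances):
    """Return I² (percent) from study effect estimates and their variances.

    Q is the inverse-variance weighted sum of squared deviations from the
    fixed-effect pooled estimate; I² = max(0, (Q - df) / Q) * 100.
    Hypothetical helper for illustration only.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0  # all studies agree exactly; no heterogeneity to report
    return max(0.0, (q - df) / q) * 100.0


# Made-up mean differences in operative time (minutes) and their variances.
example_effects = [12.0, 35.0, -4.0, 50.0]
example_variances = [16.0, 25.0, 9.0, 36.0]
print(f"I² = {i_squared(example_effects, example_variances):.1f}%")
```

Values of I² approaching 100% indicate that most of the observed variation between studies reflects genuine differences, such as variant techniques or differing extents of robotic assistance, rather than sampling error.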

We incorporated different surgical robots in our review, including those that are no longer in use, such as the ZRSS. However, our intention was not to compare outcomes of specific procedures obtainable through currently available robots but to evaluate, via an overview of commonly addressed perioperative outcomes, whether the goals of robotic surgery in general have been achieved. Hence, we offered a unique perspective on robotic surgery by covering the 30 years of its existence. Accordingly, we also elected not to stratify our analysis based on robot or procedure type. Consequently, this restricts the applicability of this review, so that the individual stakeholder interested in outcomes for a specific intervention may not be able to draw sufficiently relevant evidence from our results.

Prospective studies were included to address the paucity of RCTs. Although practical, their inclusion inevitably introduces other biases associated with this study design. Moreover, caution is advised in the interpretation of complication data, as there were inconsistencies in their reporting. Many authors failed to comply with the quality criteria [125] for complication reporting. There was also a lack of agreement in terms of what constitutes complications, such as with regard to blood transfusion and conversion. Nevertheless, this issue is not unique to our included studies [126, 127]. Additionally, studies on robotic surgery continue to suffer from several methodological flaws, including a lack of studies that offer multiple endpoint analysis [128] in such a complex field.

The Society of American Gastrointestinal and Endoscopic Surgeons [122] and European Association of Endoscopic Surgeons [124] consensus statements on robotic surgery have also highlighted the lack of high-quality data for evaluating the health outcomes of this technology. Upcoming research efforts should improve on current methodological deficiencies. The implementation of outcome registries for robotic surgery is important for documenting and comparing benefits and harms and for identifying the direction of future development [122]. More robust controlled trials should be undertaken, particularly in areas where robotic surgery has shown some potential, such as complex hepatobiliary surgery, bariatric and upper gastrointestinal revisional surgery, gastric and oesophageal cancer surgery, rectal surgery, and surgery for large adrenal masses [124].

Conclusions

After the promising pioneering clinical application of PUMA 560 in 1985, the stage was set for robotic surgery to assume the role of a significant disruptive innovation in health care. Three decades on, our analysis across a wide range of surgical robots identified their overall positive contribution in reducing blood loss and blood transfusion rate over OS and MIS. Additionally, against OS, they showed overall proportional improvement in length of hospital stay and overall complication rate. These beneficial effects were lost when only RCTs were appraised, although these RCTs were themselves limited. Longer operative time was a common caveat. Further well-conducted surgical trials are needed to confirm these findings. Whilst the barriers to these trials may seem insurmountable, solutions for overcoming them are now increasingly recognised. These may involve ensuring protocol transparency, improving trial dissemination, creating specialised trial units, establishing dedicated outcome monitoring groups, implementing appropriate minimum surgeon experience to reduce the impact of learning curves, and incorporating research training in the surgical curriculum [119]. To ensure better outcomes for future robotic surgery, a multidisciplinary approach during product development involving close collaboration between surgeons and engineers, in addition to inclusive patient engagement, is mandatory. With the advent of more affordable, enriching technologies that can be modularly incorporated into conventional surgical approaches, such as intraoperative fluorescence imaging, high-definition 3-D visualisation, wristed endoscopic hand tools, and navigation systems, robotic surgery risks degenerating into an unfulfilled promise if it fails to innovate in line with stakeholders’ needs.