Robotic surgery: disruptive innovation or unfulfilled promise? A systematic review and meta-analysis of the first 30 years

Background Robotic surgery has been in existence for 30 years. This study aimed to evaluate the overall perioperative outcomes of robotic surgery compared with open surgery (OS) and conventional minimally invasive surgery (MIS) across various surgical procedures.
Methods MEDLINE, EMBASE, PsycINFO, and ClinicalTrials.gov were searched from 1990 up to October 2013 with no language restriction. Relevant review articles were hand-searched for remaining studies. Randomised controlled trials (RCTs) and prospective comparative studies (PROs) on perioperative outcomes, regardless of patient age and sex, were included. Primary outcomes were blood loss, blood transfusion rate, operative time, length of hospital stay, and 30-day overall complication rate.
Results We identified 99 relevant articles (108 studies, 14,448 patients). For robotic versus OS, 50 studies (11 RCTs, 39 PROs) demonstrated reduction in blood loss [ratio of means (RoM) 0.505, 95 % confidence interval (CI) 0.408–0.602], transfusion rate [risk ratio (RR) 0.272, 95 % CI 0.165–0.449], length of hospital stay (RoM 0.695, 0.615–0.774), and 30-day overall complication rate (RR 0.637, 0.483–0.838) in favour of robotic surgery. For robotic versus MIS, 58 studies (21 RCTs, 37 PROs) demonstrated reduced blood loss (RoM 0.853, 0.736–0.969) and transfusion rate (RR 0.621, 0.390–0.988) in favour of robotic surgery but similar length of hospital stay (RoM 0.982, 0.936–1.027) and 30-day overall complication rate (RR 0.988, 0.822–1.188). In both comparisons, robotic surgery prolonged operative time (OS: RoM 1.073, 1.022–1.124; MIS: RoM 1.135, 1.096–1.173). The benefits of robotic surgery lacked robustness on RCT-sensitivity analyses. However, many studies, including the relatively few available RCTs, suffered from high risk of bias and inadequate statistical power.
Conclusions Our results showed that robotic surgery contributed positively to some perioperative outcomes but longer operative times remained a shortcoming. Better quality evidence is needed to guide surgical decision making regarding the precise clinical targets of this innovation in the next generation of its use.

augmented visualisation, (iii) improved stability, (iv) natural coordination, (v) accurate cutting capacity, (vi) reliable execution, and (vii) enhanced surgeon ergonomics. These features can theoretically increase surgical precision by rendering difficult operative tasks easier to perform safely. Moreover, surgical robots have retained the capacity to enable surgery through smaller incisions. Collectively, these characteristics aim to enhance outcomes beyond that achievable through conventional operative methods.
The adoption and diffusion of robotic surgery demonstrate a positive trend in some geographical areas, particularly for advanced economies. This can be illustrated by the prominent application of the da Vinci® Surgical System (dVSS; Intuitive Surgical Inc., Mountain View, Sunnyvale, California, USA), which has US Food and Drug Administration (FDA) clearance across a multitude of specialties [2], demonstrating its greatest exposure for urological and gynaecological procedures [3]. For example, more than half of radical prostatectomies and about a third of benign hysterectomies are already performed robotically in the USA [3,4].
Despite offering some elements of innovative technology, the necessary evidence to justify the expanding investment in robotic surgery remains ambiguous. Whilst the concept of robotic surgery is almost universally favoured, its widespread promotion across all healthcare sectors requires robust justification, not least because it can be very costly [5]. Studies comparing outcomes of robotic surgery with conventional approaches for specific robots and procedures are certainly not scarce. However, the systematic assessment of robotic surgery collectively as a single entity has not been performed. As we approach the end of the third decade following the pioneering use of the first surgical robot, an overview of this innovation may be useful for understanding the adoption of innovations in health care.
The aim of this comprehensive systematic review and meta-analysis was to draw evidence from comparative studies in robotic surgery, regardless of specialty and procedure type, and irrespective of patient age and sex. We avoided the biases of retrospective studies that dominate the literature by focussing only on randomised controlled trials (RCTs) and non-randomised prospective studies. In comparing potentially very heterogeneous studies, we emphasised a methodology that identified the proportional benefit of robotic surgical outcomes compared with controls in each study. This offered internal consistency from each study. We were then able to calculate a pooled proportional benefit for specific robotic surgical outcomes for all studies.
In this review, we evaluated core perioperative variables as our primary outcomes. These were (i) blood loss, (ii) blood transfusion rate, (iii) operative time, (iv) length of hospital stay, and (v) 30-day overall complication rate. In robotic surgical studies, these perioperative variables were most commonly addressed. Analyses were performed separately for robotic versus open surgery (OS) and robotic versus minimally invasive surgery (MIS). As a secondary outcome, we calculated the proportion of studies that demonstrated adequate statistical power for the evaluation of these clinical outcomes.

Materials and methods
This review was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement [6].

Inclusion and exclusion criteria
We defined surgery as any interventional procedure that involves alteration of anatomy and requires either a skin (or mucosal) incision or a puncture. Patients requiring surgery for which a robotic approach was a feasible alternative to OS or MIS were included. There was no age or sex restriction. Controls were eligible only if patients underwent surgery and no robotic assistance was provided. RCTs and prospective studies that addressed one or more core perioperative surgical outcomes (blood loss, blood transfusion rate, operative time, length of hospital stay, and 30-day overall complication rate) were included. For operative time, we included studies that explicitly defined it as starting from skin incision to skin closure (for intravascular procedures, we used procedure time, which was generally defined as time from first venous puncture to sheath withdrawal at the end of the procedure). Whilst this measure does not represent the total theatre occupation time, it was selected to improve comparability because operative time was variably defined in the literature. We excluded studies where surgical robots were used for stereotactic, endoscopic, or single-incision laparoscopic surgery. Robotic instrument positioners without concurrent use of other robotic instrumentation tools were also excluded, as were innovations that are generally not considered robotic technology, such as remote magnetic catheter navigation and pure computer navigation systems. We also discounted studies with historical controls that preceded the robotic arm considerably (that is, by more than a year) as well as those that retrospectively reviewed and analysed prospective databases. Laboratory studies involving synthetic models, animals, or cadavers were not considered.

Search methodology
Using the OvidSP search engine, the MEDLINE, EMBASE, and PsycINFO databases were searched on 2 September 2013 with the terms: robot* (tw) AND [intervention* (tw) OR surg* (tw)]. The same search terms were used to search the ClinicalTrials.gov registry to identify potentially relevant trials. On 26 May 2014, these trials were reviewed to identify any relevant published data. To avoid losing generally older papers which had used the term computer-assisted instead of robot, we also performed a search on 7 October 2013 with the terms: [surgery, computer-assisted (MESH, exp) OR computer-assisted surg* (tw) OR computer-aided surg* (tw)] AND [intervention* (tw) OR surg* (tw)]. Studies from 1990 to the search dates were included. There was no language restriction. Relevant review articles, including health technology assessments, found through our search strategy were also hand-searched to identify any remaining studies.

Study selection
Articles were screened from titles and abstracts by three authors independently (AT, SM, and AS). Potentially relevant articles that appeared to fit the inclusion and exclusion criteria were obtained in full text. These were independently assessed for eligibility by the same authors. Articles were excluded if they had duplicate or incomplete data, or if they were only available in abstract form. Any disagreement was resolved through discussion with a senior author (HA).

Dealing with duplicate publications
If several articles reported outcomes from a single study, the article with the most comprehensive results (largest number of patients and/or most recent publication) was included. If this article failed to report outcomes that were otherwise available in the duplicate article, then the additional data from the duplicate article were included.

Data extraction
One author (AT) extracted data into an Excel 2011 database (Microsoft Corp., Redmond, Washington, USA), which were then reviewed independently by three authors (SM, AS, and HA). For each article, the year of publication, study design, total number of patients, number of patients in each arm, robot and control type, baseline characteristics, and results of outcome measures of interest were extracted. For continuous outcomes, we extracted the mean and standard deviation (or if unavailable, the median and standard error, range, or interquartile range). For categorical outcomes, we recorded the number of events.

Risk of bias assessment
Three authors (AT, SM, and AS) independently assessed the risk of bias of eligible articles. Quality of articles with more than one study was assessed on their overall methodology. The Cochrane risk of bias tool [7] was applied to RCTs. Seven key domains were assessed: method of random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessors, completeness of outcome data, selective reporting, and other potential sources of bias. Based on a set of listed criteria, each domain was judged to have either a low, high, or unclear risk of bias. If a study had unclear or high risk of bias for one or more key domains, then it was classified as having, respectively, an unclear or high risk of bias overall. If instead all the key domains had low bias risk, then the study was judged to have a low risk of bias overall [7].
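The overall classification rule for RCTs described above can be sketched as a small function. This is an illustrative sketch only, not the authors' procedure: the domain keys and the function name are hypothetical, and the precedence of "high" over "unclear" when both occur in the same study is an assumption, since the text does not state how such mixed cases were handled.

```python
from typing import Dict

# Seven key domains of the Cochrane risk of bias tool, as listed in the text.
# The key names themselves are hypothetical labels chosen for this sketch.
DOMAINS = (
    "random_sequence_generation",
    "allocation_concealment",
    "blinding_participants_personnel",
    "blinding_outcome_assessors",
    "incomplete_outcome_data",
    "selective_reporting",
    "other_bias",
)

VALID_JUDGEMENTS = {"low", "unclear", "high"}


def overall_risk_of_bias(judgements: Dict[str, str]) -> str:
    """Derive an overall judgement from per-domain judgements."""
    missing = [d for d in DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    values = [judgements[d] for d in DOMAINS]
    invalid = [v for v in values if v not in VALID_JUDGEMENTS]
    if invalid:
        raise ValueError(f"invalid judgements: {invalid}")
    # Assumption: a single high-risk domain outranks any unclear domains.
    if "high" in values:
        return "high"
    if "unclear" in values:
        return "unclear"
    return "low"
```

Under this rule, a trial in which surgeons could not be blinded is rated high risk overall even if every other domain is rated low.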
For prospective studies, the Newcastle-Ottawa scale (NOS) [8] was used for quality scoring. The NOS judges studies on three categories: the selection of the study groups (comprising four numbered items: representativeness of exposed cohort, selection of non-exposed cohort, ascertainment of exposure, demonstration that outcomes were not present at start of study), the comparability of the groups (comprising one numbered item: comparability of cohorts on basis of study design or analysis), and outcomes (comprising three numbered items: assessment of outcome, appropriateness of length of follow-up, adequacy of follow-up of cohorts). From a set of listed criteria, a maximum of one star can be awarded for each numbered item, except for comparability where a maximum of two stars can be awarded. The possible NOS score ranges from 0 to 9 stars. We classified studies with ≥7 stars as "higher" quality and <7 stars as "lower" quality.
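The NOS scoring scheme can be illustrated with a short sketch. The item keys and function names below are hypothetical, and the cut-off of 7 or more stars for "higher" quality is our reading of the threshold used in this review.

```python
from typing import Dict

# Maximum stars per numbered NOS item, following the description in the text:
# one star each, except comparability, which can receive two.
NOS_MAX_STARS = {
    "representativeness_of_exposed_cohort": 1,
    "selection_of_non_exposed_cohort": 1,
    "ascertainment_of_exposure": 1,
    "outcome_not_present_at_start": 1,
    "comparability_of_cohorts": 2,
    "assessment_of_outcome": 1,
    "length_of_follow_up": 1,
    "adequacy_of_follow_up": 1,
}


def nos_score(stars: Dict[str, int]) -> int:
    """Sum the stars awarded to a study, validating each item's range."""
    for item, awarded in stars.items():
        if item not in NOS_MAX_STARS:
            raise ValueError(f"unknown NOS item: {item}")
        if not 0 <= awarded <= NOS_MAX_STARS[item]:
            raise ValueError(f"invalid star count for {item}: {awarded}")
    # Items that are not supplied are treated as zero stars.
    return sum(stars.get(item, 0) for item in NOS_MAX_STARS)


def nos_quality(stars: Dict[str, int], threshold: int = 7) -> str:
    """Classify a study as 'higher' or 'lower' quality (threshold assumed to be 7)."""
    return "higher" if nos_score(stars) >= threshold else "lower"
```

For example, a study that loses both comparability stars and the star for adequacy of follow-up scores 6 and would be classified as "lower" quality under this threshold.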
Risk of bias assessment was made at the level of outcomes. We assessed perioperative outcomes together as a class [7,9]. If a study addressed several perioperative outcomes, the risk of bias for a particular domain was judged based on the outcome that was most affected by the study methodology. Any disagreement with risk of bias assessment was resolved through discussion with a senior author (HA).

Data synthesis and statistical methods
Meta-analysis was based on control type, that is, either robotic versus OS or robotic versus MIS. Wherever possible, we used results from intention-to-treat analyses. Continuous outcomes were analysed by calculating the ratio of means (RoM) for each study, with expression of uncertainty of each result represented by the 95 % confidence intervals (CI) [10]. We substituted median for mean in studies where only the median was reported. When the calculated RoM was 1, computation was not possible. Consequently, these results were excluded. Categorical outcomes were analysed using risk ratio (RR) with 95 % CI [7]. Studies reporting categorical outcomes with no events in both the robotic and control groups were excluded, as their effect sizes were not computable. We performed meta-analysis if two or more separate studies were available. The inverse-variance, random-effects model of DerSimonian and Laird [11] was used for both continuous and categorical outcomes. This was accomplished using Stata 13 (StataCorp., College Station, Texas, USA). Sensitivity analysis on RCTs was also performed. The I² statistic was used to estimate the degree of heterogeneity between studies, where larger values indicate increasing heterogeneity [12].
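The pooling steps described above can be sketched as follows. This is a minimal illustration and not the Stata routine used in this review. It assumes that both RoM and RR are pooled on the natural-log scale, with the delta-method variance for the log RoM and the usual large-sample variance for the log RR; the text does not state the scale on which RoM was pooled, so results may differ from those reported. All function names are hypothetical, and zero-event studies are assumed to have been excluded beforehand, as in the review.

```python
import math
from typing import Dict, List, Tuple

Effect = Tuple[float, float]  # (log effect size, variance of log effect size)


def log_rom(mean_r: float, sd_r: float, n_r: int,
            mean_c: float, sd_c: float, n_c: int) -> Effect:
    """Log ratio of means (robotic / control) with delta-method variance."""
    y = math.log(mean_r / mean_c)
    v = sd_r ** 2 / (n_r * mean_r ** 2) + sd_c ** 2 / (n_c * mean_c ** 2)
    return y, v


def log_rr(events_r: int, n_r: int, events_c: int, n_c: int) -> Effect:
    """Log risk ratio (robotic / control); requires events in both arms."""
    y = math.log((events_r / n_r) / (events_c / n_c))
    v = 1 / events_r - 1 / n_r + 1 / events_c - 1 / n_c
    return y, v


def dersimonian_laird(effects: List[Effect]) -> Dict[str, float]:
    """Inverse-variance random-effects pooling with the DerSimonian-Laird tau^2."""
    if len(effects) < 2:
        raise ValueError("at least two studies are needed")
    y = [e[0] for e in effects]
    v = [e[1] for e in effects]
    w = [1 / vi for vi in v]                      # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    w_star = [1 / (vi + tau2) for vi in v]        # random-effects weights
    pooled = sum(ws * yi for ws, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "estimate": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se),
        "ci_high": math.exp(pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }
```

A pooled estimate below 1 favours robotic surgery for outcomes where lower values are better; for instance, a RoM of 0.5 corresponds to a 50 % proportional reduction relative to the control arm.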
Post hoc power analysis (significant at the 5 % level, two-tailed t test) was conducted for all eligible studies using the G*Power 3.1 programme [13]. Power was calculated for large (d = 0.8), medium (d = 0.5), and small (d = 0.2) effect sizes. We defined adequate statistical power as >80 %. We also identified studies with clearly specified primary outcomes and where power analysis was performed to determine the required sample size for adequate assessment of these outcomes.
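The post hoc power calculation can be approximated with a short sketch. Note that this is not what G*Power computes: G*Power uses the noncentral t distribution, whereas the sketch below uses a normal approximation, which slightly overestimates power in small samples. The function names are hypothetical, and adequate power is taken here as power above 0.80, which is an assumption about the threshold.

```python
import math
from statistics import NormalDist


def approx_power(d: float, n1: int, n2: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed, two-sample test for effect size d.

    Normal approximation to the two-sample t test; d is Cohen's d and
    n1, n2 are the group sizes.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Noncentrality parameter for two independent groups.
    ncp = d * math.sqrt(n1 * n2 / (n1 + n2))
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


def is_adequately_powered(d: float, n1: int, n2: int,
                          threshold: float = 0.80) -> bool:
    """Flag a study as adequately powered (threshold assumed to be 0.80)."""
    return approx_power(d, n1, n2) > threshold
```

Calling `approx_power` with d = 0.8, 0.5, and 0.2 for a given pair of arm sizes shows how quickly power falls as the assumed effect size shrinks, which is the pattern described for the included studies.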

Search results
A total of 43,132 articles were identified from the databases. This included 104 trials from the ClinicalTrials.gov registry, of which one [14] was subsequently found to contain relevant published data. After removing duplicates, 28,574 articles were screened based on their titles and abstracts. Of these, 1702 potentially relevant full-text articles were retrieved for further evaluation. We found 97 articles that met the inclusion criteria. Two additional articles were identified through hand-searching. In total, 99 articles, involving 14,448 patients overall, were included in this review (Fig. 2).

Description of included studies
Of the included articles, all but one [15] investigated outcomes in adult patients. Overall, there were 31 and 68 articles, respectively, that were based on RCT and non-randomised prospective comparative designs. They encompassed a wide range of specialties and procedures (Tables 1, 2). Some articles comprised more than one comparison or study [16-23].

Robotic versus OS
For robotic versus OS, there were 50 studies (11 RCTs and 39 prospective studies) (Table 1). The year of publication ranged from 1998 to 2013. In total, there were 5910 and 4237 patients in the robotic and OS groups, respectively. The smallest and largest sample sizes were 14 and 1738, respectively. The surgical robots used in these studies were the dVSS, Zeus® Robotic Surgical System (ZRSS; Computer Motion Inc., Santa Barbara, California, USA), ROBODOC® Surgical System (Curexo Technology Corp., Fremont, California, USA), Acrobot® Surgical System (The Acrobot Co. Ltd., London, UK), CASPAR system (OrtoMaquet, Rastatt, Germany), and SpineAssist® (Mazor Robotics Ltd., Caesarea, Israel).

Risk of bias assessment
All included articles were assessed for the quality of their methodology. Of note, all 31 RCT articles suffered from a high risk of bias because they all showed a high risk of bias in the performance bias domain (Fig. 3). This was primarily due to the lack of surgeon blinding, which is unlikely to be possible in clinical trials of robotic surgery. As perioperative outcomes are especially vulnerable to performance bias, this risk was judged to be high. The subject of patient blinding, which is difficult in surgical trials but potentially feasible [24], was frequently unaddressed or unreported by authors. Most RCTs showed low risk of attrition bias, with complete perioperative outcome data. In many trials, however, the risk of bias related to sequence generation, allocation concealment, blinding of outcome assessor, and selective reporting was unclear, as sufficient information was not available due to poor reporting. Of 68 articles of non-randomised prospective design, 55 (80.9 %) were of "higher" quality (Tables 3, 4). All prospective studies met the criteria for ascertainment of exposure, absence of outcome at the start of study, outcome assessment, and duration of follow-up. Most prospective studies also selected their control cohort from the same community as the robotic cohort and showed adequate follow-up. Many of them suffered from poor comparability, as expected from the lack of randomisation where selection bias is a caveat. In some cases, the representativeness of the robotic cohort in the community was felt not to be adequate.

Post hoc power analyses
With respect to RCT studies, for large effect sizes, just 17 [14, 27- The lack of statistical power in many studies is not surprising given that in only 16 RCT (50 %) and six prospective (7.9 %) studies were primary outcomes clearly defined and a priori power analysis performed (Table 5). Furthermore, only a handful of these studies [51,54,73,80,82,85] were powered to the outcomes investigated in this review.
Results of post hoc power analyses for individual studies are presented in Tables 1 and 2.

Discussion
The term "disruptive innovation" represents a process where a product establishes itself at the bottom of a market and climbs through this sector to displace competitors [114]. Initial characteristics of a disruptive innovation model include: (i) simpler products and services, (ii) smaller target markets, and (iii) lower gross margins. As a result, these innovations can "create space" at the bottom of the market to allow new disruptive competitors to emerge. Currently in the field of robotic surgery, the promise of simplicity has yet to be translated into daily practice. Furthermore, the evidence regarding cost efficacy and gross margins has been poorly documented so that decisions regarding the adoption of robotic surgery remain controversial. However, to disregard robotic surgery completely as an unfulfilled promise in its 30 years of existence may be imbalanced. Our meta-analyses of all RCTs and prospective studies to date, regardless of specialty and procedure type, revealed a decrease in blood loss and blood transfusion rate with robotic surgery when compared with both OS and MIS. Additionally, comparison against OS demonstrated a reduction in length of hospital stay and overall complication rate in favour of robotic surgery.

The ability of robotic surgery to reduce blood loss and need for blood transfusion may be attributed to its advanced features, which could improve surgical precision. This would be important in avoiding injury to vessels and other structures that can cause unintended bleeding. The additional benefits of robotic surgery over OS, in the form of shorter length of hospital stay and fewer complications, may partly be due to its capacity for minimal access. These benefits have been demonstrated in conventional minimally invasive surgical procedures [115-118], where the positive effect of reduced tissue trauma has been implicated [118]. Given its added features, the inability of robotic surgery to achieve improved length of stay and complication rate over MIS can be considered surprising. This may be reflective of the status that surgical robots have not yet exceeded their effects beyond those of conventional minimally invasive platforms for these outcomes. Alternatively, these outcomes may be inadequate markers for accurately capturing the increased precision of robotic surgery. More sensitive assessment tools of precision are advocated in future trials, which might include video appraisal of intraoperative tissue handling, errors, and efficiency [52,105].
When RCTs were analysed separately, the proportional benefits of robotic surgery were lost. Given their higher level of evidence, these RCTs may be considered as more representative of the true population effect, although they are limited by a profound lack of numbers. We identified only 31 clinical RCTs on robotic surgery, which is a fraction (0.1 %) of the 28,574 potentially relevant articles. Many RCTs failed to clearly define primary outcomes and perform a priori power analysis, which led to inadequate sample sizes and, hence, inadequate statistical power for outcome evaluation. Through post hoc analyses, we showed that just over half of all RCTs were adequately powered to detect a true difference in outcomes for large effect sizes. For smaller effect sizes, this deficiency, inevitably, was further amplified. These findings are probably related to common barriers in undertaking successful surgical RCTs, including ethical issues, challenging patient recruitment and randomisation due partly to lack of equipoise, learning curve, inexperience in designing trials, inadequate medical statistical knowledge, problematic long-term follow-up, and insufficient funding and resources [24,119]. Furthermore, difficulty in blinding is a major methodological barrier [120,121]. Consequently, all included RCTs were considered to suffer from a high risk of performance bias, and accordingly, a high risk of bias overall [7]. Together, these factors could explain the non-robust results.
The demonstration of longer operative time with robotic surgery contradicts its proposed aims of facilitating operative tasks that would otherwise be difficult to perform efficiently with conventional tools. One possible explanation is the requirement for additional steps in their deployment. For example, docking is needed for surgical robots such as the dVSS [73,80]. Hardware issues could also explain the longer operative time, as surgical robotic instruments may be cumbersome to place or switch efficiently, or may be insufficiently adapted for the specific purpose [78,80,81,97]. The surgical learning curve also has implications for our findings. Before study commencement, individual surgeons have typically performed far fewer robotic cases than conventional ones [51,53,54,72,73,107]. This disparity could disadvantage robotic surgery due to relatively less familiarity. This could further explain the prolonged operative time of robotic surgery. Nevertheless, our demonstration of at least equivalent outcomes for other perioperative variables may be regarded as a favourable effect of robotic surgery. By allowing achievement of similar or better outcomes despite the relative lack of user experience, surgical robots may be important in facilitating training and attainment of competences. Furthermore, many surgeons may view surgical robots as an "enabling technology", without which it would not be possible for them to perform certain complex minimally invasive procedures [122]. Pure laparoscopic radical prostatectomy, which presents significant technical challenges, is an example of a procedure where robotic assistance in suturing and other laparoscopic tasks is important [123]. Although robotic surgery needs to demonstrate more than just equivalent patient outcomes to be cost-effective due to its substantial costs, its potential positive effects on surgeon ability must also be considered.
This systematic review has some limitations. Our focus on blood loss, blood transfusion rate, operative time, length of hospital stay, and complications was based primarily on the fact that these were the most commonly reported Utilisation of dedicated research parameters should be encouraged [124]. Already, there is an increasing inclination towards such parameters that are probably more relevant, including functional, oncological, and quality of life outcomes, specific anatomical-pathological endpoints (such as nerve damage control), and ergonomics. With continuing improvement in outcome parameter selection by clinical research teams, future evidence synthesis centred on these parameters may better reflect the added value of robotic surgery. Our appraisal of robotic surgery through an exclusively clinical viewpoint has also meant that other elements of innovation evaluation could not be incorporated into our conclusions. These include the impact of surgical robotics on intellectual property and patent generation, resource management, healthcare leadership, mentorship, training, cost efficacy, marketing strategy, business strategy, and stakeholder value generation.
When meta-analyses were possible, the heterogeneity was frequently high. However, this is not unexpected given the wide variability in patient cohorts and interventions. There was additional variability within specific procedures. For instance, Nissen [50,78,84,97-99], Toupet [84], Dor [101], and Thal [15] fundoplications were variant techniques performed in different studies. Furthermore, the extent of robotic assistance varied from its utilisation in anastomotic suturing only [103] to totally robotic procedures [21, 22, 82, 83, 85-88, 92, 100]. Methodological diversity in the form of different study designs and risks of bias also contributed to the heterogeneity.
We incorporated different surgical robots in our review, including those that are no longer in use, such as the ZRSS. However, our intention was not to compare outcomes of specific procedures obtainable through currently available robots but to evaluate, via an overview of commonly addressed perioperative outcomes, whether the goals of robotic surgery in general have been achieved. Hence, we offered a unique perspective on robotic surgery by covering the 30 years of its existence. Accordingly, we also elected not to stratify our analysis based on robot or procedure type. Consequently, this restricts the applicability of this review, so that the individual stakeholder interested in outcomes for a specific intervention may not be able to draw sufficiently relevant evidence from our results.
Prospective studies were included to address the paucity of RCTs. Although practical, their inclusion inevitably introduces other biases associated with this study design. Moreover, caution is advised in the interpretation of complication data, as there were inconsistencies in their reporting. Many authors failed to comply with the quality criteria [125] for complication reporting. There was also a lack of agreement in terms of what constitutes complications, such as with regard to blood transfusion and conversion. Nevertheless, this issue is not unique to our included studies [126,127]. Additionally, studies on robotic surgery continue to suffer from several methodological flaws, including a lack of studies that offer multiple endpoint analysis [128] in such a complex field.
The Society of American Gastrointestinal and Endoscopic Surgeons [122] and European Association of Endoscopic Surgeons [124] consensus statements on robotic surgery have also highlighted the lack of high-quality data in evaluating the health outcomes of this technology. Upcoming research efforts should improve on current methodological deficiencies. The implementation of outcome registries for robotic surgery is important to document and compare benefits and harms and in identifying the direction for future development [122]. More robust controlled trials should be undertaken, particularly in areas where robotic surgery has shown some potential, such as complex hepatobiliary surgery, bariatric and upper gastrointestinal revisional surgery, gastric and oesophageal cancer surgery, rectal surgery, and surgery for large adrenal masses [124].

Conclusions
After the promising pioneering clinical application of PUMA 560 in 1985, the stage was set for robotic surgery to assume the role of a significant disruptive innovation in health care. Three decades on, our analysis across a wide range of surgical robots identified their overall positive contribution in reducing blood loss and blood transfusion rate over OS and MIS. Additionally, against OS, they showed overall proportional improvement in length of hospital stay and overall complication rate. These beneficial effects were lost when only RCTs were appraised, although these RCTs were themselves limited. Longer operative time was a common caveat. Further well-conducted surgical trials are needed to confirm these findings. Whilst the barriers for these trials may seem insurmountable, solutions to overcoming them are now increasingly recognised. These may involve ensuring protocol transparency, improving trial dissemination, creating specialised trial units, establishing dedicated outcome monitoring groups, implementing appropriate minimum surgeon experience to reduce the impact of learning curves, and incorporating research training in the surgical curriculum [119]. To ensure better outcomes for future robotic surgery, a multidisciplinary approach during product development involving close collaboration between surgeons and engineers, in addition to inclusive patient engagement, is mandatory. With the advent of more affordable, enriching technologies that can be modularly incorporated into conventional surgical approaches, such as intraoperative fluorescence imaging, high-definition 3-D visualisation, wristed endoscopic hand tools, and navigation systems, robotic surgery risks degenerating into an unfulfilled promise if it fails to innovate in line with stakeholders' needs.

Compliance with ethical standards
Disclosures Alan Tan, Hutan Ashrafian, Alasdair J. Scott, Sam E. Mason, Leanne Harling, Thanos Athanasiou, and Ara Darzi have no conflicts of interest or financial ties to disclose.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.