A systematic review on the effectiveness of robot-assisted minimally invasive gastrectomy

Background Robot-assisted minimally invasive gastrectomy (RAMIG) is increasingly used as a surgical approach for gastric cancer. This study assessed the effectiveness of RAMIG and examined which stages of the IDEAL-framework (1 = Idea, 2A = Development, 2B = Exploration, 3 = Assessment, 4 = Long-term follow-up) were followed. Methods The Cochrane Library, Embase, PubMed, and Web of Science were searched for studies on RAMIG up to January 2023. Data collection included the IDEAL-stage, demographics, number of participants, and study design. For randomized controlled trials (RCTs) and long-term studies, data on intraoperative, postoperative, and oncologic outcomes, survival, and costs of RAMIG were collected and summarized. Results Of the 114 included studies, none reported its IDEAL-stage. After full-text reading, 18 (16%) studies were classified as IDEAL-2A, 75 (66%) as IDEAL-2B, 4 (4%) as IDEAL-3, and 17 (15%) as IDEAL-4. The IDEAL-stages were followed sequentially (2A-4), with IDEAL-2A studies still ongoing. IDEAL-3 RCTs showed fewer overall complications for RAMIG (8.5–9.2% versus 17.6–19.3% for laparoscopic total/subtotal gastrectomy), equal 30-day mortality (0%), and similar length of hospital stay (mean 5.7–8.5 days for RAMIG versus 6.4–8.2 days for open/laparoscopic total/subtotal gastrectomy). Lymph node yield was similar across techniques, but RAMIG incurred significantly higher costs than laparoscopic total/subtotal gastrectomy ($13,423–15,262 versus $10,165–10,945). IDEAL-4 studies showed similar or improved overall and disease-free survival for RAMIG. Conclusion During worldwide RAMIG implementation, the IDEAL-framework was followed in sequential order. IDEAL-3 and 4 long-term studies showed that RAMIG is similar to or even better than conventional surgery in terms of hospital stay, lymph node yield, and overall and disease-free survival. In addition, RAMIG showed reduced postoperative complication rates, despite higher costs.
Supplementary Information The online version contains supplementary material available at 10.1007/s10120-024-01534-1.


Introduction
Globally, gastric cancer is the third leading cause of cancer-related mortality [1]. Open gastrectomy, with or without perioperative chemotherapy, has been the traditional surgical approach for decades [2][3][4]. Laparoscopic gastrectomy was first introduced in 1994 to reduce surgical trauma through small incisions, resulting in less morbidity, shorter hospital stay, and improved cosmetic outcome [5]. Since then, minimally invasive gastrectomy (MIG) has rapidly gained acceptance for routine use [5][6][7]. Robot-assisted minimally invasive gastrectomy (RAMIG) was first described by Hashizume et al. in 2002 to overcome technical drawbacks of conventional MIG, such as limited range of motion and uncomfortable surgical positioning [8]. Previous systematic reviews and meta-analyses on the safety and efficacy of RAMIG concluded that it provides favorable or comparable short-term outcomes compared with conventional laparoscopic gastrectomy (LG) or open gastrectomy (OG) in patients with cardia and non-cardia gastric cancer, including less intraoperative blood loss, shorter hospital stays, and fewer postoperative complications [9][10][11][12]. In addition, similar or improved oncologic results, such as total lymph node yield, radicality of resection, and mortality rates, were reported.
Although RAMIG is gaining popularity, little is known about how it has been evaluated during its implementation into clinical practice. The evaluation of surgical procedures is complicated by the complexity of the procedures themselves, by surgeon-related factors such as differences in learning curves, and by variability between hospitals [13]. In addition, surgical techniques are constantly evolving and subject to change even after their broad implementation in clinical practice. As a result, it is difficult to determine the appropriate timing for evaluating surgical procedures with well-designed and well-conducted randomized controlled trials (RCTs) [9].
The Idea, Development, Exploration, Assessment, and Long-term follow-up (IDEAL) framework was developed to specify the study designs and reporting standards appropriate to each stage of the introduction of surgical procedures and medical devices, and to guide their evaluation [14]. The IDEAL-stages range from stage 1 (first-in-human studies) to stage 4 (long-term follow-up of widely implemented techniques). This framework aims to improve the transparency, evaluation, and reporting of surgical innovations and medical devices in order to strengthen evidence-based practice [15]. This systematic review examined how RAMIG was evaluated during its implementation into clinical practice, based on the IDEAL-framework. Additionally, the current evidence for RAMIG was reviewed based on IDEAL-3 and 4 studies.

Methods
This study protocol was prospectively registered in the online international PROSPERO database for systematic reviews under registration number CRD42022352208. This review was conducted in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines [16].

Search strategy
A systematic literature search was undertaken in the Cochrane Library, Embase, PubMed, and Web of Science for studies on RAMIG published up to January 2023. The search consisted of subject headings and text words, combining terms for robotics AND gastrectomy, including their synonyms. Table 1 provides the complete search strategy.

Study eligibility
Studies comparing RAMIG with conventional techniques (laparoscopic or open approach) for the treatment of cardia and non-cardia gastric cancer were included. Only articles written in English and including more than 10 patients were eligible for inclusion in this systematic review. Reviews, study protocols, studies without an available full text, invited commentaries, and duplicates were excluded. In addition, we excluded non-comparative (robot-only) studies, as we were interested in the reporting and evaluation of outcomes of RAMIG compared with conventional gastric cancer surgery.

Study selection
Two researchers (LT and RdB) independently screened titles and abstracts. Disagreements were resolved through discussion until consensus was reached. Full-text screening was performed by both researchers using the same approach.

Data extraction
Both researchers (LT and RdB) extracted data from the included studies. The following characteristics were collected: IDEAL-stage, country of origin, year of publication, study design, number of included patients, and comparative approach (RAMIG versus laparoscopic and/or open total/subtotal gastrectomy). The IDEAL-stage of each included study was initially scored by LT and RdB using the flow diagram designed by the IDEAL Collaboration (Table 2) [17]. Conflicts were resolved by a third and a fourth researcher with extensive experience with the IDEAL-framework (JG and MR). If papers with different research aims from the same hospital were published at different times, only the paper with the largest number of patients was used to calculate the total number of patients, to avoid double counting; the results of both studies were nevertheless incorporated in this systematic review. For IDEAL-3 and 4 studies, we predefined a selection of outcomes, including extent of surgery, intra- and postoperative outcomes, oncological outcomes, overall/disease-free survival, overall quality of life, costs, surgical experience, and impact of the learning curve (Supplementary Table 1).

Data synthesis
Data were summarized in a narrative synthesis. Additionally, a schematic overview was produced for each country of origin, categorizing the included studies according to their IDEAL-stage. In addition, the IDEAL-stage of each included study was displayed graphically over time. No meta-analysis was performed because of inter-study differences, such as the interval over which postoperative complications were recorded (up to 30 versus 90 days postoperatively), heterogeneity in treatment between patient groups (with versus without neoadjuvant chemotherapy), and the variety of study designs (cohort studies versus RCTs). Where feasible, comparisons were made between clinical outcomes based on the extent of gastrectomy (total or subtotal) and surgical

Included studies
In total, 2338 records were identified in the initial search. After removal of 632 duplicates, 1706 titles and abstracts were screened. After full-text screening of 353 articles, 114 studies were included in this review (Fig. 1).
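The selection counts above, together with the IDEAL-stage counts reported in the Results, can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: the variable names and the `percent` helper are hypothetical, whole-number rounding is assumed to match the reported percentages, and the derived exclusion counts (1353 after title/abstract screening, 239 after full-text reading) are subtractions of the reported numbers rather than figures stated by the authors.

```python
# Illustrative cross-check of the study-selection flow and the IDEAL-stage
# distribution, using only the counts reported in this review.

identified = 2338   # records from the database search
duplicates = 632    # duplicates removed
full_texts = 353    # full texts assessed for eligibility
included = 114      # studies included

screened = identified - duplicates             # titles/abstracts screened
excluded_on_abstract = screened - full_texts   # excluded before full-text reading
excluded_on_full_text = full_texts - included  # excluded after full-text reading

# IDEAL-stage assigned to the 114 included studies (from the Results)
stages = {"2A": 18, "2B": 75, "3": 4, "4": 17}

def percent(part: int, whole: int) -> int:
    """Share of `whole`, rounded to a whole-number percentage."""
    return round(100 * part / whole)

share = {stage: percent(n, included) for stage, n in stages.items()}
ideal_3_4 = stages["3"] + stages["4"]  # studies analysed in detail

print(screened, excluded_on_abstract, excluded_on_full_text)
print(share, ideal_3_4, percent(ideal_3_4, included))
```

Running it prints `1706 1353 239` and `{'2A': 16, '2B': 66, '3': 4, '4': 15} 21 18`, matching the reported 16%, 66%, 4%, and 15% (which sum to 101% because of rounding) and the 21 (18%) IDEAL-3 and 4 studies.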

Study characteristics of IDEAL-3 and 4 studies
Of the 114 included studies, 21 (18%) were classified as IDEAL-3 and 4 studies. These IDEAL-

Risk of bias within IDEAL-3 and 4 studies
Among the 4 RCTs, 2 (50%) showed a risk of bias with 'some concerns' in specific domains. In one study, these concerns related to the lack of a previously published study protocol; in the other, they arose from performing a per-protocol analysis, which excluded trial participants who did not receive their assigned intervention, instead of an intention-to-treat analysis (Fig. 4) [21,24]. Of the 17 N-RCTs, 6 (35%) showed a risk of bias with 'some concerns' regarding confounding. This was particularly the case in propensity-score matching (PSM) studies in which important confounding factors, such as comorbidities, neoadjuvant chemotherapy, and tumor size, were not addressed and/or the variables used for matching were not reported in the baseline characteristics tables (Fig. 5) [28][29][30][31][32][33]. Three N-RCTs (18%) were at 'high risk' of bias due to baseline differences after PSM (n = 1) or absence of matching (n = 2) [34][35][36].
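The percentages in this paragraph, and the 9 (53%) IDEAL-4 studies with 'some concerns' or 'high risk' of bias cited in the Discussion, follow directly from the counts given here. A minimal Python check, assuming the same whole-number rounding; the variable names and labels are hypothetical:

```python
# Illustrative recomputation of the risk-of-bias proportions for the
# IDEAL-3 RCTs and the IDEAL-4 non-randomized studies (N-RCTs).

rct_total = 4           # IDEAL-3 RCTs
rct_some_concerns = 2   # RCTs with 'some concerns' in at least one domain

nrct_total = 17         # IDEAL-4 N-RCTs
nrct_some_concerns = 6  # 'some concerns' regarding confounding
nrct_high = 3           # 'high risk' (baseline differences after PSM or no matching)

def percent(part: int, whole: int) -> int:
    """Share of `whole`, rounded to a whole-number percentage."""
    return round(100 * part / whole)

results = {
    "RCTs, some concerns": percent(rct_some_concerns, rct_total),
    "N-RCTs, some concerns": percent(nrct_some_concerns, nrct_total),
    "N-RCTs, high risk": percent(nrct_high, nrct_total),
    "N-RCTs, some concerns or high risk": percent(
        nrct_some_concerns + nrct_high, nrct_total
    ),
}

for label, value in results.items():
    print(f"{label}: {value}%")
```

The four lines printed are 50%, 35%, 18%, and 53%; the last reproduces the 9 of 17 N-RCTs referred to in the Discussion.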
Each IDEAL-3 RCT reported information on surgical experience and the impact of the learning curve in its centers (Supplementary Table 2). In the Eastern IDEAL-3 studies, surgeons had performed a minimum of 20, 50, or 430 RAMIG cases and > 40 or > 300 LG cases before participating in the RCT [21][22][23]. Furthermore, Ojima et al. stated that for a surgeon with extensive experience in open radical gastrectomy, the RAMIG learning curve required at least 30 cases [23]. The 4 surgeons participating in the Brazilian RCT were highly experienced in both OG and LG, were certified as console surgeons on the da Vinci platform, and had standardized their technique using laboratory swine models [24]. This high-volume institution performs more than 100 gastrectomies annually, and a qualified tutor was present during each RAMIG procedure.
The nationwide database study by Kamarajah et al., which included 30,324 patients, compared rates of textbook outcome and survival between RAMIG, LG, and OG. This large database study found similar or improved long-term survival after RAMIG compared with LG or OG (median 66.4 months for RAMIG versus 63.6 months for LG and 42.5 months for OG; p = 0.800 and p = 0.006, respectively) and similar complication rates (34% RAMIG versus 35% LG and 36% OG, p = 0.319) [32].

Discussion
This systematic review examined how the adoption of RAMIG into clinical practice has proceeded based on the IDEAL-framework. Furthermore, it summarized the current evidence base for RAMIG from IDEAL-3 and 4 long-term follow-up studies. It took 14 years after the first description of a RAMIG procedure until the results of the first RCT comparing total/subtotal RAMIG with open total/subtotal gastrectomy were published. During the implementation of RAMIG, the different stages of the IDEAL-framework were predominantly followed in sequential order (2A-2B-3-4). Although results of IDEAL-3 RCTs are available, IDEAL-2A studies are still ongoing. Moreover, the IDEAL-3 and 4 long-term follow-up studies included in this systematic review showed that total/subtotal RAMIG results in similar or improved outcomes compared with conventional surgery in terms of overall complications, hospitalization, lymph node yield, and overall and disease-free survival.
This systematic review reveals that the different IDEAL-stages were followed mainly in chronological order during RAMIG implementation worldwide. However, the first IDEAL-3 RCT results were published in China in 2016, whereas subsequent IDEAL-2A studies were conducted and published in France and Japan three years later [21,43,44]. Currently, IDEAL-2B and long-term IDEAL-4 studies are being conducted simultaneously, both on the same and on different continents [33,39,45,46]. This publication sequence highlights the uneven progress through the IDEAL-stages worldwide, reflecting variation in the timing of RAMIG implementation across gastric cancer centers. This raises the question of whether each IDEAL-stage must be followed strictly in chronological order when RAMIG is introduced in a center, country, or continent. Given the continuously evolving nature of surgical techniques and the need for adaptability, it is inevitable that different IDEAL-stages of RAMIG research are completed at different times and on different continents, as observed in our findings. Notably, IDEAL-2B studies are repeatedly conducted worldwide, serving as a crucial precursor to an RCT or long-term follow-up study (IDEAL-3/4). IDEAL-2B studies assess the safety and feasibility of performing RAMIG within a specific center based on short-term clinical outcomes and the impact of surgeons' learning curves. Whether results from high-volume centers in Asia can be generalized to other continents, where cardia and non-cardia gastric cancer present differently, is questionable. Hence, it is reasonable that IDEAL-2B studies are carried out on various continents to help ensure the safe introduction of RAMIG and the skill development of the surgeons performing it.
Previous systematic reviews and meta-analyses have shown that total/subtotal RAMIG offers similar or improved short-term outcomes compared with total/subtotal OG and/or LG, including reduced blood loss, fewer postoperative complications, and shorter hospital stays [9][10][11][12]. These findings align with the IDEAL-4 long-term follow-up studies included in this systematic review. However, these findings should be interpreted with caution, as the meta-analyses relied heavily on retrospective studies, which introduced clinical and methodological heterogeneity through differences in study design, potential confounding factors, and risk of selection bias. In addition, some of the included studies did not specify the surgical method of radical gastrectomy, and surgical experience and proficiency with the robotic system varied across studies [11]. Moreover, 9 (53%) IDEAL-4 studies included in this review raised 'some concerns' or showed a 'high risk' of bias due to unaddressed confounding factors [28][29][30][31][32][33][34][35][36].
To address these methodological limitations, prospective multicenter RCTs or IDEAL-3 studies with alternative designs appear to be the next step in the IDEAL-framework, to compare new surgical innovations impartially with current standard surgical therapies and to minimize the risk of bias [14,15]. A key consideration is the feasibility and necessity of conducting 'original' RCTs to distinguish between two surgical innovations. Indeed, surgical RCTs pose design and implementation challenges, especially when both techniques are already established as standard practice or when patients' treatment preferences hinder recruitment of the required number of participants within the allotted timeframe. According to a systematic review of the characteristics of surgical RCTs, most have small sample sizes and focus mainly on minor clinical events [47]. That review also found that most RCTs (54.4%) had a risk of bias with 'some concerns', consistent with the 50% of RCTs rated 'some concerns' in our review [21,24,47]. These challenges can leave RCTs underpowered, limiting their ability to provide compelling evidence.
When new surgical innovations are compared with current standard surgical therapies, reporting of patient outcomes, including short- and long-term morbidity, mortality, oncologic outcomes, and quality of life, is crucial [14,15]. Although IDEAL-3 and 4 studies were consistent in the outcome measures they reported, certain measures, such as radicality of resection and quality-of-life outcomes like postoperative pain, were not routinely reported or investigated. This is in line with two previous studies of reporting standards for robot-assisted anti-reflux surgery and robot-assisted cholecystectomy, in which overall consistency in standardized outcome reporting was lacking [48,49]. Such heterogeneity in reported outcome measures makes robust comparison of surgical techniques across studies challenging. Consequently, it is difficult to demonstrate any advantages or disadvantages of a new surgical technique over conventional techniques. However, unlike studies on reporting standards for other robotic surgical procedures, the current study found that the IDEAL-stages, and their associated reporting standards, progressed in chronological order. This suggests that the IDEAL-framework was followed in the adoption of RAMIG, allowing comparison between RAMIG and conventional surgery.
Many studies have outlined the potential technical benefits of robotic surgery, such as a stable, three-dimensional, high-definition camera view with tenfold magnification, tremor suppression, improved ergonomics, and greater freedom of movement owing to articulated wrist instruments. These potential benefits to patients and the improved ergonomics for surgeons form the theoretical basis for the superiority of RAMIG. However, limited data exist on the impact of RAMIG on patients' quality of life and on surgeons' ergonomics. A report by the Netherlands Healthcare Institute on how to implement new surgical innovations in the future pointed out a divergence between the viewpoints presented in studies on robotic surgery for gastric cancer and the outcomes actually reported in practice [50]. The report emphasizes the need to involve all stakeholders in agreeing on the evaluation process and the outcome measures to be assessed before new surgical innovations are implemented. These outcome measures should extend beyond traditional metrics such as complications to include societal factors such as total cost. 'Softer' outcome measures, such as patient satisfaction, surgeon career satisfaction, and ergonomic benefits, should also be considered. Moreover, surgeons who perform robot-assisted gastrectomies indicated that, although they feel the technique has added value, the measured 'hard' outcomes often do not show this added value as a statistically significant difference. In addition, using the robot during surgery may improve surgeons' long-term sustainable employability by preventing physical discomfort, fatigue, and the work-related risk of musculoskeletal disorders associated with laparoscopic surgery [51][52][53]. These shortcomings highlight the importance of standardization and adequate reporting of outcomes in studies comparing robotic surgery with conventional approaches, for example by using a standardized set of outcomes, to enable transparent and robust conclusions about the potential advantages or disadvantages of RAMIG [54].
The Core Outcome Measures in Effectiveness Trials (COMET) Initiative develops Core Outcome Sets (COS) to standardize outcome measurement in clinical trials, which supports medical decision-making and patient information [55,56]. The RoboCOS study recently established a COS for robotic procedures with input from various stakeholders [57]. RoboCOS comprises 10 outcomes covering the patient level (treatment effectiveness, overall quality of life, disease-specific quality of life, complications including mortality), the surgeon level (precision/accuracy, visualization), the organizational level (equipment failure, standardization of operative quality, cost-effectiveness), and the population level (equity of access). Future clinical trials on robotic surgery should adopt these outcomes to enable consistent comparisons between interventions. The Upper GI International Robotic Association (UGIRA) is another initiative, aiming to facilitate the effective implementation and advancement of robotic gastric surgery worldwide and to standardize robot-assisted gastric cancer surgery [58]. UGIRA developed a comprehensive international prospective registry in which upper GI surgeons enter the intra- and postoperative outcomes of their RAMIG cases. This registry facilitates collaborative research on robotic gastric surgery to enhance surgical practice within the upper GI community [59]. By integrating standardized outcome measures such as RoboCOS, incorporating these outcomes into prospective databases such as UGIRA, and adhering to the reporting standards of the IDEAL-framework throughout the development and implementation of new surgical techniques, transparent, reliable, and robust outcomes for procedures like RAMIG can be obtained.
The economic viability of robotic surgery remains a subject of debate. Critics argue that the established benefits are insufficient and that the technique's high costs and longer operative times are concerning. Studies on the cost-effectiveness of RAMIG included in this systematic review show operative costs approximately USD 3,000–5,000 higher than those of conventional surgery, mainly due to robot acquisition, maintenance, and expensive disposable instruments [22,30,42–44,60–65]. Interestingly, the Chinese RCT showed lower direct costs but higher indirect costs for robot-assisted distal gastrectomy (RDG) than for laparoscopic distal gastrectomy (LDG); the authors could not elucidate the cause of the increased indirect costs for RDG [22]. These high costs are expected to decrease in the future as more competing robotic platforms become available. In addition, the impact of robotic surgery on surgeons' career sustainability and on patients' quality of life, including the ability to return to work after surgery, has not been factored into cost-effectiveness analyses. The cost analysis of the ROBOT trial, which compared robot-assisted minimally invasive esophagectomy (RAMIE) with open esophagectomy, showed that RAMIE resulted in fewer postoperative complications without increasing overall hospital costs [66]. Including such factors is crucial for a comprehensive evaluation of the cost-effectiveness and potential benefits of RAMIG at the patient, surgeon, organizational, and population levels. The health economic model of Patel et al. could be used to analyze whether the benefits of a robot-assisted procedure outweigh its additional costs [67].
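Cost-effectiveness analyses of the kind discussed here commonly summarize the trade-off as an incremental cost-effectiveness ratio (ICER). The generic definition is given below only as a reference point. It is a standard health-economics quantity, not the model of Patel et al., and the symbols are illustrative: C is the mean cost per patient, E the mean effect per patient (for example, quality-adjusted life years), and "conv" denotes conventional laparoscopic or open gastrectomy.

```latex
\mathrm{ICER}
  = \frac{\bar{C}_{\mathrm{RAMIG}} - \bar{C}_{\mathrm{conv}}}
         {\bar{E}_{\mathrm{RAMIG}} - \bar{E}_{\mathrm{conv}}}
  = \frac{\Delta C}{\Delta E}
```

Assume RAMIG is both more costly and more effective, so that ΔC > 0 and ΔE > 0. Under that assumption, omitting real benefits such as earlier return to work or reduced surgeon strain understates ΔE, or, from a societal perspective, the offsetting savings within ΔC. The ratio then looks less favorable for RAMIG than it may be.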

Limitations and strengths
To the best of our knowledge, this is the first systematic review to comprehensively examine the worldwide implementation of RAMIG according to the stages of the IDEAL-framework (2A-2B-3-4) and its current evidence base. However, several limitations should be considered. First, since we only included comparative studies and excluded studies with 10 or fewer patients, outcomes of IDEAL-1 proof-of-concept studies were omitted. Second, some included studies predate the introduction of the IDEAL-framework in 2009, raising potential concerns about the fairness of classifying these studies retrospectively using the IDEAL-criteria. Third, although our analysis used the chronological order of publication years as an indicator of sequential progression through the IDEAL-stages, this approach may not fully reflect differences in study duration between small phase II studies (IDEAL-2A/2B) and larger RCTs (IDEAL-3). Factors such as study size, complexity, and resource availability influence this and should be taken into account when interpreting the timeline of RAMIG implementation in this systematic review. Fourth, contrary to the PRISMA recommendation of assessment by two researchers, the quality assessment was performed by one researcher (LT); however, the second researcher (RdB) critically reviewed the cases in which LT was uncertain about the risk of bias, and consensus was reached. Finally, the exclusion of non-English articles may have led to the omission of relevant studies, particularly those published in Asian languages. These limitations should be considered when interpreting the findings and implications of this review.

Future implications
Future research on robotic surgery for gastric cancer should prioritize long-term outcomes, quality of life, potential ergonomic benefits, the career longevity of surgeons and surgical staff, and potential cost savings. Because robust outcomes from a prospective multicenter RCT (IDEAL-3) are lacking, a European RCT is currently being designed that will examine total lymph node yield, complications, and long-term survival for RAMIG versus conventional LG in patients with locally advanced gastric carcinoma after neoadjuvant treatment. Additionally, more than 1000 patients are currently being recruited for a Japanese multicenter phase III RCT known as the MONA LISA study, which aims to assess the superiority of RAMIG over LG for both early and advanced gastric cancer [68]. As robotic surgery is constantly evolving, the question remains whether every incremental change in robotic surgery requires passing through every IDEAL-stage starting from IDEAL-1, or whether certain stages can be omitted. With the future implementation of supporting algorithms in robotic devices, surgical quality and safety are expected to improve further. Moreover, robotic surgery and its technology can be used as a tool to facilitate teaching highly complex procedures, such as gastrectomy, to future surgeons. This could potentially allow them to progress through the learning curve faster while maintaining safety.

Conclusions
During the implementation of RAMIG, the different stages of the IDEAL-framework were mainly followed in sequential order, although IDEAL-2A studies are still ongoing. IDEAL-3 and 4 long-term studies showed that total/subtotal RAMIG is similar to or even better than conventional surgery in terms of postoperative recovery, oncological outcomes, and survival. In addition, total/subtotal RAMIG showed reduced postoperative complication rates, despite higher costs. However, evidence from large-scale prospective RCTs using standardized outcomes to assess the potential benefits of RAMIG is currently lacking. To improve the transparency and robustness of evidence for future robotic surgical procedures, use of the IDEAL reporting guidelines and the Robotic Core Outcome Set (RoboCOS) is recommended.

Fig. 3 Course of IDEAL-stages 2A, 2B, 3, and 4 of included studies over time, worldwide (upper graph) and in Europe (lower graph)

Table 2
measurement of the outcome, and selective reporting. The risk of bias of each domain was judged as 'low', 'high', or 'some concerns'. The ROBINS-E tool assesses the N-RCTs against 7 domains of bias for validity: confounding factors, measurement of exposure, participant selection, post-exposure interventions, missing data, outcome measurement, and selective reporting. The risk of bias for each domain was recorded as 'low', 'some concerns', 'high', or 'very high'.