Defining benchmarks for robotic-assisted low anterior rectum resection in low-morbid patients: a multicenter analysis

Purpose To define the best possible outcomes for robotic-assisted low anterior rectum resection (RLAR) using total mesorectal excision (TME) in low-morbid patients, performed by expert robotic surgeons in German robotic centers. The benchmark values were derived from these results. Methods Data were collected retrospectively from five German expert centers. After patient exclusion (prior surgery, extended surgery, no prior anastomosis, hand-sewn anastomosis), the benchmark cohort was defined (n = 226). The median with interquartile range was first calculated for the individual centers. The 75th percentile of the median results was defined as the benchmark cutoff and represents the “perfect” achievable outcome. This applied to all benchmark values apart from lymph node yield, where the cutoff was defined as the 25th percentile (more lymph nodes are better). Results The benchmark values for conversion and intraoperative complication rates were ≤ 4.0% and ≤ 1.4%, respectively. For postoperative complications, the benchmark was ≤ 28% for “any” and ≤ 18.0% for major complications. The R0 and complete TME rate benchmarks were both 100%, with a lymph node yield of > 18. The benchmark for the rate of anastomotic insufficiency was < 12.5%, and that for 90-day mortality was 0%. Readmission rates should not exceed 4%. Conclusion This outcome analysis of patients with low comorbidity undergoing RLAR may serve as a reference to evaluate surgical performance in robotic rectum resection.


Jan-Hendrik Egberts and Jan-Niclas Kersebaum contributed equally to this work.

Introduction
Rectal resection, in addition to emerging total neoadjuvant therapy [1], is currently the common curative therapy for localized rectal carcinoma [2]. Robotic-assisted low anterior rectum resection (RLAR) can overcome many known limitations of conventional laparoscopy (LLAR). The feasibility and safety of RLAR are now well established, and there is growing evidence that it may offer better peri- and postoperative outcomes compared to LLAR [3]. A meta-analysis published by Han et al. in 2020, which compared the perioperative outcomes of LLAR and RLAR from eight RCTs involving 999 patients, showed that while RLAR led to significantly longer operative time, the conversion rate was lower [4]. However, most of the available literature consists of retrospectively collected datasets that, to increase the cohort size, include patients operated on within the surgeon's learning curve for RLAR. Thus, results often demonstrate longer operative times, increased peri- and postoperative complications, and at times worse oncologic outcomes. The only prospective randomized controlled trial (RCT) to compare conversion rates between RLAR and LLAR (the ROLARR study) also had this weakness; participating surgeons were only required to have performed 25 robot-assisted procedures [5]. Thus, it can be assumed that the incomplete learning curve had a negative impact on the surgical results. Furthermore, the implementation of an RCT is difficult. Centers are specialized, so that a comparison between LLAR and RLAR within one center is rarely possible. In addition, a growing number of patients actively choose the surgical technique; thus, generating two equivalent study arms is often problematic.
To enable a well-founded evaluation of a new technique, standardization is required after implementation. This enables efficient training and further education of the entire surgical team, but also requires regular re-evaluation and further development. For robot-assisted colorectal surgery, this is done at regular intervals through internal reviews at five German centers in which all surgeons work as proctors for Intuitive and therefore have proven expertise in robot-assisted colorectal surgery. All centers operate according to a standardized, refined surgical technique [6], which is a fully robotic approach without laparoscopic assistance. Our study aims to evaluate the perioperative outcomes of RLAR after completion of the learning curve in an ideal cohort of patients, and thus to establish the first benchmark values worldwide that can be used as a comparison for other centers or even other techniques.

Data collection
Data were collected from the five German proctor centers (University Hospital Schleswig-Holstein, Campus Kiel, University Hospital Eppendorf, KRH Klinikum Robert Koch Gehrden, Augusta-Kranken-Anstalten Bochum, and Klinikum Worms; Table 1). To reflect outcomes after the learning curve had been overcome, only patients operated on after each surgeon's first 100 robot-assisted procedures were included. Therefore, patients operated on by other surgeons in the centers, who were still within their learning curve, were not included. Each center contributed the outcomes of one experienced surgeon, except center two, where two surgeons performed the surgeries. The centers entered the data into a standardized questionnaire. The data collected consisted of patient demographics, operative date, operative time, technical characteristics, peri-, postoperative, and oncologic outcomes, conversion rates, readmission, and 30- and 90-day mortalities. If available, additional data were entered for follow-up. These were then analyzed anonymously at Center 1. A positive ethical vote was available for all participating hospitals.

Performance metrics for benchmarking
Primary endpoints for the benchmark analysis were intraoperative complications and conversion rates, positive circumferential resection margin (CRM), total mesorectal excision (TME) quality, and lymph node yield. Pathologic examination was performed according to the guidelines [2], which are based on international standards. The TME quality was assessed by an independent pathologist using a standardized procedure that was checked and certified externally.
Secondary endpoints consisted of postoperative complications according to the Clavien-Dindo classification (CDC), split into "any" complication and major complications (CDC ≥ III), the anastomotic insufficiency rate, readmission within 90 days, and the 30- and 90-day mortalities.

Statistical analysis
Data are presented as numbers (n) with proportions (%) for categorical variables or as median and IQR for continuous variables. For subgroup analysis, the chi-squared test or t-test was used where appropriate. Survival rates were calculated using the Kaplan-Meier method. All p-values were two-sided and considered significant at p ≤ 0.05.
Benchmark values were calculated solely from the benchmark cohort (n = 226). We first calculated the median with IQR for each individual center. The 75th percentile of these center medians was then defined as the benchmark value. Thus, outcome parameters above the benchmark value (75th percentile) indicate high morbidity, whereas outcome parameters below the benchmark value indicate acceptable morbidity. This was applied to all benchmark values apart from the lymph node yield, where the cutoff was defined as the 25th percentile, because a higher lymph node yield is better.
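The cutoff rule described above can be illustrated with a minimal Python sketch. The function name `benchmark_cutoff`, the example values, and the "inclusive" percentile interpolation are assumptions made for illustration only; the text does not state which percentile definition was applied, and the numbers below are hypothetical, not study data.

```python
from statistics import quantiles


def benchmark_cutoff(center_values, higher_is_better=False):
    """Return the benchmark cutoff from one summary value per center.

    Uses the 75th percentile of the center values by default and the
    25th percentile when a higher value is the better outcome
    (e.g., lymph node yield).
    """
    # quantiles() sorts the data itself; the "inclusive" method is an
    # assumption, since the percentile definition is not specified.
    q1, _median, q3 = quantiles(center_values, n=4, method="inclusive")
    return q1 if higher_is_better else q3


# Hypothetical per-center summaries for five centers (NOT study data).
conversion_rates = [1.0, 2.0, 3.0, 5.0, 8.0]   # percent per center
lymph_node_medians = [15, 17, 20, 21, 24]      # median yield per center

conversion_cutoff = benchmark_cutoff(conversion_rates)
lymph_node_cutoff = benchmark_cutoff(lymph_node_medians, higher_is_better=True)
```

With only five centers, the choice of percentile definition can shift the cutoff noticeably, so any reimplementation should state which one it uses.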
We also conducted descriptive statistics for peri- and postoperative parameters where applicable.
Statistical analysis was performed using Statistical Package for Social Sciences software (version 26.0, SPSS Inc., Chicago, IL).

Basic characteristics of benchmark patients
The benchmark group consisted of 226 patients; 133 patients (58.9%) were men, and the median age was 64 (IQR 49-70) years with a median BMI of 25.8 (IQR 22.7-28.2) kg/m². The remaining patient characteristics are outlined in Table 2.
The indication for rectal resection was carcinoma in 224 patients (99.1%), with adenocarcinoma being the most common tumor entity (97.3%). Only 69 patients (30.5%) received neoadjuvant therapy prior to resection. The remaining 157 patients underwent primary surgery.

Intraoperative outcomes in benchmark patients
The median operative time was 266 (IQR 211-310) min (Table 3). In 75 patients (33.2%), a so-called dual-docking procedure with intraoperative repositioning and redocking was performed. One patient experienced intraoperative bleeding (requiring conversion to open rectal resection) and another patient experienced an unspecified intraoperative complication.

[Figure: patient flow chart. 322 RLAR patients; 226 patients (70.2%) in the benchmark cohort.]

[...] (n = 1). Two of these cases were converted to laparoscopy (for suturing the anastomosis and in the case of the aortic aneurysm) and the remaining six to laparotomy. The mean anastomosis height was 5.8 (IQR 4.0-7.0) cm from the anocutaneous line, but data were missing in 110 patients (48.7%). A protective ileostomy was created in 163 cases (72.1%).

Postoperative outcomes in benchmark patients
Overall morbidity at 30 days was 29.2% (n = 66); 14.2% of patients (n = 32) suffered a major complication (CDC ≥ III). The readmission rate within 90 days of discharge was 4% (n = 9). Two readmissions were not associated with the surgery (one patient for planned liver metastasectomy and the other with symptomatic ascites due to tumor progression). Five patients showed late anastomotic insufficiency, which was treated endoscopically in three cases (in two cases by endoluminal vacuum therapy and in one using an over-the-scope clip) and surgically in two cases (one anastomosis redo and one discontinuity resection). One readmission each was due to constipation and diarrhea. The 30- and 90-day mortality rates were 0.5% (n = 1) and 1.3% (n = 3), respectively (Table 4).

Benchmark and excluded patients
The benchmark and comparison (n = 96) cohorts showed no statistical differences in age, BMI, gender, histologic entity, UICC stage, or tumor size (Table 2). In terms of ASA classification, the benchmark group was significantly healthier (p = 0.022), and it was less frequently pretreated with neoadjuvant therapy (30.5% vs. 51.0%; p = 0.001).
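As an illustration of the chi-squared comparison of neoadjuvant pretreatment between the cohorts, the following Python sketch computes the Pearson statistic for a 2 × 2 table. The benchmark counts (69 of 226) are taken from the text; the comparison count of 49 of 96 is back-calculated from the reported 51.0% and is therefore an assumption. The function name is hypothetical, and no continuity correction is applied, which may differ from the settings used in the original analysis.

```python
from math import erfc, sqrt


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # With one degree of freedom, the upper-tail probability of the
    # chi-squared distribution equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Rows: benchmark cohort, comparison cohort.
# Columns: neoadjuvant therapy, primary surgery.
# 49 and 47 are back-calculated from 51.0% of 96 (assumption).
statistic, p_value = chi_squared_2x2(69, 157, 49, 47)
```

Under these assumptions, the statistic is about 12.2 and the p-value lies well below the 0.05 significance threshold, in line with the significant difference reported above.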

Oncological outcomes in benchmark and excluded patients
There was no significant difference in lymph node yield between the benchmark (19, IQR 13-21) and excluded (19, IQR 14-22) cohorts (Table 5). Although the R1 rate in the comparison group (3.1%) was more than three times higher than in the benchmark group (0.9%), the difference did not reach statistical significance. Very good TME quality was achieved in 99.1% of patients in the benchmark cohort (good TME quality in 0.9%) (Fig. 3). These results were significantly better than the TME quality in the comparison group (very good 90.6%, good 6.2%, poor 3.1%; p = 0.001). This is also reflected in the local recurrence rate, which was three times higher in the comparison cohort (5.6%) than in the benchmark group (1.5%) at a mean follow-up of 24.8 months, although this difference was not significant. The overall survival, disease-free survival, and local recurrence rates were comparable between groups; however, there was a high rate of missing follow-up data in the benchmark (45.1%) and comparison (62.5%) groups.

Benchmark values
The 30-day benchmark values are based on the results of 226 patients from five centers (Table 6). The cutoff values for conversion and intraoperative complication rates were ≤ 4.0% and ≤ 1.4%, respectively. In terms of postoperative complications, the cutoff was ≤ 28% for "any" and ≤ 18.0% for major complications. The R0 and complete TME benchmark rates at 30 days were 100%, with a lymph node yield > 18. The benchmark for the rate of anastomotic insufficiency was < 12.5%, and that for 90-day mortality was 0%. Readmission rates should not exceed 4%.
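A center wishing to compare its own results with these cutoffs could do so along the lines of the following Python sketch. The cutoffs are those stated above; the dictionary keys, the function name `unmet_benchmarks`, and the example center are hypothetical and serve only as an illustration.

```python
import operator

# Comparison operators used by the benchmark definitions below.
OPS = {"<=": operator.le, "<": operator.lt, ">=": operator.ge, ">": operator.gt}

# Cutoffs as reported in the text (rates in %, lymph node yield as a count).
BENCHMARKS = {
    "conversion_rate": ("<=", 4.0),
    "intraoperative_complication_rate": ("<=", 1.4),
    "any_postoperative_complication_rate": ("<=", 28.0),
    "major_postoperative_complication_rate": ("<=", 18.0),
    "r0_rate": (">=", 100.0),
    "complete_tme_rate": (">=", 100.0),
    "lymph_node_yield": (">", 18.0),
    "anastomotic_insufficiency_rate": ("<", 12.5),
    "mortality_90_day": ("<=", 0.0),
    "readmission_rate": ("<=", 4.0),
}


def unmet_benchmarks(center_outcomes):
    """Return the names of the reported outcomes that miss their benchmark."""
    missed = []
    for name, value in center_outcomes.items():
        comparison, cutoff = BENCHMARKS[name]
        if not OPS[comparison](value, cutoff):
            missed.append(name)
    return missed


# Hypothetical center (NOT study data): only conversion misses its cutoff.
hypothetical_center = {
    "conversion_rate": 6.0,
    "lymph_node_yield": 21,
    "readmission_rate": 3.5,
}
missed = unmet_benchmarks(hypothetical_center)
```

Such a check only flags outcomes outside the benchmark; it does not account for differences in case mix, which would need to be matched to the benchmark cohort's selection criteria first.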

Discussion
Robotic-assisted rectal resection can achieve outstanding results when performed by an experienced surgeon at an expert center. To evaluate a newer procedure, evidence of "non-inferiority" compared to the gold standard is first needed. In a second step, superiority should be demonstrated in studies so that the newer intervention can be established as the gold standard after widespread standardization. This is exemplified by robot-assisted prostatectomy. Unfortunately, this concept of evaluation has some pitfalls. If complication rates are already low, a very large cohort is required to be able to prove a significant difference. In addition, the participating surgeons in a multicenter prospective comparative study would have to be experts in the new and old surgical procedures. This is hardly feasible with today's standardized procedures and the specialization of hospitals and surgeons. Thus, another tool is needed to evaluate interventions.
Our study aimed to make this evaluation possible. It provides benchmark values for several clinically relevant endpoints that can be immediately adopted by other institutions. Our study corresponds in large parts to the proposal for a standardized benchmarking report, which was established in the context of major liver resections [7]. The strength of our study is that all patients were operated on according to a standardized surgical procedure by designated robotic experts in high-volume centers and that the data were collected in a standardized manner. This allows the first publication of the best achievable outcomes in robotic-assisted low anterior rectal resection. In 2019, the results of the largest prospective, randomized multicenter study comparing RLAR with LLAR were published [5]. The endpoints analyzed were conversion rate (RLAR 12.2%, LLAR 8.1%), intraoperative (RLAR 14.8%, LLAR 15.3%) and postoperative (RLAR 31.7%, LLAR 33.1%) complication rates, as well as TME quality (very good: RLAR 76.4%, LLAR 77.6%). Notably, conversion is associated with an increased rate of local recurrence, as well as increased morbidity and mortality [8][9][10]. All these results were inferior to our benchmark values, which demonstrates the advantage of the proven surgical robotic expertise in our centers. A limitation of this comparison is that rectal amputations were included in the ROLARR RCT and that surgeons at different stages of the learning curve participated in it.
In 2020, Diers et al. published their paper reporting the nationwide in-hospital mortality rate following rectal resection for rectal cancer [11]. They found a mortality rate of 1.5% in very-high-output centers (case load > 50 per year) and 1.4% in high-output centers (case load around 32 patients per year), but with approximately 15% of the cases being emergency procedures. The anastomotic leakage rate was 11.8% in the very-high-output and 12.4% in the high-output centers. Those results are similar to our benchmark values but are hardly comparable, because the leakage rates were not differentiated by approach (open or laparoscopic) or by the resection performed (i.e., low anterior, anterior, tubular/segmental, or sigmoid/left resection).

Fig. 3 A Total mesorectal excision quality (%). B Box-plot graph of harvested lymph nodes with 10th to 90th percentile. Ns, not significant

There are limitations to our study. Our data are from only one continent, whereas three are recommended [7]. There were also differences in the number of patients per center, with > 100 patients from one center, > 50 from two centers, and ≤ 30 from the last two. While this better reflects reality than results from a single high-output center, some differences in experience with the procedure must also be considered; learning curves may also have influenced our results. Furthermore, this inhomogeneity in numbers per center means that individual cases carry more weight in the smaller centers. This is reflected in the intercenter comparison of the anastomotic leakage rate: two centers reported the same number of anastomotic leakages, but one had twice as many patients, so the insufficiency rate was twice as high in the smaller center. Another limitation is that we cannot exclude the possibility that complications may have been documented incorrectly or not at all, especially with regard to CDC grade I.
From the benchmark proposal by Rössler et al., we know that documentation is often lacking, for example, for pathologically elevated laboratory values [7]. This would mean that our rate of complications of any severity would be falsely low. In addition, our benchmark cohort showed a low rate of neoadjuvant therapy. We could identify two possible explanations. The first is potential preoperative understaging. The second is the patient's request for primary surgery. However, a further comparison between the clinical and pathological tumor stage should be performed. There was no selection for this, but it must be assumed that this resulted in a lower complication rate and higher lymph node yield (mean 19.5 in patients without neoadjuvant therapy vs. 17.9 in neoadjuvant-treated patients, without statistical significance). As a further weakness, the rate of oncological follow-up was unfortunately very low, so that only a weak statement can be made. With increasing cost pressure for hospitals, clinics, and ultimately the individual surgeon, there is a need for publication of performance parameters. Performance measurements not only enable better argumentation regarding increased costs, but also allow patients to choose the clinic, type of intervention, and ultimately their preferred surgeon, which significantly improves their autonomy [12].

Conclusion
Our study is the first to provide benchmark values on the peri- and postoperative outcomes of robotic-assisted rectal resection. Our benchmark cohort is based on databases of designated robotic experts from national expert centers. Critical patient selection, including no prior surgery, low comorbidity, and the use of a standardized technique, has allowed us to achieve "ideal" outcomes. However, the learning curve continues to be a factor that influences outcomes, and only national centers could be recruited. Thus, it can be assumed that as national and international implementations of RLAR continue, and experience grows as a result, outcomes will also change, and this study will need to be updated. Nevertheless, we are convinced that these benchmark values will be used as comparison values for other centers and that the concept of benchmarking will continue to expand.