Introduction

Rectal resection, in addition to emerging total neoadjuvant therapy [1], is currently the common curative therapy for localized rectal carcinoma [2]. Robotic-assisted low anterior rectum resection (RLAR) can overcome many known limitations of conventional laparoscopy (LLAR). The feasibility and safety of RLAR are now well established, and there is growing evidence that it may offer better peri- and postoperative outcomes compared to LLAR [3]. A meta-analysis published by Han et al. in 2020, which compared the perioperative outcomes of LLAR and RLAR from eight RCTs involving 999 patients, showed that while RLAR led to significantly longer operative time, the conversion rate was lower [4]. However, most of the available literature consists of retrospectively collected datasets, including patients who are operated within the surgeon’s learning curve for RLAR to increase the cohort. Thus, results often demonstrate longer operative times, increased peri- and postoperative complications, and at times worse oncologic outcomes. The only prospective randomized controlled trial (RCT) to compare conversion rates between RLAR and LLAR (the ROLARR study) also had this weakness; participating surgeons were only required to have performed 25 robot-assisted procedures [5]. Thus, it can be assumed that the incomplete learning curve had a negative impact on the surgical results. Furthermore, the implementation of an RCT is difficult. Centers are specialized, so that a comparison between LLAR and RLAR within one center is rarely possible. In addition, there has been an increase in the number of patients actively deciding the surgical technique; thus, generation of two equivalent study arms is often problematic.

To enable a well-founded evaluation of a new technique, standardization is required after implementation. This enables efficient training and further education of the entire surgical team, but also requires regular re-evaluation and further development. For robot-assisted colorectal surgery, this is done at regular intervals by internal reviews of five German centers in which all surgeons work as proctors for Intuitive and therefore have proven expertise in robot-assisted colorectal surgery. All centers operate according to a standardized refined surgical technique [6], which is a full robotic approach without laparoscopic assistance. Our study aims to evaluate the perioperative outcomes of RLAR after completion of the learning curve in an ideal cohort of patients, and thus establish the first benchmark values worldwide that can be used as a comparison for other centers or even other techniques.

Methods

Data collection

Data were collected from the five German proctor centers (University Hospital Schleswig–Holstein, Campus Kiel, University Hospital Eppendorf, KRH Klinikum Robert Koch Gehrden, Augusta-Kranken-Anstalten Bochum, and Klinikum Worms; Table 1). To map the learning curve overcomes, all patients were included after the first 100 robot-assisted procedures performed by each surgeon. Therefore, patients operated by other surgeons in the centers, who were within their learning curve, were not included. The centers each contributed the outcome of one experienced surgeon, except center two, where two surgeons performed the surgeries. Data entry into a standardized questionnaire was performed by the centers. The data collected consisted of patient demographics, operative date, operative time, technical characteristics, peri-, postoperative, and oncologic outcomes, conversion rates, readmission, and 30- and 90-day mortalities. If available, additional data were entered for follow-up. These were then analyzed anonymously at Center 1. A positive ethical vote was available for all participating hospitals.

Table 1 Participating centers

Study cohort and inclusion criteria for low comorbidity

The overall cohort consisted of 322 patients from five centers who underwent surgery between January 2013 and January 2020. The median age was 64 (interquartile range (IQR) 56–73) years, and the median body mass index (BMI) was 25.9 (IQR 22.6–28.6) kg/m2. The proportion of men was 59.9% (n = 193). To define the benchmark cohort, patients who had prior surgery were excluded (50 patients). Further exclusion criteria were primary discontinuity resection (22 patients), non-machine anastomosis (14 patients), and procedure extension (10 patients; atypical liver resection (n = 2), atypical lung resection via video-assisted thoracoscopic surgery (n = 1), uterine myoma resection (n = 1), multivisceral resection and hysterectomy (n = 1), bladder resection (n = 1), seminal vesicle resection (n = 1), Meckel’s diverticulum resection (n = 1), peritonectomy (n = 1), and creation of a colonic pouch-anal anastomosis (n = 1)). The inclusion criteria are listed in Fig. 1.

Fig. 1
figure 1

Number of patients included per center and exclusion criteria. RLAR, robot-assisted low anterior rectum resection

Performance metrics for benchmarking

Primary endpoints for the benchmark analysis were intraoperative complications and conversion rates, positive circumferential resection margin (CRM), total mesorectal excision (TME) quality, and lymph node yield. Pathologic examination was performed according to the guidelines of the German Cancer Society (DKG) [2], which are based on international standards. The TME quality was assessed by an independent pathologist using a standardized procedure that was checked and certified externally.

Secondary endpoints consisted of postoperative Clavien-Dindo complications (CDC), split by “any” complication as well as major complications (CDC ≥ III), the anastomosis insufficiency rate, readmission within 90 days, and the 30- and 90-day mortalities.

Statistical analysis

Data are presented as numbers (n) with proportions (%) or as median and IQR for continuous variables. For subgroup analysis, the chi-squared test or T-test was used where appropriate. Survival rates were calculated using the Kaplan–Meier function. All p-values were two-sided and considered significant at p ≤ 0.05.

Benchmark values were calculated solely from the benchmark cohort (n = 226). We first calculated the median with IQR for the individual centers. From those median, the 75th percentile was found and those were defined as our benchmark values. Thus, outcome parameters above the benchmark value (75th percentile) indicate high morbidity, whereas outcome parameters below the benchmark value indicate acceptable morbidity. This was applied to all benchmark values apart from the lymph node yield, where the cutoff was defined as the 25th percentile, because a higher lymph node yield is better.

We also conducted descriptive statistics for peri-and postoperative parameters where applicable.

Statistical analysis was performed using Statistical Package for Social Sciences software (version 26.0, SPSS Inc., Chicago, IL).

Results

Basic characteristics of benchmark patients

The benchmark group consisted of 226 patients; 133 patients (58.9%) were men, and the median age was 64 (IQR 49–70) years with a median BMI of 25.8 (IQR 22.7–28.2) kg/m2. The remaining patient characteristics are outlined in Table 2.

Table 2 Patient characteristics

The indication for rectal resection was carcinoma in 224 patients (99.1%), with adenocarcinoma being the most common tumor entity (97.3%). Only 69 patients (30.5%) received neoadjuvant therapy prior to resection. The remaining 157 patients underwent primary surgery.

Intraoperative outcomes in benchmark patients

The median operative time was 266 (IQR 211–310) min (Table 3). In 75 patients (33.2%), a so-called dual-docking procedure with intraoperative repositioning and redocking was performed. One patient experienced intraoperative bleeding (requiring conversion to open rectal resection) and another patient experienced an unspecified intraoperative complication. Conversion was performed in eight cases (4.5%), although only one was tumor-associated (UICC II, R0, CRM negative); the others were converted because of adhesions (n = 3), incidental findings of infrarenal aortic aneurysm (n = 1), unclear anatomy (n = 1), anastomosis creation (n = 1), and suturing over an insufficient anastomosis (n = 1). Two of these cases were converted to laparoscopy (for suturing the anastomosis and in the case of the aortic aneurysm) and the remaining six to laparotomy.

Table 3 Intraoperative outcomes

The mean anastomosis height was 5.8 (IQR 4.0–7.0) cm from the ano cutaneous line, but data were missing in 110 patients (48.7%). A protective ileostomy was created in 163 cases (72.1%).

Postoperative outcomes in benchmark patients

Overall morbidity at 30 days was 29.2% (n = 66), of which 14.2% (n = 32) suffered a major complication (CDC ≥ III). The readmission rate within 90 days of discharge was 4% (n = 9), two of which were non-surgery associated (one patient for planned liver metastasectomy and the other with symptomatic ascites due to tumor progression). Five patients showed late insufficiency, which was treated endoscopically in three cases (in two cases by endoluminal vacuum therapy and in one using an over-the-scope clip) and surgically in two cases (one anastomosis redo and one discontinuity resection). One readmission in each case was due to constipation and diarrhea, respectively. The 30- and 90-day mortality rates were 0.5% (n = 1) and 1.3% (n = 3), respectively (Table 4).

Table 4 Postoperative outcomes

Benchmark and excluded patients

The benchmark and comparison (n = 96) cohorts showed no statistical differences in age, BMI, gender, histologic entity, UICC stage, and tumor size (Table 2). In terms of ASA classification, the benchmark group was significantly healthier (p = 0.022) and less frequently pretreated with neoadjuvant therapy (30.5% vs. 51.0%, respectively; p = 0.001).

The inter-cohort operative time was similar between groups [266 (IQR 211–310) min in the benchmark and 276 (IQR 215–328) min in the comparison group]. However, in the benchmark cohort there was a significantly lower complication rate (0.9% vs. 5.2%, respectively; p = 0.017) and conversion rate (4.5% vs. 10.4%; p = 0.015) (Table 2). In terms of postoperative outcomes, the “any” complication rate was higher in the benchmark cohort but did not reach significance (38.5% vs. 29.2%, respectively; p = 0.066). However, the rate of insufficiency was more than twice as high in the comparison cohort compared with the benchmark group (19.7% vs. 9.3%; p = 0.017). This increased morbidity was not reflected in the rate of readmissions (4.0% vs. 3.1%, respectively) or in the 30- and 90-day mortality rates (0.5% vs. 1.4% and 1.3% vs. 2.1%, respectively) (Table 3 and Fig. 2).

Fig. 2
figure 2

(A) Box plot graph with 10th to 90th percentile of duration of surgery in minutes. For clarity, statistical outliers were ignored. (B) Intraoperative conversion and complication rates (%). (C) 30- and 90-day mortality rate and 30-day readmission rate (%). (D) Postoperative complications divided into “any”, Clavien-Dindo classification I–II and III–IV, and anastomotic leakage rate. Ns, non significant

Oncological outcomes in benchmark and excluded patients

There was no significant difference in terms of lymph node yield in the benchmark (19 (IQR 13–21)) and excluded (19 (IQR 14–22)) cohorts (Table 5). Although the R1 rate in the comparison group (3.1%) was more than three times higher than in the benchmark group (0.9%), the difference did not reach statistical significance. Very good TME quality was achieved in 99.1% of patients in the benchmark cohort (good TME quality in 0.9%) (Fig. 3). These results were significantly better than the TME quality in the comparison group (very good 90.6%, good 6.2%, poor 3.1%; p = 0.001). This is also reflected in the local recurrence rate, which was three times higher in the comparison cohort (5.6%) than in the benchmark group (1.5%) at a mean follow-up of 24.8 months (no significant difference). The overall survival, disease-free survival, and local recurrence rates were comparable between groups; however, there was a high rate of missing follow-up data in the benchmark (45.1%) and comparison (62.5%) groups.

Table 5 Oncological outcomes
Fig. 3
figure 3

A Total mesorectal excision quality (%). B Box-plot graph of harvested lymph nodes with 10th to 90th percentile. Ns, not significant

Benchmark values

The 30-day benchmark values are based on the results of 226 patients from five centers (Table 6). The cutoff values for conversion and intraoperative complication rates were ≤ 4.0% and ≤ 1.4%, respectively. In terms of postoperative complications, the cutoff was ≤ 28% for “any” and ≤ 18.0% for major complications. The R0 and complete TME benchmark rates at 30 days were 100%, with a lymph node yield > 18. The benchmark for rate of anastomotic insufficiency was < 12.5% and 90-day mortality was 0%. Readmission rates should not exceed 4%.

Table 6 Benchmark results

Discussion

Robotic-assisted rectal resection can achieve outstanding results when performed by an experienced surgeon at an expert center. To evaluate a newer procedure, evidence of “non-inferiority” compared to the gold standard is first needed. In a second step, superiority should be demonstrated in studies so that the newer intervention can be established as the gold standard after widespread standardization. This is exemplified by robot-assisted prostatectomy. Unfortunately, this concept of evaluation has some pitfalls. If complication rates are already low, a very large cohort is required to be able to prove a significant difference. In addition, the participating surgeons in a multicenter prospective comparative study would have to be experts in the new and old surgical procedures. This is hardly feasible with today’s standardized procedures and the specialization of hospitals and surgeons. Thus, another tool is needed to evaluate interventions.

Our study aimed to make this evaluation possible. It provides benchmark values for several clinically relevant endpoints that can be immediately adopted by other institutions. Our study corresponds in large parts to the proposal for a standardized benchmarking report, which was established in the context of major liver resections [7]. The strength of our study is that the patients were all operated according to a standardized surgical procedure by designated robotic experts in high-volume centers and the data were interrogated in a standardized manner. This allows first publication of the best achievable outcomes in robotic-assisted low anterior rectal resection.

In 2019, the results of the largest prospective, randomized multicenter study comparing RLAR with LLAR were published [5]. The endpoints analyzed were conversion rate (RLAR 12.2%, LLAR 8.1%), intraoperative (RLAR 14.8%, LLAR 15.3%) and postoperative (RLAR 31.7%, LLAR 33.1%) complication rates, as well as TME quality (very good: RLAR 76.4%, LLAR 77.6%). Notably, the conversion rate is associated with an increased rate of local recurrence, as well as increased morbidity and mortality [8,9,10]. All these results were inferior to our benchmark values, which demonstrate the advantage of the proven surgical robotic expertise in our centers. A limitation of this comparison is that rectal amputations were included in the ROLLAR RCT and surgeons at different stages of the learning curve participated in this RCT.

Compared to the meta-analysis by Han et al. (eight RCTs, 999 patients: RLAR 495, LLAR 504) [4], our median operative times were significantly longer (266 (IQR 211–310) min) than in the meta-analysis (211 (IQR 191–259) min), but with significantly lower rates of incomplete TME quality (benchmark cohort 0% vs. RLAR 22.2% and LLAR 25.65%) and a higher average lymph node yield (benchmark cohort 18 vs. RLAR 17.5 and LLAR 17).

In 2020, Diers et al. published their paper reporting the nationwide in-hospital mortality rate following rectal resection for rectal cancer [11]. They found a mortality rate of 1.5% in very high output centers (case load > 50 per year) and 1.4% in high output center (case load around 32 patients per year), but with approximately 15% of the cases being emergency procedures. The anastomotic leakage rate was 11.8% in the very high and 12.4% in the high output centers. Those results are similar to our benchmark values, but are hardly comparable because there was no differentiation in those leakage rates towards an open or laparoscopic approach and the performed resection (i.e., low anterior, anterior, tubular/segmental, or sigmoid/left resection). There are limitations to our study. Our data are from only one continent, whereas three are recommended [7]. There were also differences in the number of patients per center, with > 100 patients from one center, > 50 from two centers, and ≤ 30 from the last two. While this fact better reflects reality than results from a high-output center, some differences in terms of experience with the procedure must also be considered; there may also have been an influence of learning curves on our results. Furthermore, this inhomogeneity in numbers per center means that there is also increased case weighting. This is reflected in the intercentral comparison of the anastomotic leakage rate: two centers reported the same number of anastomotic leakages but with twice the number of patients in the center, and the insufficiency rate was twice as high in the smaller group. Another limitation is that we cannot exclude the possibility that complications may have been documented incorrectly or not at all, especially with regard to CDC grade I. From the benchmark proposal by Rössler et al., we know that there is often a lack of documentation of pathologically elevated laboratory values, for example [7]. This would mean that our complication rate of any severity would be falsely low. In addition, our benchmark cohort showed a low rate of neoadjuvant therapy. We could identify two possible explanations. The first is a potential understaging preoperatively. The second one could be upon patients’ request for a primary surgery. However, a further comparison between the clinical and pathological tumor stage should be performed. There was no selection for this, but it must be assumed that this resulted in a lower complication rate and higher lymph node yield (mean 19.5 in patients without neoadjuvant therapy vs. 17.9 in neoadjuvant-treated patients, without statistical significance). As a further weakness, the rate of oncological follow-up was unfortunately very low, so that only a weak statement can be made.

With increasing cost pressure for hospitals, clinics, and ultimately the individual surgeon, there is a need for publication of performance parameters. Performance measurements not only enable better argumentation regarding increased costs, but also allow patients to decide regarding the clinic, type of intervention, and ultimately their preferred surgeon, which significantly improves their autonomy [12].

Conclusion

Our study is the first to provide benchmark values on the peri- and postoperative outcome of robotic-assisted rectal resection. Our benchmark cohort is based on databases of designated robotic experts from national expert centers. Critical patient selection, including no prior surgery, low comorbidity, and operated using a standardized technique, has allowed us to achieve “ideal” outcomes. However, the learning curve continues to be a factor that influences outcomes and only national centers could be recruited. Thus, it can be assumed that as national and international implementations of RLAR continue, and experience grows as a result, outcomes will also change, and this study will need to be updated. Nevertheless, we are convinced that these benchmark values will be used as comparison values for other centers and that the concept of benchmarking will continue to expand.