Introduction

Current practice and recommendations regarding bowel preparation before elective colorectal surgery to reduce the incidence of anastomotic leakage (AL) and surgical site infections (SSIs) remain controversial. Mechanical bowel preparation (MBP), once routinely used, may cause preoperative dehydration, electrolyte disturbance, and discomfort, and failed to demonstrate any clear benefit over no bowel preparation (NBP) [1,2,3,4,5]. European [6] and Italian [7] enhanced recovery after surgery (ERAS) societies’ guidelines currently recommend NBP, albeit leaving room for oral antibiotics (oA) alone or in combination with MBP [8]. At the same time, results of large retrospective population-based studies of the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) suggested that MBP combined with oral antibiotics (MoABP) significantly decreased the rates of SSIs and overall morbidity (OM) compared to NBP [9,10,11,12,13], leading four large North American societies (the American Society of Colon and Rectal Surgeons, the Society of American Gastrointestinal and Endoscopic Surgeons, the American Society for Enhanced Recovery, and the Perioperative Quality Initiative) to recommend MoABP [14,15,16]. As a consequence, the use of MoABP is currently reported by 50% of Austrian–German [17] and by 80% of North American [18] surgeons. During the last 8 years, one RCT was launched comparing NBP with MoABP [19], two comparing MoABP with oA [20, 21], and one comparing MoABP with MBP for rectal cancer [22]. To the best of our knowledge, only one [22] of these trials has recently completed the planned enrollment, and none has yet published its final results [23]. An interesting four-arm RCT comparing NBP with oA, MBP, and MoABP for colon resections [24] was recently closed before completion due to poor accrual.
Meanwhile, one RCT comparing NBP with MoABP [25] failed to detect significant differences in SSIs and AL rates but was largely underpowered; oA showed a significant reduction of SSI rates in two RCTs, either alone [26, 27] or combined with MBP [26], and an international multicenter RCT comparing oA with MoABP [28] is currently still recruiting. Finally, one RCT reported that MoABP significantly reduced SSI rates compared to MBP after colorectal resections [29], and another that MoABP significantly reduced both SSI and AL rates compared to MBP after rectal resections [30].

Very recently, the European Association of Endoscopic Surgery, the European Society of ColoProctology, and the Society of American Gastrointestinal and Endoscopic Surgeons published a joint guideline [31] based on a previous systematic review and network meta-analysis [32], with a conditional recommendation for MoABP, supported by low-quality evidence due to variable adherence to preoperative intravenous antibiotic prophylaxis (PIVAP) and great heterogeneity regarding oA schedules [33].

The relevant heterogeneity of the available evidence led the Italian ColoRectal Anastomotic Leakage (iCral) study group to estimate the effects of NBP in patients treated with PIVAP before elective colorectal surgery (treatment variable) in comparison to three other treatments (oA, MBP, MoABP) on a large dataset derived from two prospective multicenter open-label observational studies [34, 35]. Several recent studies of propensity score estimation showed that machine learning methods outperform logistic regression models with iterative variable selection in terms of bias reduction and mean-squared error [36] and may be advantageous in multiple treatment settings [37]. Therefore, a multi-treatment analysis based on machine learning procedures was used to compare four bowel preparation modalities before elective colorectal surgery.

Methods

Study design, participants, and setting

This was a secondary unplanned ad hoc multi-treatment re-analysis of two prospective cohorts of patients who had undergone colorectal surgery for malignant and benign diseases based on machine-learning procedures. A total of 8359 patients who underwent colorectal resection with anastomosis were enrolled in two consecutive studies upon explicit inclusion/exclusion criteria in 78 surgical centers in Italy from January 2019 to September 2021: iCral2 [34] and iCral3 [35].

To control for data imbalance derived from several treatment confounders, the present analysis included 6241 patients (74.7%) out of 8359 available in the parent studies, based on explicit exclusion criteria (Fig. 1). Any record with missing information regarding preoperative bowel preparation or with MBP performed using anything other than polyethylene glycol (PEG) was excluded; patients treated without PIVAP were excluded considering its significant impact on the risk of SSIs [23]; delayed urgencies were excluded because this study is focused on elective resections; any anastomosis protected by a proximal stoma and patients treated with neo-adjuvant therapy, perioperative steroids, or dialysis were excluded because these treatments affected only subgroups of patients; patients treated by anterior resection with anastomosis at less than 6 cm from the anal verge and without protective stoma were excluded because of the significant impact of this procedure on the risk of AL. The study adhered to the Strengthening the Reporting of Observational Studies in Epidemiology statement [39] and checklist (online supplemental material).

Fig. 1

Study flowchart. PEG, polyethylene glycol; MNA-SF, mini nutritional assessment–short form [38]; ERAS, enhanced recovery after surgery; NBP, no bowel preparation; oA, oral antibiotics; MBP, mechanical bowel preparation; MoABP, mechanical bowel preparation and oral antibiotics

Four different treatment groups were considered: (a) no mechanical bowel preparation and no oral antibiotics (NBP; No. = 3742; 60.0%); (b) oral antibiotics alone (oA; No. = 406; 6.5%); (c) mechanical bowel preparation alone (MBP; No. = 1486; 23.8%); (d) mechanical bowel preparation and oral antibiotics (MoABP; No. = 607; 9.7%). All patients in the MBP and MoABP groups received products containing PEG on the day before surgery. Patients in the oA and MoABP groups received several different oral antibiotic schedules, the majority of which contained metronidazole (Table 1).

Table 1 Oral antibiotics schedules in the oA and MoABP groups

Clinical data

The parent studies recorded both continuous and discrete variables related to biometric data, patient information, indication and type of surgical procedure, adherence to ERAS program items, and outcomes. Local investigators ensured data quality control, which was validated by the study coordinator, resolving any discrepancies through strict cooperation. Perioperative care was provided by local investigators, who were left free to decide on any complementary imaging and/or any further action according to local criteria.

The descriptive variables considered in the 6241 patients are shown in Table 2. Continuous variables were categorized according to their median values to optimize the effectiveness of the analysis by reducing the number of unmatched cases.

Table 2 Descriptive analysis of the variables considered in the 6241 patients before matching

Outcomes

All the outcomes were calculated at 60 days after surgery. Any adverse event was recorded and graded [40, 41], as well as any reoperation, readmission, or death.

The primary endpoints were AL, defined according to the international consensus criteria [42], SSIs, according to the criteria of the Centers for Disease Control and Prevention/National Healthcare Safety Network (CDC/NHSN) [43], and overall morbidity (OM; any adverse event). The secondary endpoints were superficial and/or deep incisional surgical site infections (sdiSSIs), defined as specific complications including purulent drainage from superficial incisions, positive culture of fluid or tissue from superficial incisions, pain or tenderness, localized swelling, redness, heat, and/or infections involving deep fascial and muscle layers without fascial dehiscence; deep wound dehiscence; abdominal collection/abscess, defined as any intraperitoneal postoperative collection altering the normal postoperative course, requiring either medical, radiological, endoscopic, or surgical intervention [43]; major morbidity (any adverse event grade > II); reoperation (any unplanned operation); mortality (any death).

Ethics

Both studies were conducted in accordance with the Declaration of Helsinki and guidelines for good clinical practice E6 (R2). All enrolled patients signed a consent to be included in the studies. The study protocols were approved by the ethics committee of the coordinating center (Marche Regional Ethics Committee (CERM) 2018/334 released on 11/28/2018 for iCral2 and 2020/192 released on 07/30/2020 for iCral3) and registered at ClinicalTrials.gov (NCT03771456 for iCral2 and NCT04397627 for iCral3). Subsequently, all other centers were authorized to participate by their local ethics committees. Both studies were approved for planned primary and any unplanned secondary analyses; therefore, no further authorization for the current analysis was requested. Individual participant-level anonymized datasets are available upon reasonable request by contacting the study coordinator.

Statistical analysis

Sample sizes were calculated and reported in the respective core papers [34, 35]. The events-per-variable guideline was followed [44]. There were no missing data in the database of 6241 patients. The target estimand was the average treatment effect on the treated (ATT), with NBP as the reference treatment, answering the question “How would the average outcome(s) change if anyone receiving the reference treatment (NBP) had instead received another treatment?” A machine-learning technique, the Generalized Boosted Model (GBM), was used to estimate the propensity score weights for the binary comparisons between the reference treatment and the other treatment arms. GBM estimation involves an iterative process with multiple regression trees to capture complex and nonlinear relationships between treatment assignment and the covariates without over-fitting the data [37]. GBM was chosen because it provides a better balance of the covariates [37] and enhanced bias reduction [35] compared to alternative approaches such as inverse probability weighting (IPWT) based on multinomial logistic regression models. The analysis was performed using the “twang” package (Toolkit for Weighting and Analysis of Nonequivalent Groups) of the software “R©” (Version 4.2.2, The R Foundation© for Statistical Computing, Vienna, Austria, 2022). As GBM iteratively estimates the propensity scores by minimizing the distance between the weighted distributions of the covariates given the baseline treatment, balance was assessed by performing 10,000 iterations and using the mean Kolmogorov–Smirnov (KS.mean) metric with a threshold of 0.2 (a KS.mean value below 0.2 typically indicates a negligible difference between the covariate distributions of the groups) [37]. The KS.mean was preferred because the large sample size allowed comparison of the entire distribution rather than just the mean.
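The weighting and balance-checking logic described above can be illustrated with a minimal Python sketch. It is not the authors' R code and does not reproduce the twang implementation: the propensity values are supplied by hand rather than estimated by boosted trees, the toy data and all function names are hypothetical, and only a single covariate is checked (the KS.mean criterion is understood here as the average of such per-covariate statistics). Under the ATT estimand, patients in the reference arm keep weight 1, while patients in a comparison arm are weighted by their odds of belonging to the reference arm.

```python
from bisect import bisect_right


def att_weights(in_reference, p_reference):
    """ATT weights: 1 for reference-arm patients; p/(1-p) for comparison-arm
    patients, where p is the propensity of being in the reference arm."""
    return [1.0 if ref else p / (1.0 - p)
            for ref, p in zip(in_reference, p_reference)]


def weighted_ecdf(values, weights):
    """Return the weighted empirical CDF of `values` as a function of t."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    xs, cum, running = [], [], 0.0
    for x, w in pairs:
        running += w
        xs.append(x)
        cum.append(running / total)

    def cdf(t):
        i = bisect_right(xs, t)  # number of observations <= t
        return cum[i - 1] if i else 0.0

    return cdf


def ks_statistic(x_a, w_a, x_b, w_b):
    """Largest absolute gap between two weighted empirical CDFs."""
    f_a, f_b = weighted_ecdf(x_a, w_a), weighted_ecdf(x_b, w_b)
    return max(abs(f_a(t) - f_b(t)) for t in set(x_a) | set(x_b))


# Hypothetical toy data: one dichotomized covariate (e.g. age above median).
x_ref = [1, 1, 1, 0]  # reference arm (NBP plays this role in the paper)
x_cmp = [1, 0, 0, 0]  # one comparison arm
# Assumed propensity of being in the reference arm, given the covariate.
p_of = {1: 0.75, 0: 0.25}

w_ref = att_weights([True] * 4, [p_of[x] for x in x_ref])
w_cmp = att_weights([False] * 4, [p_of[x] for x in x_cmp])

ks_before = ks_statistic(x_ref, [1.0] * 4, x_cmp, [1.0] * 4)
ks_after = ks_statistic(x_ref, w_ref, x_cmp, w_cmp)
print(f"KS before weighting: {ks_before:.3f}, after weighting: {ks_after:.3f}")
```

In this toy example the weighted comparison arm reproduces the covariate distribution of the reference arm, so the KS statistic drops from 0.5 to essentially zero; with estimated propensities the residual imbalance would be compared against the chosen threshold.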

Twenty covariates potentially affecting the four-treatments variable assignments [45] were included in the model (Fig. 1).

For the outcome analysis, weighted logistic regression models for both primary and secondary endpoints, defined as dichotomous variables, comparing the baseline treatment (NBP) with the other three treatment arms (oA, MBP, and MoABP), were estimated using the “svyglm” function (survey-weighted generalized linear models) of the “survey” package of the software “R©” (Version 4.2.2, The R Foundation© for Statistical Computing, Vienna, Austria, 2022). The logistic regression models for the endpoints were adjusted for the same 20 covariates used in the weight estimation, yielding a “doubly robust” estimation of the treatment effects [37]. Considering that the primary endpoints were not independent, having been selected based on available evidence [23], a Sidak–Bonferroni adjustment for multiple comparisons/outcomes was applied, yielding α = 0.012. Statistical significance, therefore, was accepted for p values < 0.012. All the instructions used with the software “R©” are available upon reasonable request to the study coordinator.
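For readers unfamiliar with these corrections, the short Python sketch below shows how Bonferroni and Sidak per-test thresholds are obtained from a family-wise α of 0.05. The text does not state how many comparisons entered the adjustment, so the values of m used here are assumptions for illustration only; with m = 4 both formulas give thresholds between 0.012 and 0.013, in line with the reported value.

```python
def bonferroni_alpha(alpha, m):
    """Per-test threshold under the Bonferroni correction."""
    return alpha / m


def sidak_alpha(alpha, m):
    """Per-test threshold under the Sidak correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# m is an assumption: the number of comparisons is not given in the text.
for m in (3, 4):
    print(f"m={m}: Bonferroni={bonferroni_alpha(0.05, m):.6f}, "
          f"Sidak={sidak_alpha(0.05, m):.6f}")
```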

Results

The population of 6241 patients included data deriving from 72 (92.3%) of the original 78 centers. NBP group included data deriving from 61 (84.7%), oA from 12 (16.7%), MBP from 52 (72.2%), and MoABP from 18 (25.0%) of the 72 centers. All the 20 covariates included in the model showed an optimal balance among treatment groups (Fig. 2).

Fig. 2

Love plot of covariates’ Kolmogorov–Smirnov mean differences before and after adjustment using a machine learning technique, comparing the reference treatment (no bowel preparation, named “0” in the figure) with the other 3 treatments (oral antibiotics alone, named “1”; mechanical bowel preparation alone, named “2”; mechanical bowel preparation and oral antibiotics, named “3”); ERAS, enhanced recovery after surgery

The multi-treatment weighted logistic regression analysis for primary endpoints (Fig. 3) showed the AL risk (3.3% after NBP) to be significantly higher after MBP (5.6%; OR 1.82; 95% CI 1.23–2.71; p = 0.003) and comparable after oA (3.9%) and MoABP (3.5%). The SSI risk (5.0% after NBP) was significantly lower after MoABP (2.8%; OR 0.42; 95% CI 0.22–0.80; p = 0.008) and comparable after oA (5.4%) and MBP (6.8%). The OM risk (26.6% after NBP) was significantly higher after MBP (28.9%; OR 1.38; 95% CI 1.10–1.72; p = 0.005) and comparable after oA (25.6%) and MoABP (22.2%).
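As a reading aid, the Python sketch below shows how an odds ratio, its 95% confidence interval, and the p value from a logistic regression relate to one another. It assumes a Wald interval based on the normal approximation, which may differ slightly from the design-based inference used in the study; the function name is hypothetical. Under that assumption, the standard error of the log odds ratio can be recovered from the width of the interval on the log scale, and the resulting p values agree with those reported above to the third decimal place.

```python
import math


def wald_from_or(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-OR, its standard error, the z statistic, and the
    two-sided p value from a reported OR and 95% CI (normal approximation)."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return beta, se, z, p


# Odds ratios and intervals as reported for the primary endpoints.
reported = {
    "AL, MBP vs NBP": (1.82, 1.23, 2.71),
    "SSI, MoABP vs NBP": (0.42, 0.22, 0.80),
    "OM, MBP vs NBP": (1.38, 1.10, 1.72),
}
for label, (or_, low, high) in reported.items():
    beta, se, z, p = wald_from_or(or_, low, high)
    print(f"{label}: log-OR={beta:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.4f}")
```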

Fig. 3

Multi-treatment weighted logistic regression analysis for primary endpoints (log scale); NBP, no bowel preparation; oA, oral antibiotics alone; MBP, mechanical bowel preparation alone; MoABP, mechanical bowel preparation and oral antibiotics

Concerning secondary endpoints (Table 3), no significant differences were recorded in the risk of deep wound dehiscence, abdominal collection/abscess, reoperation, or mortality. The risk of sdiSSI (3.3% after NBP) was significantly reduced after MoABP (1.7%; OR 0.29; 95% CI 0.14–0.60; p = 0.001), and the risk of major morbidity (5.3% after NBP) was significantly higher after oA (7.6%; OR 2.07; 95% CI 1.31–3.28; p = 0.002).

Table 3 Multi-treatment weighted logistic regression analysis for secondary endpoints

All the details regarding the multi-treatment machine learning adjusted comparisons are reported in the online supplemental material.

Discussion

To the best of our knowledge, this is the first multi-treatment propensity score weighting analysis using a machine-learning weighted/adjusted regression model to assess different bowel preparation methods before elective colorectal surgery. When conclusive evidence from randomized trials is lacking, or when researchers need to assess treatment effects based on real-life data, multiple-treatment propensity score weighting analysis based on machine-learning methods and performed on data from prospective observational studies offers an alternative approach for estimating treatment effects. The machine learning GBM model adopted in this study improves bias reduction and external validity (because it does not reduce the sample size analyzed) in comparison with propensity score-matching analyses between the reference treatment and the other treatments (three in the present study) and enhances bias reduction in comparison with IPWT [36, 37].

The main finding of the present analysis is that MoABP, compared to NBP, showed a significantly lower SSI risk, with no significant difference concerning the AL risk and a borderline reduction of the OM risk (Fig. 3). As the severity of complications included in OM rates may be skewed between groups and not captured by aggregate analysis, a detailed list of adverse events is reported in Table S4 in the online supplemental material. This finding remained consistent in the analysis of secondary endpoints, with a significant reduction of the sdiSSI risk and no significant difference regarding the risks of major morbidity, mortality, and reoperation (Table 3). Although the only available, though largely underpowered, randomized trial comparing NBP with MoABP [25] failed to detect any significant difference regarding SSI rates in the two arms, our results support the findings of the ACS-NSQIP retrospective series [9,10,11,12,13], the North American societies' guidelines [14,15,16], and the most recent European guideline [31] towards the recommendation of MoABP in elective colorectal surgery. However, since both oA and MBP cause profound alterations of the gut microbiota with a possible impact on SSI and AL rates [46], and considering that an optimal oral antibiotic administration schedule is far from being established in clinical practice (Table 1), the results of ongoing randomized trials comparing oA alone for colon resection [28] and MBP for rectal resections [22] with MoABP are eagerly awaited.

At the same time, no significant differences were recorded for any of the primary endpoints concerning oA (Fig. 3), whereas oA was associated with a significantly higher major morbidity risk (Table 3), possibly linked to higher, though not significantly different, rates of major deep wound dehiscence, sdiSSIs, anastomotic leakage, and cardiac dysfunction events (Table S4 in the online supplemental material).

Finally, MBP was associated with significantly higher AL and OM risks (Fig. 3), confirming the available evidence from randomized trials [1,2,3,4] and the findings of a recent propensity score-matched comparison of NBP vs. MBP alone performed on a more limited number of cases derived from the iCral database [5]. Considering that MBP alone was still used in nearly one-quarter of our cases, a de-implementation strategy or, given the preference of some surgeons for a clean colon, a shift towards MoABP is highly advisable.

The main strength of the present study is the large number of patients prospectively enrolled within a well-defined time frame in a large number of centers, treated by minimally invasive surgery in more than 80% of cases, representing a wide sample of surgical units performing colorectal resections in Italy. Although the multicenter nature of the data may be a definite source of clustering bias, it is undoubtedly representative of real-life clinical practice. Another strength is its methodology (Fig. 1): (a) a reasoned selection of patients from the parent database was performed according to explicit criteria, limiting data imbalance; (b) the inclusion of 20 covariates in the model made it possible to account for the potential clustering bias of multicenter data and for confounders due to different perioperative pathways, surgical approach and techniques, blood transfusion-related morbidity [47], and patient-related factors; (c) the treatment effects were evaluated through a weighted-adjusted regression model including the same 20 covariates [48]. Although the treatment groups were significantly unbalanced before GBM weighting (Table 2) concerning several well-known risk factors for the endpoints (i.e., age, sex, ASA class, nutritional status, minimally invasive surgery, type of resection, type and caseload of the recruiting center), the machine-learning generalized boosted model used in this study markedly improved bias reduction by minimizing the distance between the weighted distributions of the 20 covariates (Fig. 2) compared to alternative methods such as IPWT [36, 37].

However, this study has several limitations, and its results should be interpreted with caution: (a) there was a relevant heterogeneity of oral antibiotic schedules (Table 1), as is also found within and between previously published RCTs and related meta-analyses [33]; (b) the exclusion criteria applied to the parent database (Fig. 1) practically excluded any resection performed for low rectal cancer, making the results not applicable to this subgroup of patients; (c) several aspects of the healthcare-associated infection prevention bundle (preoperative whole-body bathing, hair removal, and skin decontamination) and the individual surgeon’s experience [49] were not measured in the parent studies; (d) finally, further bias from residual unknown factors and potential measurement errors by the participating investigators may have had an impact on the results.

Conclusions

This multi-treatment machine learning analysis, despite the limitations mentioned above, showed that mechanical bowel preparation combined with oral antibiotics significantly reduced the SSI risk after elective colorectal surgery.