Complications after pancreaticoduodenectomy (PD) occur in more than 50% of patients1 and are frequently related to clinically relevant postoperative pancreatic fistula (CR-POPF).2 Abdominal drainage tube placement is an effective method for early recognition of CR-POPF and related complications, such as postpancreatectomy hemorrhage (PPH).3 Moreover, abdominal drainage can help mitigate the negative consequences of POPF by early evacuation of pancreatic and enteric juice from the peritoneal cavity.2 In the enhanced recovery after surgery (ERAS) era, the role of prophylactic abdominal drainage in abdominal surgery has been questioned. Nowadays, the omission of drainage is encouraged in several major abdominal procedures, such as liver or colonic resection.4,5 However, the abandonment of prophylactic abdominal drainage seems unwelcome among pancreatic surgeons, who prefer early drain removal (EDR).6 The rationale for EDR is based on the hypothesis that retrograde bacterial infection through the abdominal drainage tube could trigger CR-POPF.7 A recent meta-analysis8 supports the safety of the EDR approach, even though the inclusion of eight nonrandomized clinical trials weakened the credibility of the results. A new meta-analysis, including only randomized clinical trials (RCTs), was carried out to clarify this critical point. The trial sequential analysis (TSA) methodology was used to avoid false-positive or false-negative results owing to the small sample size.9,10 TSA analyzes all published RCTs by including them chronologically and estimates the required information size (RIS) needed to accept or reject the null hypothesis without incurring type I or II errors.11,12

Patients and Methods

The study protocol was preregistered in PROSPERO (CRD42023397030). The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist was used to check the manuscript.13

Eligibility Criteria

The Population Intervention Control Outcomes Study (PICOS) approach was used to define the inclusion and exclusion criteria.14 “Population” was represented by patients who underwent pancreaticoduodenectomy (PD). “Intervention” was considered the EDR. Removal was defined as early when the intra-abdominal drain(s) were removed by postoperative day 3 (POD3). The “Control” group included any approach in which drain removal started from postoperative day 5 (POD5). “Studies” were included only when the design was randomized.

Information Source, Search, Study Selection, and Data Collection Process

The last search was carried out on 15 July 2023. The search string was managed using the systematic review (SR) accelerator15 and reported in the Supplementary Methods. The PubMed/Medline, Scopus, and Cochrane databases were used.

Data Items

The following information was described for each study: authors, affiliation and country, year of publication, registration number, the type and the number of the drain(s), the characteristics of the pancreatic remnant, and the type of tumors. The importance of outcomes was evaluated using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach.16 Postoperative mortality, major morbidity (defined as Clavien–Dindo17 grade > II), need for percutaneous redrainage, CR-POPF according to the updated International Study Group of Pancreatic Fistula (ISGPF) definition,2 and relaparotomy were considered “critical.” Postpancreatectomy hemorrhage (PPH),3 delayed gastric emptying (DGE),18 length of stay (LOS), and readmission were defined as “important” but not “critical.” Two authors (D.G.G. and E.D.D.) evaluated the quality of the included studies using the revised tool for assessing the risk of bias in randomized trials (RoB 2).19 Any disagreement was resolved after a collegial discussion involving the first author (C.R.). All variables were described using frequencies and percentages or means and standard deviations (SD). When a study reported medians and interquartile ranges, a dedicated statistical algorithm was used to obtain the mean and SD.20,21
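The conversion from median and interquartile range to mean and SD can be illustrated with a minimal sketch. The source does not specify which algorithm was used, so the quartile-based approximation below (mean as the average of the three quartiles, SD as the interquartile range divided by 1.35, which holds for roughly normal data in large samples) is only an assumption, and the function name is hypothetical.

```python
def mean_sd_from_quartiles(q1: float, median: float, q3: float) -> tuple[float, float]:
    """Approximate the mean and SD from the median and interquartile range.

    Assumption (not confirmed by the source): the data are roughly normal
    and the sample is large, so the IQR spans about 1.35 SD.
    """
    if not q1 <= median <= q3:
        raise ValueError("expected q1 <= median <= q3")
    mean = (q1 + median + q3) / 3.0   # average of the three quartiles
    sd = (q3 - q1) / 1.35             # IQR of a normal distribution is ~1.35 SD
    return mean, sd


# Hypothetical example: length of stay reported as median 10 days (IQR 8-12)
approx_mean, approx_sd = mean_sd_from_quartiles(8, 10, 12)
```

For skewed outcomes such as LOS, approximations of this kind are less accurate, which is one reason a dedicated algorithm is normally preferred.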

Summary Measures and Synthesis of Results

Two main measures were calculated: (i) the measure of risk association, such as the risk ratio (RR) and mean difference (MD), reported with a 95% confidence interval (CI) and (ii) the required information size (RIS). RIS is the a priori sample size that should be reached to obtain credible results without incurring type I and type II errors.6 RIS is calculated by considering the heterogeneity of included studies and setting the threshold of type I errors at 5% and type II at 20%.7 The TSA data were reported in the Cartesian plane in which the y-axis indicates the Z-score while the x-axis is the cumulative sample size. The Z-score is associated with the P value; when its absolute value is higher than 1.96, the two-sided P value is less than 0.05. The Z-curve is obtained by adding each study sequentially. The Z-curve can cross three thresholds: the conventional boundaries (dotted red horizontal lines), the monitoring boundaries (dotted black logarithmic lines), and the futility boundaries (dotted black lines). The conventional boundary corresponds to the nominal threshold of P value = 0.05. False-positive results (type I error) can be observed when the Z-curve crosses this limit but the RIS has not yet been reached. The monitoring boundaries are the Z-score thresholds beyond which a type I error can be excluded. On the contrary, a false-negative result (type II error) should be hypothesized when the Z-curve remains within the conventional and monitoring boundaries and the RIS is not yet reached.8,9 Finally, if the RIS is reached with no significant effect, type II error could be rejected. In this case, the Z-score is within the futility boundaries, representing the threshold for non-superiority and non-inferiority effects. In other words, any additional randomization is useless for finding differences between the two arms. The meta-analysis was carried out in line with recommendations from the Cochrane Collaboration,22 and the Mantel–Haenszel random-effects model was used to calculate effect sizes.23
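The link between Z-score and P value, and the way the RIS grows with heterogeneity, can be sketched as follows. This is an illustration only: it assumes the commonly used information-size formula for a binary outcome, inflated by 1/(1 − D2), and it does not reproduce the TSA software. The function names and the example event rates are hypothetical and are not taken from the included trials.

```python
from math import ceil
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def two_sided_p(z: float) -> float:
    """Two-sided P value associated with a Z-score."""
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))


def required_information_size(p_control: float, p_experimental: float,
                              alpha: float = 0.05, beta: float = 0.20,
                              diversity: float = 0.0) -> int:
    """Diversity-adjusted RIS for a binary outcome (assumed formula).

    N = 4 * (z_{1-alpha/2} + z_{1-beta})^2 * p * (1 - p) / delta^2,
    where p is the mean of the two event rates and delta their difference;
    N is then divided by (1 - D^2) to account for heterogeneity.
    """
    if not 0.0 <= diversity < 1.0:
        raise ValueError("diversity (D^2) must be in [0, 1)")
    delta = p_control - p_experimental
    if delta == 0:
        raise ValueError("event rates must differ")
    p_mean = (p_control + p_experimental) / 2.0
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # type I error at 5% -> ~1.96
    z_beta = _STD_NORMAL.inv_cdf(1.0 - beta)          # type II error at 20% -> ~0.84
    n_fixed = 4.0 * (z_alpha + z_beta) ** 2 * p_mean * (1.0 - p_mean) / delta ** 2
    return ceil(n_fixed / (1.0 - diversity))


# Hypothetical event rates: 30% in the control arm versus 20% in the experimental arm
ris_no_heterogeneity = required_information_size(0.30, 0.20)
ris_with_heterogeneity = required_information_size(0.30, 0.20, diversity=0.5)
```

Under this formula, a diversity of 50% roughly doubles the RIS, and small absolute differences between arms inflate it quadratically, which is why rare events lead to very large RIS values.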

Risk of Bias Across Studies and Meta-regression Analysis

Heterogeneity was described using I2 and Cochran’s Q statistics.24 Heterogeneity was also calculated as diversity (D2).25 The effect of covariates was measured with a meta-regression analysis.26,27 Publication bias was evaluated using Egger’s test,28 and a P value < 0.05 indicated a non-negligible “small-study effect.” Statistical analysis was carried out using dedicated packages for R. TSA was performed using the trial sequential analysis software.9
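The three heterogeneity measures can be illustrated with a minimal sketch. It assumes inverse-variance weights on an additive scale (for example, log risk ratios) and a DerSimonian-Laird estimate of the between-study variance; the analysis in the paper used R packages and the TSA software, so the function name and the example inputs here are hypothetical.

```python
def heterogeneity(effects: list[float], variances: list[float]) -> dict[str, float]:
    """Return Cochran's Q, I^2, and diversity D^2 for study-level estimates.

    effects:   per-study estimates on an additive scale (e.g. log risk ratios)
    variances: the corresponding within-study variances
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]            # fixed-effect weights
    sum_w = sum(weights)
    pooled_fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0     # share of variability beyond chance
    # DerSimonian-Laird between-study variance (assumed estimator)
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    var_fixed = 1.0 / sum_w
    var_random = 1.0 / sum(1.0 / (v + tau2) for v in variances)
    d2 = (var_random - var_fixed) / var_random        # diversity
    return {"Q": q, "I2": i2, "D2": d2}


# Hypothetical log risk ratios and variances for four trials
example = heterogeneity([-0.9, -0.2, -0.7, 0.1], [0.10, 0.08, 0.15, 0.12])
```

D2 expresses the relative variance inflation when moving from the fixed-effect to the random-effects model, and it is the quantity used to adjust the RIS.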

Results

Studies Selection, Characteristics, and Risk of Bias Within the Studies

The PRISMA flowchart is shown in Supplementary Fig. 1. The systematic search identified 18,501 potential articles: 6215 from the Medline/PubMed database, 6952 from the ISI Web of Science, and 5334 from the Cochrane database. After deduplication, 12,921 papers remained. Of these, 12,505 were excluded by screening the title and abstract because they were not pertinent to the study question. In total, 223 were reviewed in full-text form, and 220 were excluded. The paper by Bassi et al.29 was excluded because the data on PDs could not be extracted. Finally, four studies were included in the analysis.30,31,32,33 The accrued sample size was 632: 317 (50.2%) in the EDR arm and 315 (49.8%) in the late drain removal (LDR) arm. Table 1 reports the characteristics of the included studies.

Table 1 Characteristics of the four included studies

Results of Individual Studies and Synthesis of the Results

Results regarding critical endpoints are presented in Table 2. The mortality rate was nil in four of five studies and, for this reason, was not analyzed. The major morbidity rate was lower (RR 0.55; 95% CI 0.32–0.97) in the EDR than in the LDR group (Fig. 1, panel A). The RIS of 798 has not yet been reached, but false-positive results can be excluded because the Z-curve crosses the monitoring boundary (Fig. 1, panel B). The need for percutaneous drain placement was similar between the two arms (RR 0.75; 95% CI 0.24–2.35), and 23,787 additional patients should be randomized before excluding a false equivalence. Additionally, the rate of relaparotomy was similar between the two groups (RR 0.96; 95% CI 0.35–2.60). At the current RR, the RIS required to reject the null hypothesis without type II error was 42,952, indicating that 42,320 additional patients should be randomized to obtain credible information. CR-POPF was less frequent in the EDR than in the LDR group (Fig. 2, panel A), although without statistical significance (RR 0.50; 95% CI 0.24–1.03). As shown in Fig. 2, panel B, the TSA indicated that, by randomizing 5959 patients, the “true” positive effect of EDR in reducing POPF could be demonstrated.
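The number of additional patients quoted for each endpoint is simply the RIS minus the accrued sample size of 632. A small sketch of that arithmetic, using only figures reported in the Results (the helper name is hypothetical):

```python
def additional_patients(ris: int, accrued: int) -> int:
    """Patients still to be randomized before the RIS is reached (0 if already reached)."""
    return max(0, ris - accrued)


ACCRUED = 632  # accrued sample size reported in the Results

print(additional_patients(42_952, ACCRUED))  # relaparotomy -> 42320
print(additional_patients(798, ACCRUED))     # major morbidity -> 166
```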

Table 2 Meta-analysis of critical and non-critical endpoints

Fig. 1 Major morbidity; A: forest plot; B: the x-axis is the cumulative number of randomized patients; the y-axis is the cumulative Z-score value representing the effect of each arm; the blue line is the cumulative Z-score obtained by combining the studies; and the dotted red horizontal lines are the conventional boundaries (P value < 0.05); when the Z-curve crosses the conventional boundaries, and the required information size (RIS) is not reached, the result is a false positive (type I error); when the Z-curve does not cross the conventional boundaries and RIS is not reached, the result is a false negative (type II error); the dotted black near-logarithmic lines are the monitoring boundaries; when the Z-curve crosses the monitoring boundaries, the result is a true positive; the inverse dotted black lines are the futility boundaries (area in which any further randomization is useless); EDR early drain removal, LDR late drain removal, RR risk ratio, RIS required information size

Fig. 2 Clinically relevant POPF; A: forest plot; B: the x-axis is the cumulative number of randomized patients; the y-axis is the cumulative Z-score value representing the effect of each arm; the blue line is the cumulative Z-score obtained by combining the studies; and the dotted red horizontal lines are the conventional boundaries (P value < 0.05); when the Z-curve crosses the conventional boundaries, and the required information size (RIS) is not reached, the result is a false positive (type I error); when the Z-curve does not cross the conventional boundaries and RIS is not reached, the result is a false negative (type II error); the dotted black near-logarithmic lines are the monitoring boundaries; when the Z-curve crosses the monitoring boundaries, the result is a true positive; the inverse dotted black lines are the futility boundaries (area in which any further randomization is useless); EDR early drain removal, LDR late drain removal, RR risk ratio, RIS required information size

PPH, DGE, and readmission were similar in the two groups, but a “false” equivalence cannot be excluded. LOS (Fig. 3, panel A) was significantly lower in the EDR group (MD − 2.25; 95% CI − 3.23 to − 1.28). The accrued sample size was adequate because it exceeded the RIS of 567 (Fig. 3, panel B).

Fig. 3 Length of stay; A: forest plot; B: the x-axis is the cumulative number of randomized patients; the y-axis is the cumulative Z-score value representing the effect of each arm; the blue line is the cumulative Z-score obtained by combining the studies; and the dotted red horizontal lines are the conventional boundaries (P value < 0.05); when the Z-curve crosses the conventional boundaries, and the required information size (RIS) is not reached, the result is a false positive (type I error); when the Z-curve does not cross the conventional boundaries and RIS is not reached, the result is a false negative (type II error); the dotted black near-logarithmic lines are the monitoring boundaries; when the Z-curve crosses the monitoring boundaries, the result is a true positive; the inverse dotted black lines are the futility boundaries (area in which any further randomization is useless); EDR early drain removal, LDR late drain removal, MD mean difference, RIS required information size

Heterogeneity, Meta-regression Analysis, and Publication Bias

Significant heterogeneity was observed for major morbidity (I2 = 54%; D2 = 42%). As shown in Table 3, the heterogeneity of major morbidity was not influenced by pancreatic ductal adenocarcinoma (PDAC), the texture of the pancreas, the type and number of drains, or the quality of the study.

Table 3 Meta-regression analysis

Discussion

The present study demonstrated that EDR after PD could have some advantages, reducing major morbidity and LOS in patients with low or intermediate risk of CR-POPF. These results were obtained by including only RCTs and using the TSA approach to exclude type I errors owing to inadequate sample size.

Regarding critical endpoints, TSA showed that EDR, compared with LDR, nearly halves the rate of major complications. This effect seems well demonstrated without the need for further RCTs: although the RIS has yet to be reached, the Z-curve crosses the monitoring boundary, as shown in the TSA graph. In addition, CR-POPF seems reduced in the EDR group but without conventional statistical significance. In other words, the certainty of this effect is weak owing to imprecision because the 95% CI crosses the null effect line. However, observing the TSA plot, it becomes evident that the Z-score curve could simultaneously cross the conventional and monitoring boundaries by adding only a few hundred patients. Thus, in the following years, with one or two new RCTs, the definitive demonstration that EDR reduces POPF could be obtained. There are several potential explanations to support these results. First, major morbidity in pancreatic surgery mainly depends on POPF and POPF-related events. Thus, EDR could have a similar effect on major morbidity and POPF. LDR could facilitate the development of POPF-related complications, because drainage could mechanically contribute to soft tissue and vessel corrosion owing to pancreatic enzymes.34 Moreover, prolonging the catheter time could allow retrograde and ectopic bacteria to invade the fistula area, triggering hemorrhage or anastomosis disruption.35 Several studies have demonstrated that delaying drainage removal increases the rate of bacterial-positive cultures.7,36 Retrograde infection could underlie the infection of some POPFs after PD, mainly when the amylase originates from a branch duct rather than from a disrupted Wirsung-jejunal anastomosis. However, the impact of EDR on POPF is small and requires a larger number of patients to be demonstrated.

Concerning relaparotomy and the need for percutaneous redrainage placement, the two approaches appear similar, even if the rarity of these events (< 2% and 5%) makes it impossible to demonstrate the absence of type II errors. Several thousand patients should be randomized before excluding a false equivalence between the two approaches. Furthermore, the large RISs suggested that the differences, if present, are too small to be clinically relevant. The present study confirmed the safety of the EDR approach regarding important endpoints, suggesting that PPH, readmission, and DGE rates were similar. PPH and readmission are rare events, and the statistical demonstration of equivalence or non-equivalence could be time consuming, requiring many patients, and useless from the clinical point of view. DGE, from a physiopathological point of view, is historically attributed to the presence of clinically relevant POPF. Thus, assuming this rationale, it seems logical to expect a parallel reduction of DGE in the EDR arm. However, the results suggested that DGE risk was similar among the groups. This counterintuitive result can be explained by recent evidence demonstrating that not all DGEs are related to POPF presence.37 Finally, the reduction of LOS in the EDR arm is statistically significant, not at risk of type I error, and clinically relevant (approximately 2 days less than LDR patients). This result seems consistent with the reduction of major morbidity.

Our study has both strengths and limitations. First, the present meta-analysis is the first to include only RCTs. All included studies are homogeneous in their inclusion criteria, clearly defining the target population: patients with low-intermediate risk of POPF without clinical, laboratory (amylase values > 5000 U/mL), or radiological suspicion of clinically relevant POPF by POD3. Second, the meta-analytical results were validated using the TSA approach, which provides both a measure of imprecision, namely the classical P value, and an estimate of the credibility of the results with respect to type I or II errors. Moreover, TSA could help pancreatic surgeons in evaluating whether it is logical and helpful to plan further randomized trials, and in calculating the correct sample size and choosing the proper endpoints. Finally, the study included only PD, reducing the bias owing to including distal pancreatectomies.

Nevertheless, some limitations should be recognized. First, significant heterogeneity weakened the results for major morbidity and POPF. Even if meta-regression explained some sources of RR variability, several covariates, such as the size of the Wirsung duct, surgeons’ experience, or hospital volume, could not be directly analyzed. Even if CR-POPF affects both procedures (PD and distal pancreatectomy), the severity of complications and the feasibility of rescue strategies, such as percutaneous drainage, would differ between the two types of resection. Another limitation was that the included studies came from different countries with different healthcare systems. Finally, some limitations are due to the TSA methodology, which remains a retrospective method to analyze the RCTs and thus carries the risk of data-driven hypotheses. Moreover, the TSA is a complex statistical approach that is challenging for clinicians.38

In conclusion, EDR, compared with LDR, is associated with lower major morbidity and shorter LOS. These results are robust and not at risk of type I errors. The target population in which EDR can be useful and safe consists of patients who, by POD3, did not present clinical, biochemical, or radiological suspicion of clinically relevant POPF. An effect of EDR in reducing POPF could be present, but some high-quality, well-designed RCTs are needed to confirm these results. The number of patients that should be randomized to obtain a definitive conclusion is reasonable and achievable in a relatively short time.