In this issue, the European Association for Endoscopic Surgery (EAES) and its Guidelines Subcommittee report on a rapid guideline for the management of choledocholithiasis [1]. Rapid guidelines, or rapid recommendations, are short guidelines that address one to a few prioritized clinical questions. With rapid guidelines, effort is invested in the rigor of development, both in evidence retrieval and appraisal and in the process of generating recommendations based on this evidence. Through a structured approach and in line with predefined methodological standards, an international, interdisciplinary panel provided a weak recommendation in favor of preoperative or intraoperative endoscopic retrograde cholangiopancreatography (ERCP), or laparoscopic common bile duct exploration (LCBDE).

A weak or conditional recommendation means that most patients would opt for the proposed intervention after informed decision making, and only a minority would not. For healthcare professionals and policy makers, it means that decision aids are likely needed to help individual patients make decisions consistent with their values and preferences, depending on the setting, resources, surgical expertise, and other parameters.

This guideline is unusual in that it is one of only a few guidelines in the field of surgery to apply a relatively novel evidence synthesis method called network meta-analysis. Traditional (pairwise) meta-analysis is a statistical method that synthesizes the results of different studies; this allows a more precise estimate of the actual comparative effect between two interventions. However, it cannot compare multiple interventions simultaneously; it compares only two interventions at a time. Network meta-analysis can compare multiple interventions within the same model (in this guideline, LCBDE versus preoperative, intraoperative, or postoperative ERCP). This allows simultaneous comparison and prioritization of one out of many interventions [2, 3].

Network meta-analysis can synthesize studies comparing two or more interventions as long as these interventions form a connected network. In the example given in Fig. 1, LCBDE is directly compared with preoperative ERCP, intraoperative ERCP, and postoperative ERCP (indicated by connecting arrows), meaning that there are randomized trials comparing LCBDE with each of these interventions (direct evidence, derived from standard pairwise meta-analysis). But LCBDE is also indirectly compared with, e.g., intraoperative ERCP via a common comparator, which is preoperative ERCP (indirect evidence, indicated by dashed lines). This is a connected network, as we can follow a path from any intervention to any of the remaining interventions.
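The connectivity requirement described above can be checked mechanically. The following is a minimal sketch, not part of the guideline's analysis: the edge list is one reading of the direct comparisons mentioned in the text (it is not a transcription of Fig. 1), and the treatment labels and the function name `is_connected` are illustrative assumptions.

```python
from collections import deque

# Assumed edge list: direct comparisons as read from the text,
# not a transcription of Fig. 1. Labels are shorthand.
DIRECT_COMPARISONS = [
    ("LCBDE", "preoperative ERCP"),
    ("LCBDE", "intraoperative ERCP"),
    ("LCBDE", "postoperative ERCP"),
    ("preoperative ERCP", "intraoperative ERCP"),
]


def is_connected(edges):
    """Return True if a path exists between every pair of interventions."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if not neighbours:
        return False
    # Breadth-first search from an arbitrary starting intervention.
    start = next(iter(neighbours))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    # Connected if the search reached every intervention in the network.
    return seen == set(neighbours)


print(is_connected(DIRECT_COMPARISONS))
```

If two groups of trials shared no intervention (for example, one set comparing A with B and another comparing C with D), the function would return False, and a single network meta-analysis across all four interventions would not be possible.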

Fig. 1 Network plot of interventions. Continuous arrows indicate direct comparisons; dashed lines indicate indirect comparisons (no direct comparison available)

Using sophisticated statistical methods, direct and indirect evidence are combined to provide the mixed, or network, evidence. The main outputs of the network meta-analysis are the network treatment effects (e.g., odds ratios with the corresponding confidence intervals), which show the relative efficacy between any pair of competing interventions. Another attractive feature of network meta-analysis is ranking metrics. One of the most common is the so-called P-score, a probability estimate for each intervention being better than the remaining interventions [4].
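To illustrate how direct and indirect evidence can be combined, the sketch below uses a simplified single-loop calculation: an indirect estimate through a common comparator, followed by inverse-variance pooling with the direct estimate. This is an assumption-laden simplification; a full network meta-analysis fits all comparisons simultaneously, and this is not the model used in the guideline. All input values and function names are hypothetical.

```python
import math


def indirect_estimate(log_or_ab, se_ab, log_or_cb, se_cb):
    """Indirect log odds ratio of A vs C via common comparator B.

    The effects are subtracted and the variances are added.
    """
    log_or_ac = log_or_ab - log_or_cb
    se_ac = math.sqrt(se_ab ** 2 + se_cb ** 2)
    return log_or_ac, se_ac


def pool_inverse_variance(est_1, se_1, est_2, se_2):
    """Combine two estimates, weighting each by the inverse of its variance."""
    w_1, w_2 = 1 / se_1 ** 2, 1 / se_2 ** 2
    pooled = (w_1 * est_1 + w_2 * est_2) / (w_1 + w_2)
    pooled_se = math.sqrt(1 / (w_1 + w_2))
    return pooled, pooled_se


def odds_ratio_ci(log_or, se, z=1.96):
    """Back-transform a log odds ratio to an odds ratio with a 95% CI."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical inputs on the log odds ratio scale (not data from the guideline).
direct_ac, se_direct_ac = 0.40, 0.35
indirect_ac, se_indirect_ac = indirect_estimate(0.50, 0.30, 0.20, 0.40)
mixed_ac, se_mixed_ac = pool_inverse_variance(
    direct_ac, se_direct_ac, indirect_ac, se_indirect_ac
)

print("direct  :", odds_ratio_ci(direct_ac, se_direct_ac))
print("indirect:", odds_ratio_ci(indirect_ac, se_indirect_ac))
print("mixed   :", odds_ratio_ci(mixed_ac, se_mixed_ac))
```

Because the pooled standard error is always smaller than either input standard error, the mixed estimate has a narrower confidence interval than the direct or indirect estimate alone.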

A feature of network meta-analysis is that it provides more precise comparative effect estimates (how much better or worse the intervention is compared to the comparator) than a standard pairwise meta-analysis [5]. As an example, for the outcome ‘major complications’ (Clavien-Dindo ≥ 3), the pairwise (standard meta-analysis) 95% confidence interval of the absolute effect difference between preoperative and intraoperative ERCP ranged from 9 fewer to 493 more patients. This means that the true comparative effect of preoperative versus intraoperative ERCP on major complications lies somewhere in that range; i.e., between 9 fewer and 493 more patients will experience a major complication with preoperative ERCP. The corresponding network estimates, however, ranged from 0 to 62 more patients [1]. Network meta-analysis thus reduced the width of the 95% confidence interval (which corresponds to the precision: the narrower the confidence interval, the higher the precision) by 88% compared to pairwise meta-analysis. These estimates allow us to be more confident about the actual comparative effect of interventions, which affects the panel’s decisions on which intervention should or should not be recommended. Furthermore, they allow healthcare professionals and patients to base their decisions on more precise information.
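The 88% figure follows from comparing the widths of the two confidence intervals quoted above. A minimal sketch of the arithmetic, using only the interval limits reported in the text (variable names are illustrative):

```python
def interval_width(lower, upper):
    """Width of a confidence interval given its lower and upper limits."""
    return upper - lower


# Absolute effect differences quoted in the text
# (negative = fewer patients, positive = more patients).
pairwise_ci = (-9, 493)
network_ci = (0, 62)

pairwise_width = interval_width(*pairwise_ci)  # 502
network_width = interval_width(*network_ci)    # 62

# Relative reduction in interval width achieved by the network estimate.
reduction = 1 - network_width / pairwise_width
print(f"Reduction in confidence interval width: {reduction:.0%}")  # 88%
```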

The validity of the results of network meta-analysis rests on the assumption that patients in all intervention arms and in all trials have similar characteristics. In brief, we require that the distribution of effect modifiers is similar across treatment comparisons; this is known as the transitivity assumption. That entails that potential effect modifiers (e.g., the proportion of patients with acute cholangitis, which would adversely affect outcomes) should be defined a priori. Though this assumption can be approximated statistically, it is evaluated mainly clinically and epidemiologically [3, 6]. In this guideline, there was no substantial variation in operative and interventional techniques, and most authors attempted transcystic stone extraction before choledochotomy. Furthermore, acute cholangitis and/or pancreatitis were exclusion criteria in 45%, 55%, 35%, and 33% of the preoperative ERCP, intraoperative ERCP, LCBDE, and postoperative ERCP cohorts, respectively, suggesting that these effect modifiers likely did not affect transitivity.

The summary evidence provided by the statistical analyses is subjected to rigorous appraisal before it can be used. For this purpose, the EAES Guidelines Subcommittee uses two evidence appraisal systems: the GRADE (Grading of Recommendations, Assessment, Development and Evaluation) system and the CINeMA (Confidence in Network Meta-analysis) system.

Different trials carry different risks of bias. CINeMA, through a semi-automated platform, weighs the contribution of the risk of bias of each study to the overall risk of bias, according to the contribution of this trial to each network effect estimate [7]. This method allows a modest assessment of the overall risk of bias for each comparison.
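The idea of weighting each trial's risk of bias by its contribution to a network estimate can be sketched as a weighted average. This is an illustration of the principle only, not CINeMA's implementation: the numeric scoring of risk-of-bias levels, the trial contributions, and the function name are all assumptions made for this example.

```python
# Assumed numeric coding of risk-of-bias judgements (for illustration only).
ROB_SCORE = {"low": 1, "some concerns": 2, "high": 3}


def weighted_risk_of_bias(trials):
    """Average risk-of-bias score, weighted by each trial's contribution.

    `trials` is a list of (risk_of_bias_label, percentage_contribution)
    pairs for one network effect estimate.
    """
    total_contribution = sum(contribution for _, contribution in trials)
    weighted_sum = sum(
        ROB_SCORE[label] * contribution for label, contribution in trials
    )
    return weighted_sum / total_contribution


# Hypothetical trials contributing to a single comparison.
example_trials = [("low", 50), ("some concerns", 30), ("high", 20)]
print(weighted_risk_of_bias(example_trials))
```

A trial at high risk of bias that contributes little to an estimate therefore moves the overall judgement less than a high-risk trial that dominates the comparison.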

Moreover, factors other than risk of bias may contribute to the certainty of evidence. These are imprecision (how precise are the results, defined primarily by the confidence interval), heterogeneity (whether the results of different trials agree with each other), incoherence (whether the results of direct and indirect comparisons agree with each other), publication bias (whether any published studies have been missed, or whether there are any unpublished trials), and indirectness (whether the patients, interventions and outcomes of the trials fit within guideline-specific criteria). These parameters are summarized by the CINeMA and the GRADE systems to appraise the certainty of the evidence [7, 8].
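Incoherence, as defined above, can be examined by comparing the direct and indirect estimates for the same comparison. The sketch below uses a simple z-test on their difference, assuming the two estimates are independent and approximately normally distributed on the log odds ratio scale. The input values and the function name are hypothetical, and this is not necessarily the test applied in the guideline.

```python
import math


def incoherence_test(direct, se_direct, indirect, se_indirect):
    """Compare direct and indirect estimates of the same comparison.

    Returns the difference, its z statistic, and a two-sided p value.
    """
    difference = direct - indirect
    se_difference = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = difference / se_difference
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return difference, z, p_value


# Hypothetical log odds ratios (not data from the guideline).
diff, z, p = incoherence_test(0.40, 0.35, 0.30, 0.50)
print(f"difference={diff:.2f}, z={z:.2f}, p={p:.2f}")
```

A small p value would indicate that the direct and indirect evidence disagree more than expected by chance, which would lower confidence in the network estimate for that comparison.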

Following a thorough appraisal of the evidence certainty, the evidence summaries can be used to inform the development of recommendations. However, going from evidence to recommendations is a challenging process in the context of multiple interventions. The GRADE approach includes 7 pillars (domains) in the so-called evidence-to-decision framework: balance between benefits and harms, certainty of the evidence, use of resources, patients’ values and preferences, applicability, feasibility, and equity [8]. In the previous steps, network meta-analysis has provided a comprehensive overview of the evidence on benefits and harms, and the certainty of this evidence. These are only two of the domains that inform the development of recommendations. The guideline development group (in our case, an interdisciplinary panel of surgeons, gastroenterologists, patient representatives, systematic reviewers, statisticians, and guideline methodologists) discusses this evidence, along with conceptual differences among the interventions in each of these domains. At the end of this process, the guideline development group has gained an overview of the developed GRADE evidence-to-decision framework (Fig. 2), which usually makes the consensus to recommend either intervention straightforward. In the discussed EAES guideline, a unanimous consensus was achieved across recommendations on the first Delphi round.

Fig. 2 Summary judgements in the evidence-to-decision framework

The landscape of guideline development methodology has changed dramatically over the past few years. The European Association for Endoscopic Surgery and its Guidelines Subcommittee have embraced the latest advances in medical statistics and guideline development methods, aiming to provide pertinent, trustworthy recommendations to improve patient care and experience. Surgical Endoscopy has supported this approach, hoping our readers will value EAES guidelines developed with rigorous, transparent, and evidence-based methodologies.