Comparative efficacy of first-line therapeutic interventions for achalasia: a systematic review and network meta-analysis

Background: Several interventions with variable efficacy are available as first-line therapy for patients with achalasia. We assessed the comparative efficacy of different strategies for the management of achalasia through a network meta-analysis combining direct and indirect treatment comparisons.

Methods: We identified six randomized controlled trials in adults with achalasia that compared the efficacy of pneumatic dilation (PD; n = 260), laparoscopic Heller myotomy (LHM; n = 309), and peroral endoscopic myotomy (POEM; n = 176). The primary efficacy outcome was 1-year treatment success (patient-reported improvement in symptoms based on validated scores). Secondary efficacy outcomes were 2-year treatment success and physiologic improvement. Safety outcomes were the risk of gastroesophageal reflux disease (GERD), severe erosive esophagitis, and procedure-related serious adverse events. We performed pairwise and network meta-analyses for all treatments and used GRADE criteria to appraise the quality of evidence.

Results: Low-quality evidence, based primarily on direct evidence, supports the use of POEM (RR [risk ratio], 1.29; 95% confidence interval [CI], 0.99–1.69) and LHM (RR, 1.18 [0.96–1.44]) over PD for treatment success at 1 year. No significant difference was observed between LHM and POEM (RR, 1.09 [0.86–1.39]). The incidence of severe esophagitis after POEM, LHM, and PD was 5.3%, 3.7%, and 1.5%, respectively. The rate of procedure-related serious adverse events after POEM, LHM, and PD was 1.4%, 6.7%, and 4.2%, respectively.

Conclusions: POEM and LHM have comparable efficacy and may increase treatment success compared with PD, although confidence in these estimates is low. POEM may have a lower rate of serious adverse events than LHM and PD, but a higher rate of GERD.

Electronic supplementary material: The online version of this article (10.1007/s00464-020-07920-x) contains supplementary material, which is available to authorized users.

treatment sessions [5,6]. On the other hand, LHM combined with an anti-reflux procedure is a more invasive treatment that often requires only one treatment session, with success rates of 71% to 92% [7]. In the European Achalasia trial, the success rates of PD and LHM depended on achalasia subtype. Type II achalasia had the highest success rate at 5 years (PD 96%; LHM 88%), and type III achalasia had the lowest (PD 48%; LHM 86%) [8]. In prospective cohorts, the success rate of POEM has exceeded 90% and has been maintained across achalasia subtypes. This is thought to reflect the ability to perform an extended proximal myotomy in type III (spastic) achalasia. A consistent observation with POEM has been a higher risk of gastroesophageal reflux disease (GERD) than with PD or LHM [9,10].
In the last year, two landmark multicenter RCTs have been published, one comparing POEM with PD and one comparing POEM with LHM [11,12]. Together they provide a framework for assessing the comparative efficacy and safety of these interventions and for informing the choice of first-line treatment for patients with achalasia. We therefore performed pairwise and network meta-analyses to compare the relative efficacy and safety of PD, LHM, and POEM for the management of achalasia. These analyses combined direct evidence (from RCTs directly comparing the treatments of interest) with indirect evidence (from RCTs comparing the treatments of interest with a common comparator). We used the Grading of Recommendations Assessment, Development and Evaluation (GRADE) criteria for network meta-analysis to appraise the quality of evidence [13].
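Indirect evidence can be illustrated with the simplest adjusted indirect comparison, the Bucher method. When trials compare A with C and B with C, an estimate of A versus B is obtained on the log risk-ratio scale. The Python sketch below is a minimal illustration of that simpler technique. It is not the multivariate meta-regression used for the network in this review. The function names and input values are hypothetical.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower: float, upper: float) -> float:
    """Recover the standard error of log(RR) from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def bucher_indirect(log_rr_ac: float, se_ac: float,
                    log_rr_bc: float, se_bc: float) -> tuple[float, float]:
    """Adjusted indirect comparison of A vs B through a common comparator C.

    log RR(A vs B) = log RR(A vs C) - log RR(B vs C). The variances add
    because the two direct estimates come from independent sets of trials.
    """
    log_rr_ab = log_rr_ac - log_rr_bc
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    return log_rr_ab, se_ab


# Hypothetical direct estimates (not trial data): A vs C and B vs C.
se_ac = se_from_ci(1.00, 1.60)
se_bc = se_from_ci(0.95, 1.40)
log_ab, se_ab = bucher_indirect(math.log(1.25), se_ac, math.log(1.15), se_bc)
print(f"Indirect RR (A vs B): {math.exp(log_ab):.2f} "
      f"[{math.exp(log_ab - Z_95 * se_ab):.2f}-{math.exp(log_ab + Z_95 * se_ab):.2f}]")
```

Because the variances add, the indirect interval is always wider than either of the direct intervals it is built from.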

Methods
This systematic review is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for network meta-analyses (PRISMA-NMA) statement. It was conducted following an a priori established protocol [14]. We also followed the good research practices outlined in the ISPOR (International Society for Pharmacoeconomics and Outcomes Research) report on interpreting indirect treatment comparisons and network meta-analysis for health-care decision making.

Selection criteria
Studies included in this meta-analysis were RCTs with a minimum follow-up of 1 year that met the following inclusion criteria: (a) Patients: adults (age > 18 years) with achalasia; (b) Interventions and comparators: PD, LHM, or POEM; and (c) Outcome: treatment success assessed at 1 year.
We excluded the following: (a) observational or non-randomized studies; (b) RCTs of endoscopic botulinum toxin injection, which is considered second-line therapy for patients who are not candidates for first-line therapy; (c) RCTs of oral therapies reserved for patients who are not candidates for first-line therapy (i.e., oral smooth muscle relaxants); and (d) trials with a short duration of follow-up (< 1 year) [15].

Search strategy
The search strategy updated a prior systematic literature review performed for the recent American Society for Gastrointestinal Endoscopy (ASGE) guideline [16].

Data abstraction and quality assessment
Two authors (AF, RY) independently abstracted data on study-, patient-, and treatment-related characteristics onto a standardized form. The risk of bias of individual studies was assessed for the primary outcome using the Cochrane Risk of Bias 2 (RoB 2) tool [17].

Outcomes assessed
The primary efficacy outcome was treatment success at 1 year. The definition of treatment success varied across trials:
- In 3 RCTs [11,12,19], success was a decrease in the Eckardt score to ≤ 3. This score measures the severity of dysphagia, regurgitation, retrosternal pain, and weight loss [18].
- In 2 RCTs, success was absence of dysphagia. This was assessed with a specific questionnaire in the study by Borges et al. [20] and by patient-reported improvement of symptoms in the study by Kostic et al. [21].
- In a single RCT [22], success was defined by the DeMeester grading of dysphagia.
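The Eckardt-based definition used in three of the trials can be illustrated with a short Python sketch. It scores the four symptom items and applies the ≤ 3 threshold, and it assumes the commonly used item scales:
- Dysphagia, regurgitation, and retrosternal pain are each scored 0 to 3 by frequency.
- Weight loss is scored 0 to 3, with cut-offs at 5 and 10 kg.

How the exact 5 kg and 10 kg boundaries are handled is an assumption here, and the function names are hypothetical.

```python
def weight_loss_points(kg_lost: float) -> int:
    """Eckardt weight-loss item: 0 = none, 1 = <5 kg, 2 = 5-10 kg, 3 = >10 kg.

    Boundary handling (exactly 5 or 10 kg) is an assumption of this sketch.
    """
    if kg_lost <= 0:
        return 0
    if kg_lost < 5:
        return 1
    if kg_lost <= 10:
        return 2
    return 3


def eckardt_score(dysphagia: int, regurgitation: int,
                  retrosternal_pain: int, kg_lost: float) -> int:
    """Total Eckardt score (0-12) from three frequency items and weight loss.

    Frequency items: 0 = none, 1 = occasional, 2 = daily, 3 = each meal.
    """
    items = {"dysphagia": dysphagia, "regurgitation": regurgitation,
             "retrosternal_pain": retrosternal_pain}
    for name, value in items.items():
        if value not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be an integer from 0 to 3, got {value!r}")
    return sum(items.values()) + weight_loss_points(kg_lost)


def eckardt_success(score: int, threshold: int = 3) -> bool:
    """Treatment success as defined in the Eckardt-based trials: score <= 3."""
    return score <= threshold


# Example: occasional dysphagia and regurgitation, no pain, no weight loss.
score = eckardt_score(1, 1, 0, kg_lost=0)
print(score, eckardt_success(score))  # 2 True
```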
Secondary efficacy outcomes were treatment success at 2 years and physiologic outcomes. The physiologic outcomes were reduction in basal pressure of the lower esophageal sphincter (LES), decrease in integrated relaxation pressure (IRP), and post-treatment height of barium contrast on timed barium esophagram. The primary safety outcome was the risk of post-treatment GERD at 1 year. Secondary safety outcomes were the risk of severe erosive esophagitis (LA Grade C or D) and procedure-related serious adverse events.

Statistical analysis
Direct (pairwise) meta-analysis was performed using the Mantel-Haenszel fixed-effects model to estimate pooled risk ratios (RRs) and 95% confidence intervals (CIs). We chose this model because there was no conceptual heterogeneity and fewer than 5 studies were available, and random-effects models can be unstable when the number of studies is small [23].
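The Mantel–Haenszel pooled risk ratio can be sketched in a few lines of Python. The example uses the standard Mantel–Haenszel weights and the Greenland–Robins variance estimator for log(RR). The study counts are invented for illustration. Zero-event edge cases need special handling in practice and are not covered.

```python
import math

Z_95 = 1.96


def mantel_haenszel_rr(studies):
    """Fixed-effect Mantel-Haenszel risk ratio with a 95% CI.

    studies: iterable of (events_trt, n_trt, events_ctl, n_ctl) per trial.
    Returns (rr, ci_lower, ci_upper, se_log_rr). The variance of log(RR) uses
    the Greenland-Robins estimator. Zero pooled events in either arm are not
    handled in this sketch.
    """
    r_sum = 0.0  # sum of a_i * n2_i / N_i
    s_sum = 0.0  # sum of c_i * n1_i / N_i
    v_sum = 0.0  # numerator of the Greenland-Robins variance
    for a, n1, c, n2 in studies:
        n = n1 + n2
        r_sum += a * n2 / n
        s_sum += c * n1 / n
        v_sum += (n1 * n2 * (a + c) - a * c * n) / n ** 2
    rr = r_sum / s_sum
    se = math.sqrt(v_sum / (r_sum * s_sum))
    log_rr = math.log(rr)
    return rr, math.exp(log_rr - Z_95 * se), math.exp(log_rr + Z_95 * se), se


# Hypothetical trials: (successes on treatment, n, successes on comparator, n).
trials = [(40, 50, 33, 50), (80, 100, 70, 100), (25, 30, 20, 30)]
rr, lo, hi, _ = mantel_haenszel_rr(trials)
print(f"RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With a single study, the estimator reduces to the crude risk ratio (a/n1)/(c/n2), which is a convenient sanity check.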
We assessed statistical heterogeneity using the I² statistic, with values over 50% indicating substantial heterogeneity [24]. Because of the small number of trials, publication bias was not formally assessed. Direct comparisons were performed using Review Manager (RevMan) version 5.3 (The Cochrane Collaboration, Copenhagen, Denmark, 2014). We then conducted a network meta-analysis using multivariate fixed-effects meta-regression as described elsewhere [25]. We used a frequentist approach based on a mixed-effects consistency model. We report a point estimate from the network along with a 95% CI derived from the frequency distribution of the estimate.
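The I² statistic is derived from Cochran's Q as I² = max(0, (Q − df)/Q) × 100%, where df is the number of studies minus one. The Python sketch below computes both from study-level log risk ratios and their standard errors, using inverse-variance weights. The inputs are hypothetical. Software may compute Q slightly differently, for example around a Mantel–Haenszel rather than an inverse-variance pooled estimate.

```python
def cochran_q_i2(log_effects, ses):
    """Return (Q, df, I2 in %) for study effects on the log scale."""
    if len(log_effects) != len(ses) or len(ses) < 2:
        raise ValueError("need at least two studies with matching SEs")
    weights = [1.0 / se ** 2 for se in ses]
    # Inverse-variance fixed-effect pooled estimate.
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    # Weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    # I2 is truncated at 0 when Q does not exceed its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


q, df, i2 = cochran_q_i2([0.10, 0.25, 0.05, 0.30], [0.12, 0.15, 0.10, 0.20])
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.0f}%")
```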
The primary outcome (treatment success at 1 year) was analyzed using network meta-analysis, whereas treatment success at 2 years was compared only through direct meta-analysis. We report the pooled prevalence of procedure-related serious adverse events and of GERD outcomes. For the physiologic outcomes, pooled estimates were computed with the DerSimonian and Laird random-effects method and expressed as mean and standard deviation. These outcomes were reduction in basal LES pressure, decrease in IRP, and post-treatment height of barium contrast on barium esophagogram.
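The DerSimonian and Laird random-effects method can be sketched for pooling study-level means, such as the mean reduction in basal LES pressure. Each study contributes its mean, standard deviation, and sample size. The between-study variance τ² is estimated by the method of moments and added to each study's within-study variance before re-weighting. The function name and numbers are hypothetical.

```python
import math


def dersimonian_laird(means, sds, ns):
    """Random-effects pooled mean (DerSimonian-Laird).

    Returns (pooled_mean, se, tau2). The within-study variance of each mean
    is sd**2 / n.
    """
    v = [sd ** 2 / n for sd, n in zip(sds, ns)]
    w = [1.0 / vi for vi in v]
    # Fixed-effect estimate and Cochran's Q.
    fixed = sum(wi * y for wi, y in zip(w, means)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, means))
    df = len(means) - 1
    # Method-of-moments estimate of between-study variance, truncated at 0.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * y for wi, y in zip(w_star, means)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical mean reductions in basal LES pressure (mmHg) from three arms.
pooled, se, tau2 = dersimonian_laird([18.0, 22.5, 15.0], [8.0, 10.0, 7.5], [60, 45, 80])
print(f"pooled {pooled:.1f} mmHg (SE {se:.2f}), tau^2 = {tau2:.2f}")
```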
The quality of evidence from the pairwise and network meta-analyses was judged using the GRADE framework. In this approach, direct evidence from RCTs starts at high quality. It can be rated down to moderate, low, or very low quality for risk of bias, indirectness, imprecision, inconsistency (or heterogeneity), and/or publication bias (Supplementary Table 2). The rating of an indirect estimate starts at the lower rating of the two pairwise estimates that contribute to it as first-order loops. It can be rated down further for imprecision or intransitivity, meaning dissimilarity between studies in clinical or methodological characteristics. If direct and indirect estimates were similar (i.e., coherent), the higher of the two ratings was assigned to the network meta-analysis estimate.
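The rating rules described above are simple enough to express as code. The Python sketch below encodes:
- the four GRADE levels and rating down by a given number of steps;
- the rule that an indirect estimate starts at the lower of its two first-order loop ratings;
- the rule that a coherent network estimate takes the higher of the direct and indirect ratings.

The function names are hypothetical. The text does not specify the incoherent case, so the sketch deliberately leaves it unhandled.

```python
LEVELS = ["very low", "low", "moderate", "high"]  # ordered lowest to highest


def rate_down(level: str, steps: int) -> str:
    """Lower a GRADE rating by `steps` levels, never below 'very low'."""
    return LEVELS[max(0, LEVELS.index(level) - steps)]


def indirect_rating(loop_a: str, loop_b: str, downgrades: int = 0) -> str:
    """Start at the lower of the two first-order loop ratings, then rate down
    for imprecision or intransitivity (`downgrades` levels)."""
    start = min(loop_a, loop_b, key=LEVELS.index)
    return rate_down(start, downgrades)


def network_rating(direct: str, indirect: str, coherent: bool) -> str:
    """If direct and indirect estimates are coherent, take the higher rating."""
    if not coherent:
        raise ValueError("incoherent estimates are outside this sketch")
    return max(direct, indirect, key=LEVELS.index)


# Direct RCT evidence rated down once for risk of bias and once for imprecision.
direct = rate_down("high", 2)
print(direct)  # low
print(network_rating(direct, indirect_rating("low", "moderate", 1), coherent=True))  # low
```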

Results

Studies
Of 38,354 unique studies identified by the search strategy, 6 RCTs met the inclusion criteria and were included in the network meta-analysis comparing three strategies for the management of achalasia. Figure 1 shows the flow chart of study selection, and Table 1 summarizes the included RCTs. Overall, these six trials included 745 patients. All six RCTs were two-arm controlled trials: four compared LHM with PD [19–22], one compared POEM with PD [11], and one compared POEM with LHM [12]. Overall, 309 patients were treated with LHM, 260 with PD, and 176 with POEM. The network of included trials is shown in Fig. 2.
All RCTs enrolled treatment-naïve patients except the RCT by Werner et al. [12]. In that trial, 35.7% of patients had previously been treated for achalasia (26.2% with PD, 6.7% with botulinum toxin injection, and 2.7% with combined PD and botulinum toxin). LHM was combined with Dor fundoplication in four studies [12,19,20,22] and with Toupet fundoplication in one RCT [21]. The primary outcome (treatment success at 1 year) was reported in all studies, whereas only 4 RCTs reported treatment success at 2 years [12,19,20,22].
Serious adverse events were not consistently defined or reported in the trials.
Demographic and clinical characteristics of trial patients are reported in Supplementary Table 3. Baseline patient characteristics and prognostic factors were comparably distributed between intervention and comparator groups and across trials. These factors were age, sex, body mass index (BMI), baseline Eckardt score, and baseline LES pressure. Achalasia subtypes based on manometry were reported in 3 RCTs [11,12,19]: 111 patients (20.1%) had type I, 358 (64.8%) type II, and 49 (8.8%) type III achalasia.
Out of 745 patients enrolled in the included trials, 397 (53.2%) were male, and baseline Eckardt score ranged from …

Risk of bias was assessed for the primary outcome. Patients and physicians (outcome assessors) were not blinded, and the outcome was subjective. Studies were therefore deemed to be at high risk of performance and detection bias (Supplementary Fig. 1).
On network meta-analysis, combining direct and indirect effect estimates, similar findings were observed. There was low confidence in estimates supporting higher efficacy of LHM vs. PD (RR, 1. … Table 2 and Supplementary Table 4). The quality of evidence, based primarily on direct evidence, was rated down for serious risk of bias and for serious imprecision (the lower limit of the 95% CI crossed unity). No significant incoherence (difference between direct and indirect estimates) was observed in closed loops, and there was no evidence of inconsistency (Cochran's Q test 2.37, p = 0.6684). No significant difference was observed in the efficacy of POEM vs. LHM. Confidence in this estimate was very low because of very serious imprecision and serious risk of bias (Supplementary Table 4).
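A check for incoherence in a closed loop can be illustrated by comparing a direct and an indirect log risk ratio for the same pair of treatments. The Python sketch below applies a simple z-test to their difference; the variances add because the two sources are independent. When the two sources agree, it combines them by inverse-variance weighting. This simplified loop-level approach is neither the multivariate model nor the global Cochran's Q test reported above. The inputs and function name are hypothetical.

```python
import math


def direct_vs_indirect(direct, se_direct, indirect, se_indirect):
    """Compare and combine direct and indirect log(RR) estimates.

    Returns (combined_log_rr, combined_se, z, two_sided_p).
    """
    # Incoherence: difference between the two independent estimates.
    diff = direct - indirect
    z = diff / math.sqrt(se_direct ** 2 + se_indirect ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    # Inverse-variance weighted combination of the two sources.
    w_d, w_i = 1 / se_direct ** 2, 1 / se_indirect ** 2
    combined = (w_d * direct + w_i * indirect) / (w_d + w_i)
    combined_se = math.sqrt(1 / (w_d + w_i))
    return combined, combined_se, z, p


est, se, z, p = direct_vs_indirect(math.log(1.10), 0.12, math.log(1.05), 0.20)
print(f"combined RR {math.exp(est):.2f}, incoherence z = {z:.2f}, p = {p:.2f}")
```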

Secondary efficacy outcomes
Treatment success at 2 years

On direct meta-analysis, based on 2 RCTs (273 patients) [8,20], there was no significant difference in treatment success at 2 years between …

… At 1 year, no evidence of barium contrast retention was reported after PD and LHM, while the trial by Ponds et al. [11] … (… Table 5).

Risk of gastroesophageal reflux disease (GERD)
The clinical and objective evaluation of GERD are reported in Table 3.

Risk of treatment-related serious adverse events
Supplementary Table 6 reports the incidence of serious adverse events observed in the included trials. The risk of serious procedure-related adverse events with LHM, PD, and POEM was 6.7% (95% CI, 1.4%–11.9%), 4.2% (1.8%–6.6%), and 1.4% (0%–3.2%), respectively. The most frequent serious complication after PD was perforation, which occurred in 1.5% to 8% of treated patients. In two RCTs [19,22], 12% of patients treated with LHM experienced a severe mucosal injury. The perforation rate with LHM was 4% in the study by Hamdy and colleagues [22] and 2.7% in the trial by Werner et al. [12]. Overall, the perforation rate was 4.2% after PD and 1.2% after LHM, whereas no patient treated with POEM experienced this complication.
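The reported intervals are symmetric around the point estimate and truncated at 0% (for example, 1.4% with an interval of 0%–3.2%). This pattern is consistent with a normal-approximation (Wald) interval, but the pooling method is not stated, so this is an inference. The Python sketch below computes a Wald interval for a single proportion, clipped to [0, 1], using hypothetical counts.

```python
import math


def wald_proportion_ci(events: int, n: int, z: float = 1.96):
    """Proportion with a Wald 95% CI clipped to the [0, 1] range."""
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


p, lo, hi = wald_proportion_ci(3, 100)
print(f"{p:.1%} ({lo:.1%}-{hi:.1%})")  # 3.0% (0.0%-6.3%)
```

Wald intervals behave poorly when events are rare. A Wilson or exact interval, or a random-effects model for proportions, would usually be preferred; this sketch only mirrors the shape of the reported intervals.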

Discussion
Treatment choice for achalasia represents a common challenge and a matter of debate in clinical practice. Although definitive endoscopic and surgical interventions have been studied, there has been limited synthesis of data on the comparative efficacy of these treatments, particularly since the development of POEM. Through a network meta-analysis, and using GRADE criteria to appraise the quality of evidence, we made several key observations on the comparative efficacy and safety of these interventions. First, POEM and LHM may be more effective than PD, and comparable to each other, in decreasing achalasia-related symptoms at 1 year. Second, the risk of GERD was higher with POEM than with PD or LHM, but the risk of severe erosive esophagitis was low across all interventions. Third, the overall risk of serious adverse events, including perforation, was lower with POEM than with PD or LHM.

Fig. 3 Direct meta-analysis comparing different treatment strategies for achalasia. The primary outcome was treatment success assessed at 1 year. PD, pneumatic dilation; LHM, laparoscopic Heller myotomy; POEM, peroral endoscopic myotomy.
Findings from this meta-analysis, combined with consideration of relevant clinical factors, can help guide clinicians and patients in treatment selection. Achalasia subtype is well established as a critical prognostic factor. We were unable to compare therapeutic efficacy across achalasia subtypes. However, prior experience and studies suggest that efficacy in type III achalasia is greater with POEM and an extended proximal esophageal myotomy than with PD or with LHM without an extended myotomy. This advantage reflects the ability to treat not only the lower esophageal sphincter but also spasticity of the distal esophageal smooth muscle. Accordingly, three guidance documents recommend POEM as the preferred treatment for type III achalasia [16,26]: the ASGE 2020 guideline on the management of achalasia, the American College of Gastroenterology 2020 guideline on achalasia, and the 2017 Best Practice Advice from the American Gastroenterological Association Institute.
Additional patient-specific and resource considerations are relevant to the choice of therapy. A recent systematic review and meta-analysis by Oude Nijhuis and colleagues also identified older age and the presence of a sigmoid-shaped esophagus as predictors of poor treatment outcome [27]. Health-care utilization is also an important consideration. Compared with POEM, LHM generally involves longer mean operative time, greater blood loss, and higher narcotic requirements, and the length of hospital stay is similar or longer [28–30]. Direct costs of POEM and LHM do not differ significantly. When quality-adjusted life-years are considered, however, POEM appeared to be cost-effective compared with LHM [31]. PD uses fewer resources, but patients should be aware that outcomes after PD are optimized with a sequential dilation protocol. In this meta-analysis, POEM remained more effective than PD at 2 years, whereas the advantage of LHM over PD was no longer apparent at 2 years. This discordance is likely driven by variation in PD protocols among studies (Table 1), which limits the ability to compare long-term efficacy. Ultimately, the therapeutic decision should be patient-centered and based on shared decision-making, an area that requires further investigation.
An important strength of this network meta-analysis is that it was restricted to RCTs and included recent trials. In particular, it includes the two landmark head-to-head RCTs comparing POEM with PD and with LHM [11,12]. Prior meta-analyses predated these RCTs and included both observational and randomized studies. They also did not objectively appraise the overall quality of evidence using standardized GRADE methodology [7,32].
Although an increased rate of post-treatment GERD after POEM was expected, its clinical implications merit discussion. Long-term (10- to 20-year) follow-up after Heller myotomy shows a high incidence of GERD and erosive complications, including strictures, Barrett's esophagus, and esophageal adenocarcinoma. These complications in turn contribute to the failure rate of myotomy at 10 years [33]. For this reason, Heller myotomy is often performed with an anti-reflux procedure. POEM lacks a combined anti-reflux procedure to reinforce the anti-reflux barrier [34], and this likely promotes reflux after the procedure. However, the pooled short-term estimate of severe erosive reflux disease after POEM is low. A previous meta-analysis found that approximately 30 patients would need to be treated with LHM rather than POEM to prevent 1 case of post-procedure severe esophagitis [34]. Further, early experience with endoscopic transoral incisionless fundoplication after POEM has shown reductions in esophagitis and esophageal acid exposure [35]. Nonetheless, the long-term consequences of GERD after POEM need to be studied.
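The "approximately 30 patients" figure quoted above is a number needed to treat (NNT). The NNT is the reciprocal of the absolute risk difference between two treatments, conventionally rounded up. An NNT of about 30 corresponds to an absolute difference of roughly 3.3 percentage points. The Python sketch below shows the arithmetic with hypothetical event rates and does not reproduce the cited meta-analysis.

```python
import math


def number_needed_to_treat(risk_higher: float, risk_lower: float) -> int:
    """Patients to treat with the lower-risk option to prevent one event."""
    arr = risk_higher - risk_lower  # absolute risk reduction
    if arr <= 0:
        raise ValueError("risk_higher must exceed risk_lower")
    # Round before taking the ceiling so that floating-point noise (1 / arr
    # may not be exactly an integer even when it should be) cannot add one.
    return math.ceil(round(1 / arr, 9))


print(number_needed_to_treat(0.06, 0.02))  # 25
```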
Several limitations, related to both the network meta-analysis and the individual studies, merit further discussion. The studies had a short duration of follow-up, and the primary outcome of this network meta-analysis was short-term (1-year) success. Limited post-intervention follow-up prevents an understanding of long-term comparative efficacy, which is critical for a chronic disease. There was also a paucity of direct head-to-head trials, particularly comparing POEM with the other treatments. Further, the non-blinded design of the included trials introduced a significant risk of performance and detection bias. These biases are not easily avoided in RCTs of new surgical or endoscopic devices or techniques, given the nature of the interventions. This is a particular limitation for subjective outcomes such as symptom improvement in patients with achalasia. Assessment of objective physiologic outcomes by blinded readers could overcome the limitations related to blinding in these trials. However, these outcomes were infrequently and/or inconsistently reported, which limits the ability to compare post-treatment physiologic efficacy. Similarly, treatment-related adverse events were poorly reported, so a thorough assessment of the risk-benefit profile could not be performed. Finally, network meta-analyses are inherently at risk of misinterpretation due to conceptual heterogeneity. This heterogeneity arises from differences in participants, interventions, co-interventions/background treatment, and outcome assessment, and it may limit the comparability of trials. Study-level synthesis cannot adequately account for these differences, so pooled analyses of individual participant data will be needed.
In conclusion, based on network meta-analysis, POEM and LHM may be comparable to each other, and both may be more effective than PD, in the definitive management of treatment-naïve patients with achalasia. Although the risk of GERD is higher with POEM, the overall rate of severe erosive esophagitis is low. The rate of treatment-related serious adverse events may be lower with POEM than with LHM or PD. Future prospective studies comparing the long-term efficacy and safety of POEM, LHM, and PD, particularly across achalasia subtypes, are warranted.