Dear Editor,


We would like to thank you for allowing us the opportunity to respond to the letter received from Marshall et al. and to address the queries raised with respect to our network meta-analysis (NMA) of fluticasone furoate/umeclidinium/vilanterol (FF/UMEC/VI) triple therapy compared with other therapies for the treatment of chronic obstructive pulmonary disease (COPD) [1].

To minimize clinical heterogeneity across COPD studies, we conducted two separate NMAs [1, 2] and presented the data in parallel in the same journal. The dual therapy NMA [2] included studies in patients who were mostly symptomatic with infrequent exacerbations. This ensured that the long-acting β2-agonist (LABA)/long-acting muscarinic antagonist (LAMA) comparisons were performed in populations similar to those in which these combinations are indicated in routine practice. In the triple therapy NMA [1], we focused on patients with moderate-to-very-severe COPD, so that comparisons within the triple therapy class were performed in populations consistent with the licensed indication for COPD triple therapy. We also limited the studies to randomized controlled trials (RCTs) to minimize methodological heterogeneity. Most of the included studies were high-quality registration trials, which strengthens the assumption of similarity or exchangeability [3,4,5].

An important consideration when conducting an NMA is the selection of a model that best fits the distribution of the data and accounts for potential sources of error or heterogeneity [6]. As Marshall et al. noted, outcomes of our NMA were originally reported using a fixed-effects (FE) model. The FE model was considered the most appropriate because of the low expected heterogeneity between studies and the small number of studies reporting some of the outcome variables. Random-effects (RE) models are not recommended when there are too few studies to estimate the between-study variance accurately [6].
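For readers less familiar with the distinction, the two models can be written as follows in a simple pairwise, inverse-variance form. This is a generic sketch for illustration, not the exact network specification used in our NMA; here y_i denotes the treatment effect estimated in study i, v_i its within-study variance, and τ² the between-study variance:

```latex
\begin{aligned}
&\text{FE model:} && y_i = \theta + \varepsilon_i, \quad \varepsilon_i \sim N(0, v_i)\\
&\text{RE model:} && y_i = \theta_i + \varepsilon_i, \quad \theta_i \sim N(\mu, \tau^2), \quad \varepsilon_i \sim N(0, v_i)\\
&\text{FE estimate:} && \hat{\theta}_{\mathrm{FE}} = \frac{\sum_i w_i y_i}{\sum_i w_i}, \quad w_i = \frac{1}{v_i}, \quad \operatorname{SE}(\hat{\theta}_{\mathrm{FE}}) = \Big(\sum_i w_i\Big)^{-1/2}\\
&\text{RE estimate:} && \hat{\mu}_{\mathrm{RE}} = \frac{\sum_i w_i^{*} y_i}{\sum_i w_i^{*}}, \quad w_i^{*} = \frac{1}{v_i + \hat{\tau}^2}, \quad \operatorname{SE}(\hat{\mu}_{\mathrm{RE}}) = \Big(\sum_i w_i^{*}\Big)^{-1/2}
\end{aligned}
```

Because the estimate τ̂² enters every RE weight, an imprecise estimate of τ² obtained from only a few studies feeds directly into the pooled estimate and its confidence interval.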

Some degree of heterogeneity is a known and accepted characteristic of all NMAs. As part of the feasibility assessment for this NMA, covariates were compared across the studies; the similarity assumption held, and differences in covariates were within acceptable limits for pooling. The important covariates assessed were sex, age, smoking status, disease severity, number of exacerbations in the previous year, percentage of inhaled corticosteroid (ICS) users at baseline, and COPD duration in years. We also performed and reported statistical tests to quantify heterogeneity among the selected studies, namely the chi-squared (Cochran's Q) test and the Higgins I² statistic, both of which are common and well established. For the majority of analyses, I² indicated a mild-to-moderate amount of heterogeneity (0–50%) [7]. For the exacerbation analyses, I² was higher than in the other analyses; the source of this heterogeneity was investigated through sensitivity analyses, such as excluding open-label studies and studies with a follow-up duration of less than 24 weeks. The overall findings remained unchanged. Other approaches to exploring sources of heterogeneity include meta-regression, exclusion of studies from the networks of evidence, and comparison of baseline characteristics. Meta-regression, however, should generally not be conducted when the evidence base includes fewer than 10 studies [8].
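As an illustration of how these two heterogeneity measures are obtained, the following minimal Python sketch computes Cochran's Q (the statistic behind the chi-squared test) and Higgins' I² from study-level effects and standard errors using fixed-effect inverse-variance weights. It is a pairwise simplification for illustration only; the function name and the input values are hypothetical and are not data from our NMA.

```python
def cochran_q_and_i2(effects, std_errors):
    """Return Cochran's Q, its degrees of freedom and Higgins' I^2.

    effects: study-level treatment effects on an additive scale
             (e.g. mean differences in trough FEV1, or log rate ratios).
    std_errors: the corresponding within-study standard errors.
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching standard errors")
    # Fixed-effect inverse-variance weights and pooled estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the FE pooled estimate;
    # under homogeneity it follows a chi-squared distribution with k - 1 df.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: proportion of total variability attributed to between-study
    # heterogeneity, truncated at zero when Q does not exceed its df.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


if __name__ == "__main__":
    # Hypothetical toy numbers for illustration only.
    q, df, i2 = cochran_q_and_i2([0.0, 4.0], [1.0, 1.0])
    print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.1%}")  # prints: Q = 8.00 on 1 df, I^2 = 87.5%
```

The chi-squared p-value for Q can then be obtained, for example, with SciPy's `scipy.stats.chi2.sf(q, df)`.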

Although RE models are considered appropriate in some cases where statistically significant heterogeneity may be present, starting with an FE model and switching to an RE model only in analyses where heterogeneity statistics are high is discouraged [6]. In light of the above, we chose to report outcomes using an FE model for all analyses. Heterogeneity in study design and/or in the clinical characteristics of participants in individual studies was highlighted as a potential limitation of the analysis in our publication.

Marshall et al. appear to suggest that using RE models will resolve statistical heterogeneity. In our experience, and as expected from how these statistics are defined, the heterogeneity statistics do not change when switching between RE and FE models: Cochran's Q and I² are computed from the FE inverse-variance weights, so they are identical under both modeling approaches. For the primary endpoint of our study (mean change from baseline in trough forced expiratory volume in 1 second [FEV1] at 24 weeks), the NMA result using the RE model was identical to that using the FE model. The FEV1 results generated from the triple therapy NMA are consistent in magnitude and direction with the results of AERISTO [9] (glycopyrrolate [GLY]/formoterol fumarate [FOR] vs. UMEC/VI), even though the AERISTO trial was not included in our NMA. This agreement with an independent trial provides additional support for the validity of our NMA results.
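The point that switching models leaves the heterogeneity statistics unchanged can be illustrated with a minimal pairwise sketch. It assumes the DerSimonian-Laird moment estimator for τ², a common textbook choice that is not necessarily the estimator used in every NMA package; the function name and inputs are hypothetical. Q and I² are computed once from the FE weights, and the RE model only adds τ̂² to each study's variance. When τ̂² is truncated at zero (Q ≤ k − 1), the RE and FE results coincide exactly.

```python
import math


def pool_fixed_and_random(effects, std_errors):
    """Inverse-variance FE pooling and DerSimonian-Laird RE pooling.

    Cochran's Q (and hence I^2) is computed once, from the FE weights, and is
    shared by both models: choosing RE changes the pooled estimate and its
    standard error, not the heterogeneity statistics.
    """
    k = len(effects)
    if k < 2 or len(std_errors) != k:
        raise ValueError("need at least two studies with matching standard errors")
    variances = [se ** 2 for se in std_errors]

    # Fixed effect: weights are the inverse within-study variances.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    fe_se = math.sqrt(1.0 / sum_w)

    # Heterogeneity statistics, based on the FE weights only.
    q = sum(wi * (yi - fe) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird moment estimator of tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random effects: tau^2 is added to every within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    re = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    re_se = math.sqrt(1.0 / sum_w_star)

    return {"Q": q, "df": df, "I2": i2, "tau2": tau2,
            "fe": fe, "fe_se": fe_se, "re": re, "re_se": re_se}


if __name__ == "__main__":
    # Hypothetical, nearly homogeneous studies: tau^2 = 0, so RE equals FE.
    print(pool_fixed_and_random([0.10, 0.11, 0.09], [0.05, 0.05, 0.05]))
    # Hypothetical, heterogeneous studies: same point estimate, wider RE SE.
    print(pool_fixed_and_random([0.0, 4.0], [1.0, 1.0]))
```

This is one mechanism by which an RE analysis can reproduce an FE result exactly; whether it applies to a given network depends on the data and on the τ² estimator used.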

For the moderate/severe exacerbation analysis, the incidence rate ratios for comparators versus FF/UMEC/VI were similar using FE and RE models, and the order of the P-score rankings was similar; however, despite the similar point estimates, the confidence intervals were wider with the RE model, owing to the number of small studies in the network. Because they take between-study variance into account, RE models produce a “less precise” and more conservative estimate of the combined effect size than FE models [6].
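P-scores summarize, for each treatment, the mean certainty that it is better than each competitor, computed from point estimates and their standard errors without resampling. The sketch below follows the frequentist P-score definition of Rücker and Schwarzer; it assumes that network estimates versus a common reference and the standard errors of all pairwise differences are already available from the fitted model, and the treatment labels and numbers are hypothetical. Because the RE model widens standard errors while leaving point estimates similar, it pulls each P-score towards 0.5 without necessarily changing their order.

```python
from statistics import NormalDist


def p_scores(estimates, se_of_difference, smaller_is_better=True):
    """Frequentist P-scores from network estimates.

    estimates: treatment -> effect versus a common reference on an additive
               scale (e.g. log incidence rate ratio; the reference is 0.0).
    se_of_difference: (a, b) -> standard error of estimates[a] - estimates[b];
               each unordered pair needs to appear once, in either order.
    smaller_is_better: True for exacerbation rates, where a lower rate ratio
               is favorable.
    """
    phi = NormalDist().cdf
    names = list(estimates)
    scores = {}
    for a in names:
        total = 0.0
        for b in names:
            if a == b:
                continue
            se = se_of_difference.get((a, b), se_of_difference.get((b, a)))
            # Positive advantage means treatment a looks better than b.
            advantage = estimates[b] - estimates[a]
            if not smaller_is_better:
                advantage = -advantage
            # One-sided certainty that a is better than b.
            total += phi(advantage / se)
        # Average over the n - 1 competitors.
        scores[a] = total / (len(names) - 1)
    return scores


if __name__ == "__main__":
    # Hypothetical log rate ratios versus reference treatment "A".
    est = {"A": 0.0, "B": 0.10, "C": 0.25}
    se = {("A", "B"): 0.05, ("A", "C"): 0.08, ("B", "C"): 0.09}
    for name, score in sorted(p_scores(est, se).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.2f}")
```

Since Φ(x) + Φ(−x) = 1 for each pair, the P-scores always sum to n/2, which is a convenient sanity check on any implementation.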

Marshall et al. questioned the inclusion of the KRONOS and FULFIL trials in our NMA. The patient populations included in these two studies are broadly similar with respect to many parameters, including age, sex, proportion of current smokers, post-bronchodilator FEV1% predicted, and percentage of ICS users at study entry [10, 11]. In addition, the two studies have the same duration and measured similar outcomes using a similar statistical hierarchy. Based on their inclusion criteria, the only aspect in which KRONOS may be considered different from FULFIL is the proportion of patients with prior exacerbations at baseline (65% of patients with ≥ 1 exacerbation in FULFIL vs. 26% in KRONOS). However, this difference did not appear to affect the exacerbation rates observed during the studies. In fact, the opposite trend was observed in the exacerbation rates recorded with the common comparator (budesonide [BUD]/FOR dry powder inhaler): in FULFIL, 0.34 and 0.36 exacerbations occurred by 24 and 52 weeks, respectively, versus 0.55 in KRONOS (annual rate of moderate/severe exacerbations based on the 24-week core phase). Thus, there is no basis for assuming that this difference has any significant effect on the NMA findings, which is likely why these two studies have also been included in other NMAs.

As part of the study selection for this triple therapy NMA, we included studies conducted in populations consistent with the licensed indication for COPD triple therapy: “indicated as a maintenance treatment in adult patients with moderate-to-severe COPD who are not adequately treated by a combination of an ICS and a LABA or combination of a LABA and a LAMA” [12,13,14]. The comprehensive systematic literature review (SLR) identified RCTs conducted in adults aged ≥ 40 years with a COPD diagnosis, including the relevant triple therapy studies presented in the respective Summaries of Product Characteristics (SmPCs) of the licensed triple therapies. Notably, both pivotal registration trials for BUD/GLY/FOR, ETHOS and KRONOS, are included in the BUD/GLY/FOR SmPC (Trixeo Aerosphere SmPC) [13], where they are presented as supporting its efficacy with respect to lung function and moderate/severe exacerbations. It would therefore be remiss to exclude either of these key registration studies. All studies that met the following inclusion criteria were included in the SLR:

  • Study designs: RCTs with a minimum duration of 8 weeks;

  • Population: adults aged ≥ 40 years with a moderate-to-severe COPD diagnosis as defined by the Global Initiative for Chronic Obstructive Lung Disease guidelines or any other major guidelines;

  • Interventions: triple therapy combinations, i.e., combinations of the three molecule classes (ICS, LABA, and LAMA) delivered as either single- or multiple-inhaler triple therapy;

  • Comparators: studies comparing the interventions of interest (above) with any therapy (including combination therapies) licensed for the treatment of COPD in any country;

  • Outcomes: the outcomes of interest include lung function (trough FEV1), annual or annualized exacerbation rates, health-related quality of life (St George’s Respiratory Questionnaire score), Transition Dyspnea Index, rescue medication use, and adverse events;

  • Database search date limits: March 3, 2017–October 16, 2020.

Marshall et al. commented on the differences between our findings and those of four other NMA studies [13,14,15,16]. The Ferguson et al. [17] and Bourdin et al. [16] publications assume therapeutic equivalence of the ICS/LABA or LAMA/LABA therapies (an assumption not supported by head-to-head evidence), which may contribute to differences in findings and in the magnitude of the effects reported. These assumptions allow weaker ICS/LABA or LAMA/LABA combinations to “borrow efficacy” from stronger ones; in effect, such analyses test only the benefit of adding a different LAMA (to ICS/LABA) or a different ICS (to LAMA/LABA). Differences between alternative LAMA/LABA combinations [2, 9, 17, 18] or ICS/LABA combinations [19, 20] have previously been demonstrated, and it has been suggested that the clinical impact of triple combination therapies may be modulated by the pharmacological characteristics of the individual components [14]. We likewise propose that the differences in efficacy observed between triple therapies in our NMA may be attributable to differences in the component molecules. Furthermore, one of the cited NMA studies [13, 14] did not include data from the phase 3 ETHOS trial, which represents a large body of evidence for BUD/GLY/FOR efficacy; this is an important consideration when comparing outcomes.

The NMA studies mentioned by Marshall et al. [13,14,15,16] presented results from Bayesian RE models. These studies did not present sufficient detail on their statistical analyses to allow the models used to be reproduced. One of the reasons for selecting a frequentist rather than a Bayesian model for our NMA was to facilitate reproducibility and to ensure that the data and methods of our NMA are consistent with those used in the primary clinical trials included in our NMA: all of the original clinical trials that met our eligibility criteria used frequentist methods.

One of the strengths of our NMA is that we have not made any assumptions that are not substantiated by the available data. Not pooling the ICS/LABA or LAMA/LABA combinations makes it easier to compare direct and indirect results generated within the triple therapy NMA; for this reason, we presented our NMA results separately for dual and triple therapies. As noted above, the FEV1 results generated from the triple therapy NMA are consistent in magnitude and direction with those of AERISTO [9], even though the AERISTO trial was not included in the triple therapy NMA. Furthermore, the NMA generated results consistent with those of the known head-to-head trials [10, 21, 22].

In summary, we believe that we selected the most appropriate model for the analysis. Regardless of which model is used, the results are either very similar or identical and remain suggestive of favorable efficacy with single-inhaler triple therapy comprising FF/UMEC/VI versus other single- or multiple-inhaler triple therapies.