Background

Chronic obstructive pulmonary disease (COPD) is an important cause of morbidity across the world, and the third leading cause of death globally [1, 2]. Primary prevention through a combination of reducing tobacco exposure, decreasing contact with biomass fuels and noxious gases, and improving child health is the most effective way of decreasing this burden in the longer term, although it takes time for the benefits of interventions on mortality to become apparent [3, 4]. In patients with symptomatic COPD, the impact of specific medications on the risk of dying is an important consideration that merits scientific evaluation. The evidence on mortality reduction from individual clinical trials in COPD is inconclusive, with relatively few studies of sufficient duration and sample size to demonstrate an impact [5].

Network meta-analysis (NMA) provides a statistical approach to combining direct and indirect trial evidence to generate relative treatment effects between different drugs on outcomes of interest. In the absence of head-to-head trials including all comparators, NMA has been recommended by reimbursement agencies in the UK and Germany [6, 7] and endorsed by influential bodies such as ISPOR [8]. NMA has been applied to COPD mortality data on two previous occasions [9, 10].

We conducted a systematic review and network meta-analysis (NMA) designed to assess whether pharmacotherapy affects mortality reported in COPD clinical trials. NMA was used to allow all treatment options to be compared in a single analysis [11–13]. The analysis combines survival data reported in two different forms: the total number of deaths (r) from (n) subjects, subsequently referred to as the ‘binary endpoint’, and hazard ratios, which describe the impact of treatment on time to death and account for censoring. Although hazard ratios are more informative, they are not reported in all studies, and the inclusion of binary data enables the maximum number of trials to be included. Sensitivity analyses allowed us to assess the robustness of the results to the various assumptions underlying the base case analysis.

Primarily, our objective was to estimate the impact of specific COPD treatments on patient mortality using NMA. Secondly, we explored the strengths and limitations of undertaking and interpreting NMA in this context.

Methods

Systematic review

A systematic review was conducted to identify randomised, blinded trials of COPD patients treated with tiotropium, beclomethasone, budesonide, fluticasone propionate, triamcinolone, bambuterol, formoterol, salmeterol, salbutamol, indacaterol, theophylline, roflumilast, indacaterol maleate, ipratropium bromide, vilanterol trifenatate, fluticasone furoate or placebo. Dosing and administration method were not specified in the inclusion criteria. Combinations of the listed interventions were allowed; dose comparison studies were not included unless another listed intervention was also incorporated in the study. Studies were required to report all-cause mortality in binary or hazard ratio form for at least 24 weeks of follow-up; mortality could be reported as a study outcome or as a serious adverse event. Only English language full publications were included.

EMBASE (1988), MEDLINE, MEDLINE In-Progress (1946) and CENTRAL (1898) were searched from database inception to October 2012. Searches combined controlled-vocabulary and free-text terms for COPD and the treatments of interest; RCT filters were used in EMBASE and MEDLINE. Full publications were reviewed for inclusion by two analysts (JT and JC). Data were extracted from eligible trials by one analyst, with validation conducted by the second analyst. Dosages of the same therapy were combined for the purposes of the analysis: indacaterol (150 μg od, 300 μg od, 600 μg od), budesonide (200 μg bid, 400 μg bid, 1200 μg od for 6 months followed by 800 μg od for 30 months), fluticasone propionate (250 μg bid, 500 μg bid), salmeterol (50 μg bid, 100 μg bid), formoterol (6 μg bid, 12 μg bid, 24 μg bid) and salmeterol fluticasone propionate combination (SFC) (50/250 μg bid, 50/500 μg bid). The Cochrane risk of bias tool was used to assess methods of randomisation, allocation concealment, blinding, patient follow-up and incomplete reporting [14].

Statistical analysis

Binary mortality data (total number of deaths (r) from (n) subjects) and hazard ratios with reported confidence intervals, taken from published studies meeting our inclusion criteria, were used as inputs to the analysis. Where both were reported, we preferred hazard ratios over binary data. Hazard ratios were taken from Cox proportional-hazards models as these were consistently reported; in particular, the Cox model estimate for TORCH was preferred over that calculated directly from the Kaplan-Meier analysis, for consistency with the other studies that reported HRs. Hazard ratio data and binary data were combined using the methodology established by Woods [15], which also appropriately incorporates multi-arm trials. Estimated treatment effects were synthesised using network meta-analysis (NMA) in a Bayesian multilevel framework. This method allows simultaneous comparison of outcomes of multiple treatments from trials comparing different sets of treatment options (provided a connected network of treatments can be formed) whilst retaining within-trial randomisation. A study protocol was written and reviewed prior to initiating the systematic review and analysis. Full details of the statistical method and the model code are provided in Additional file 1.
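As an illustration of how a published hazard ratio can be prepared as an input on the log-hazard scale, the following minimal sketch recovers the log hazard ratio and its standard error from a reported 95 % confidence interval. It assumes the interval is symmetric on the log scale (as for a Wald interval from a Cox model); the function name and the input values are hypothetical and are not taken from Table 2 or from the model code in Additional file 1.

```python
import math


def log_hr_and_se(hr, ci_lower, ci_upper, z=1.959964):
    """Return (log hazard ratio, standard error) from an HR and its 95% CI.

    Assumes the reported interval is symmetric on the log scale, so its
    half-width in log units equals z * SE.
    """
    log_hr = math.log(hr)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z)
    return log_hr, se


# Hypothetical values for illustration only (not from any included trial).
log_hr, se = log_hr_and_se(hr=0.80, ci_lower=0.64, ci_upper=1.00)
print(f"log HR = {log_hr:.3f}, SE = {se:.3f}")
```

A trial summarised this way contributes a normally distributed log hazard ratio, whereas a trial reporting only r deaths from n subjects contributes count data; how the two are linked in the present analysis is described in Additional file 1.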

The base case analysis included all RCTs meeting our inclusion criteria using the intention to treat (ITT) results from these studies, combining different licensed doses of the same medicines as single comparators. Results are presented for active treatments relative to placebo (reference).

Sensitivity analyses

The following pre-planned analyses were conducted to examine the sensitivity of the study results to various assumptions:

  (A) Including on-treatment (OT) mortality results (excluding deaths among patients who had ceased to receive the allocated study treatment) in preference to ITT results where available.

  (B) Meta-regression controlling for differences in COPD severity (assessed by baseline FEV1 % predicted, mean value per study), assuming a common covariable effect across treatments.

  (C) Excluding studies where patients had high lung function at baseline (mean FEV1 % predicted >65 %).

  (D) Excluding studies where patients received unlicensed doses.

  (E) Excluding studies of less than 48 weeks duration.

  (F) Excluding studies not powered to detect a difference in mortality.

  (G) Excluding studies that failed to meet our specified quality assessment criteria (i.e. 2 or more components of the assessment had a high or unclear risk of bias) as assessed by the Cochrane Collaboration risk of bias tool [14].

  (H) Including studies from the Dong [10] NMA for which mortality data were unavailable in the primary publication. Dong [10] cited a variety of sources for these data, including contacting the study authors and searching websites and clinical trial registers; these were not included in the present base case analysis.

  (I) Separating patients treated with tiotropium by type of inhaler used (SoftMist or HandiHaler). Safety concerns (increased mortality risk) had at the time of the present analysis been raised around the SoftMist inhaler [16]. We also incorporated the results of TIOSPIR [17], an RCT of over 17,000 subjects designed to evaluate efficacy and safety of the two different inhalers, in this sensitivity analysis. TIOSPIR was not published until the final writing up of the present study.

Statistical models were fitted using WinBUGS [18]. As the present study is a Bayesian analysis, we refer to credible intervals (intervals within which the true value lies with the stated probability) rather than confidence intervals; instead of statistically significant differences, we refer to important differences (the 95 % credible interval for the hazard ratio does not cross 1.0).
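The definition of a credible interval can be made concrete with posterior draws. The sketch below computes an equal-tailed 95 % interval from a set of simulated hazard ratios using a simple nearest-rank rule. The draws are generated from an arbitrary log-normal distribution purely for illustration; they are an assumption of this example and are not output from the WinBUGS models.

```python
import math
import random


def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior draws (nearest-rank rule)."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (n - 1))]
    upper = ordered[int((1.0 - tail) * (n - 1))]
    return lower, upper


# Simulated posterior draws of a hazard ratio (illustrative assumption only).
random.seed(1)
draws = [math.exp(random.gauss(-0.25, 0.10)) for _ in range(20000)]

lower, upper = credible_interval(draws)
crosses_one = lower <= 1.0 <= upper
print(f"95% CrI: {lower:.2f} to {upper:.2f}; crosses 1.0: {crosses_one}")
```

Under the convention used in this paper, a difference would be described as important when `crosses_one` is false.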

Both fixed and random effects models were fitted. A fixed effect model assumes that there is one true effect of each treatment and that variation around this is attributable to chance, whilst a random effects model assumes a distribution of effects and that variance between studies is attributable to heterogeneity. Larger studies thus receive relatively less weight in a random effects model [19]. The Deviance Information Criterion (DIC) was calculated for each model and used to assess whether any model should be preferred [20]. Each model was run for a burn-in period of 40,000 simulations, which were then discarded, with parameter nodes monitored for a further 200,000 simulations. Caterpillar and Brooks-Gelman-Rubin (BGR) plots were used to compare results obtained using different initial values and thereby to check that the models had converged [21].
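The effect of the random effects assumption on study weighting can be seen in the classical inverse-variance analogue, sketched below. This is a simplified pairwise illustration and not the Bayesian model used here, in which weighting is implicit; the within-study variances and the between-study variance `tau_squared` are hypothetical values chosen for the example.

```python
def relative_weights(variances, tau_squared=0.0):
    """Normalised inverse-variance weights.

    tau_squared = 0 corresponds to a fixed effect analysis; a positive value
    adds the same between-study variance to every study (random effects).
    """
    raw = [1.0 / (v + tau_squared) for v in variances]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical variances of the log hazard ratio: one large, one small study.
variances = [0.01, 0.25]

fixed = relative_weights(variances)
random_fx = relative_weights(variances, tau_squared=0.05)

print("fixed effect weights:  ", [round(w, 3) for w in fixed])
print("random effects weights:", [round(w, 3) for w in random_fx])
```

Because the same between-study variance is added to every study, the large study's share of the total weight falls and the small study's share rises.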

Results and discussion

Systematic review

The systematic review identified 42 studies reporting all-cause mortality in COPD patients (Fig. 1; reasons for excluding full publications: Additional file 2: Table S1). Demographic characteristics of subjects (age, gender) are reported in Table 1; the impact of differences in baseline FEV1 % predicted is assessed in sensitivity analyses B and C. The proportion of current smokers was similar across trials, but three trials (all with patients with less impairment of lung function) reported levels in excess of 75 % [22–24].

Fig. 1

PRISMA diagram showing inclusion of studies at each stage of the systematic review and network meta-analysis

Table 1 Baseline characteristics of included studies and all-cause mortality (binary data)

Assessment of study quality using the Cochrane risk of bias tool found that the quality of study reporting was generally high (Additional file 2: Table S2). Although all trials were randomised, 17 did not adequately describe the method of randomisation [22, 24–39], and two studies did not adequately describe methods for allocation concealment [37, 39]. With the exception of FICOPD II, where the theophylline arm (not included in analysis) was open-label [36], all studies were double-blind. Reporting of loss to follow-up was unclear in 17 studies [22, 23, 27, 29, 30, 32, 34, 37, 40–47]; imbalanced dropouts between the treatment groups in two studies were considered to result in a high risk of bias for the reported outcome data [39, 48]. In nine studies two or more components of the assessment were found to be potentially associated with an unclear or high risk of bias [22, 24, 27, 29, 30, 32, 34, 37, 39]. In many cases this was thought to reflect incomplete reporting rather than underlying methodological weakness.

Studies included in the analysis

Two studies were excluded from the statistical analysis. Campbell [49] was excluded since the treatment arms in this trial (formoterol + formoterol as needed, formoterol + terbutaline as needed, placebo + terbutaline as needed) were not included in any of the other trials analysed, and therefore did not link to the evidence network. Similarly, Kerstjens [41], comparing terbutaline with ipratropium bromide + terbutaline and beclomethasone + terbutaline, did not connect to the main evidence network. Two treatments were also excluded from the statistical analysis. Theophylline was included in a single trial, FICOPD II (Rossi [36]), which reported no deaths, so a hazard ratio could not be estimated for this treatment. Similarly, the only trial including the tiotropium + formoterol combination (Vogelmeier [38]) did not report any deaths for this arm, which was therefore also excluded from the analysis. The other treatment arms of these studies were included in the analysis.

The statistical analysis was based on 40 RCTs including 55,220 randomised subjects and 88,261 person years of experience, allowing the comparison of 14 treatments. Figure 2 shows the base case evidence network weighted by the number of person-years of follow up for each within-trial comparison. Reported binary mortality outcomes are presented in Table 1 and hazard ratios in Table 2. In the base case analysis hazard ratios for all-cause mortality were available for three studies and binary data were available for the remaining 37 studies.

Fig. 2

Base case evidence network. The width of each line is proportional to the total person-years of follow-up for all trials informing that comparison

Table 2 All-cause mortality (hazard ratios) of included studies

Base case results

Results from the fixed and random effects base case analysis are presented in Fig. 3. Hazard ratios for each treatment are compared to placebo; a hazard ratio below 1.0 indicates that the treatment is associated with reduced mortality compared to placebo. There was no evidence to suggest that the random effects model was a better fit than the fixed effects model; a difference in DIC of 2–3 is required to be indicative of improved model fit [20]. However, if we believe there is true heterogeneity between the trials, the random effects model would be more appropriate.
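The model comparison rule described above can be written out as a short check. The sketch below uses the DIC values reported in the caption to Fig. 3 (431.9 for fixed effects, 431.5 for random effects) and takes the lower end of the 2–3 range as the threshold; that choice of threshold and the function name are assumptions made for this illustration.

```python
def preferred_model(dic_fixed, dic_random, threshold=2.0):
    """Name the model favoured by DIC, or report no clear preference.

    A lower DIC indicates better fit; differences smaller than the
    threshold are treated as uninformative.
    """
    difference = dic_fixed - dic_random
    if difference >= threshold:
        return "random effects"
    if -difference >= threshold:
        return "fixed effects"
    return "no clear preference"


# DIC values as reported in the Fig. 3 caption.
print(preferred_model(dic_fixed=431.9, dic_random=431.5))
```

With a difference of about 0.4, neither model is favoured on fit alone, which is consistent with presenting both sets of results.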

Fig. 3

Forest plot of results of network meta-analysis. Hazard ratios compared to placebo (DIC 431.9 FE, 431.5 RE). SFC = Salmeterol fluticasone propionate combination; CrI = credible interval; Doses were pooled for the purpose of the analysis: indacaterol (150 μg od, 300 μg od), budesonide (200 μg bid, 400 μg bid, 1200 μg od for 6 months followed by 800 μg od for 30 months), fluticasone propionate (250 μg bid, 500 μg bid), salmeterol (50 μg bid, 100 μg bid), formoterol (6 μg bid, 12 μg bid, 24 μg bid) and salmeterol fluticasone propionate combination (SFC) (50/250 μg bid, 50/500 μg bid)

Two interventions produced a hazard ratio relative to placebo whose credible interval did not cross 1.0 using the fixed effects model. SFC was associated with a reduction in mortality of 21 % (HR 0.79; 95 % CrI 0.67, 0.94) and indacaterol with a mortality reduction of 72 % (HR 0.28; 95 % CrI 0.08, 0.85). Using a random effects model, SFC failed to show evidence of effect (HR 0.79; 95 % CrI 0.56, 1.09). For indacaterol the result using the random effects model (HR 0.29; 95 % CrI 0.08, 0.89) was comparable to that using the fixed effects model. No evidence of effect on all-cause mortality (versus placebo) was found for other treatments. Although the results for most comparators have wide credible intervals suggesting inconclusive results, the HRs for tiotropium + salmeterol, tiotropium + SFC and beclomethasone + formoterol have particularly wide credible intervals; in each case the results are generated by single, relatively small study arms, and the uncertainty around the estimates is therefore high.
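The percentage reductions quoted above follow directly from the hazard ratios as (1 − HR) × 100. The short sketch below reproduces this arithmetic for the four estimates reported in this section and flags which credible intervals exclude 1.0; the variable and function names are illustrative only.

```python
# (label, HR, lower 95% CrI, upper 95% CrI) as reported in the text.
reported = [
    ("SFC, fixed effects", 0.79, 0.67, 0.94),
    ("SFC, random effects", 0.79, 0.56, 1.09),
    ("indacaterol, fixed effects", 0.28, 0.08, 0.85),
    ("indacaterol, random effects", 0.29, 0.08, 0.89),
]


def percent_reduction(hr):
    """Relative reduction in the hazard of death versus placebo, in percent."""
    return (1.0 - hr) * 100.0


def excludes_one(lower, upper):
    """True when the interval lies wholly below or wholly above 1.0."""
    return upper < 1.0 or lower > 1.0


summary = [
    (label, round(percent_reduction(hr)), excludes_one(lower, upper))
    for label, hr, lower, upper in reported
]
for label, reduction, important in summary:
    print(f"{label}: {reduction} % reduction, CrI excludes 1.0: {important}")
```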

Sensitivity analyses

Results of the sensitivity analyses did not in general differ markedly from the base case (Additional file 2: Table S3). For SFC vs placebo the relative treatment effect improved in the fixed effects analysis when unlicensed doses were excluded, but results from the random effects model showed no evidence of effect and were similar to the base case. Similarly, the relative treatment effect for indacaterol vs placebo strengthened slightly (HR 0.17, 95 % CrI 0.03, 0.78) when studies with a shorter duration were excluded.

Conclusion

In this NMA, data from 40 trials were used to inform comparisons of mortality associated with 14 different pharmacological treatments for COPD. The method allows comparisons of treatments not compared directly within individual RCTs, and provides additional information on the relative efficacy of treatments for which direct trial comparisons are available. The results show that only indacaterol and the combination of the long-acting β2-agonist salmeterol and the inhaled corticosteroid fluticasone propionate (SFC) are associated with an important reduction in the risk of all-cause mortality in COPD in fixed effect models. Although the fixed effects model was presented as the base case there was no clear difference between the fixed and random effects models (both of which are presented). The results were consistent across a number of sensitivity analyses including controlling for disease severity.

Results for SFC are based on 233 deaths occurring in 7427 subject years. The results for indacaterol are based on four deaths occurring over 1446 subject years and have wide credible intervals. These results are sensitive to the number of deaths (a small change will have a large impact on the resulting HR) and may change with further research.
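The sensitivity to small event counts can be illustrated with crude rates calculated from the totals quoted above. The sketch below shows the proportional change in the crude death rate if one additional death had been observed. These are simple rates per 1000 subject years, not the model-based hazard ratios, and the function names are hypothetical.

```python
def crude_rate_per_1000(deaths, subject_years):
    """Crude death rate per 1000 subject years."""
    return 1000.0 * deaths / subject_years


def change_from_one_more_death(deaths, subject_years):
    """Proportional change in the crude rate if one extra death occurred."""
    base = crude_rate_per_1000(deaths, subject_years)
    return (crude_rate_per_1000(deaths + 1, subject_years) - base) / base


# Deaths and subject years as quoted in the text.
for name, deaths, years in [("SFC", 233, 7427), ("indacaterol", 4, 1446)]:
    rate = crude_rate_per_1000(deaths, years)
    change = change_from_one_more_death(deaths, years)
    print(f"{name}: {rate:.1f} per 1000 subject years; "
          f"one more death changes the rate by {change:.1%}")
```

With only four deaths, a single additional event changes the crude rate by a quarter, whereas the same change among 233 deaths is well under 1 %.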

The results for many of the treatments are inconclusive, as demonstrated by the wide credible intervals exhibited around a number of the HRs. Whilst tighter credible intervals are observed around the results for tiotropium, salmeterol and fluticasone, our analysis is still inconclusive as to whether the treatments provide a greater benefit or harm to patients.

Two published NMAs have evaluated the relationship between pharmacological agents and mortality in COPD patients [9, 10]. Dong [10] considered all-cause mortality and cardiovascular death as outcomes: 42 trials published up to July 2011 were included, treatments were grouped by class (long-acting β2 agonists, inhaled corticosteroids etc.) and tiotropium was separated by inhaler type. The authors sourced some trial mortality results from secondary sources. The study reported a reduction in mortality for LABAs combined with ICS compared with placebo (HR 0.80; 95 % CrI 0.67, 0.94) based on a fixed effects model. Baker [9] included 28 trials reporting mortality published up to October 2007; treatments were grouped by class. A mortality reduction was reported for LABAs in combination with ICS vs placebo (HR 0.71; 95 % CrI 0.49, 0.96) in the fixed effects model.

The present analysis included an additional 14 months of reported evidence and a wider range of treatments (roflumilast, indacaterol and triamcinolone) compared with Dong [10]. Furthermore, results were not aggregated by class. An assumption of class effects presupposes that the effect of each intervention within a class is identical. Even if the assumption holds for efficacy data it may not translate to safety data, as interventions could have physiological effects beyond those of the shared mechanism of action; we therefore chose to estimate effects for each intervention independently [50].

Binary and hazard ratio data were combined in the same analysis, permitting the maximum number of studies to be included and using the best available data from each. We minimised the risk of errors by using data only from citable sources. Sensitivity analyses were undertaken to examine the robustness of the results to the underlying assumptions.

This study has a number of limitations. NMA methods depend on the assumptions that effect measures are additive on the selected scale and that relative treatment effects are comparable [8]; heterogeneity between trials may invalidate these assumptions. Observed or unobserved differences between trials may contribute to heterogeneity and thereby affect relative treatment effects.

The majority of the studies included were not specifically designed to capture mortality as a primary or secondary endpoint. The feasibility of conducting RCTs powered to detect differences in mortality in COPD patients is limited by the need for large sample sizes with sufficient follow-up, as well as the potential for introducing bias associated with differential dropout rates across study arms. Although this is a limitation of the current analysis, where there is an absence of head-to-head trials including all comparators, NMA is a useful tool for healthcare decision makers. In the present analysis we only included studies which reported mortality in the primary study publication. Inclusion of other studies where mortality is available in secondary publications may influence the results; however, the relatively small number of deaths in these trials makes this unlikely [10].

A potentially beneficial impact on mortality could be masked if a large number of studies with low or ineffectual dosages are included. Whilst there is some evidence that dose responsiveness may not be a significant factor in COPD [17, 51], this could be explored further by extending the network to incorporate dose finding studies and by implementing a three-level hierarchical NMA model with an additional level for each drug class [52].

Whilst we controlled for disease severity (recorded by baseline lung function) we did not control for other potential differences between trials which may impact on relative treatment effects (e.g. background therapy, history of exacerbations) as reporting was less consistent for these indicators.

Further work could examine baseline risk or the response in the placebo arms between studies. For example, similar rates of death per 1000 patient years (PY) were observed in the indacaterol (9.9/1000 PY), budesonide (10.0/1000 PY) and triamcinolone (11.4/1000 PY) placebo arms. Much higher rates were observed in the tiotropium (37.2/1000 PY), fluticasone propionate (43.3/1000 PY), salmeterol (47.0/1000 PY) and SFC (48.7/1000 PY) placebo arms (strongly influenced by the size and number of deaths in TORCH and UPLIFT) (Additional file 2: Table S4).

We conclude that currently available data from clinical trials in COPD suggest that some pharmacological treatments may have a significant impact on mortality, compared with placebo. In particular, indacaterol and the combination of salmeterol and fluticasone propionate have shown evidence of a reduction in all-cause mortality. The result for indacaterol is, however, based on a small number of deaths occurring in subjects receiving this therapy. Further research is warranted to strengthen our conclusions.