Background

Idiopathic pulmonary fibrosis (IPF) is a progressive interstitial pneumonia of unknown cause that usually affects older adults and is associated with a median survival of 3–5 years after diagnosis [1, 2]. The diagnostic criteria, clinical characteristics, and natural course of the disease have been well defined in recent evidence-based guidelines for the diagnosis and management of IPF [2]. IPF manifests with worsening dyspnea and causes substantial morbidity [1]. Patients with IPF often experience a stepwise decline in pulmonary function test (PFT) parameters and clinical symptoms, and acute exacerbations are associated with increased mortality. Until recently, despite an increasing number of clinical trials, no intervention other than lung transplantation had been shown to improve survival in patients with IPF [2]. However, recent large-scale randomized controlled trials (RCTs) of a few novel agents have demonstrated a reduced rate of disease progression, as measured by forced vital capacity (FVC), in well-defined patients with IPF [3–5].

The choice of first-line treatment is best addressed by direct comparisons of treatment regimens in high-quality studies, but such studies do not yet exist for IPF. Previous systematic reviews and meta-analyses have relied on direct comparisons [6, 7]. A recently published multiple treatment comparison suggested a potential benefit of nintedanib and pirfenidone compared with other treatment interventions [8]. Further, based on indirect comparison, its results suggested that nintedanib might be superior to pirfenidone in slowing the rate of FVC decline [8]. That review had limitations. It considered only a select number of interventions (three in total: N-acetylcysteine monotherapy, nintedanib, and pirfenidone), which restricted the evidence to a fraction of that available. More importantly, it focused on FVC, an outcome correlated with survival [9], and because FVC was reported variably across the included studies (e.g., as percent change, percent predicted, or volume change), the analysis relied on standardized mean differences, which limit its usefulness for decision-making [8].

We performed a multiple treatment comparison based on a network meta-analysis, considering both direct and indirect comparisons of 10 treatment interventions that have been tested in RCTs of patients with well-defined IPF. We focused on mortality and severe adverse events (SAEs), as these outcomes are clinically relevant and meaningful to patients.

Methods

We conducted this systematic review to inform the clinical practice guidelines for the pharmacologic treatment of patients with IPF sponsored by the American Thoracic Society, European Respiratory Society, Japanese Respiratory Society, and Asociación Latinoamericana de Tórax [10]. This multiple comparison network meta-analysis (NMA) built on the guideline development process but was independent of it, in that its results were not available when the guideline recommendations were formulated.

For the previous guideline document, published in 2011, we had performed an evidence synthesis of treatment interventions for IPF [2]. For this NMA, we updated the 2010 review and searched for more recent publications only. We used the Ovid platform to search MEDLINE, EMBASE, the Cochrane Central Register of Controlled Trials, the Health Technology Assessment database, and the Database of Abstracts of Reviews of Effects from May 2010 (the date of the last search) through August 2015 (see Appendix for the search strategy). Reviewers (BR, CC, YZ) also contacted experts and reviewed previous meta-analyses for additional articles.

Three reviewers (BR, CC, YZ) screened titles and abstracts in duplicate for potential eligibility; any entry identified by at least one reviewer proceeded to full-text eligibility review. Full-text review used pre-tested eligibility forms and was also performed in duplicate, with a third adjudicator (HJS) helping to reach consensus when reviewers disagreed. We included parallel-group RCTs, including factorial designs, but excluded quasi-randomized and crossover trials. No language restrictions were applied. Studies were included only if they involved adults (≥18 years of age) with IPF as defined by the 2011 criteria [2]. Studies that included patients with other confounding respiratory conditions or idiopathic interstitial pneumonias other than IPF were excluded. Studies had to examine one of the 10 interventions of interest included in the guideline update (ambrisentan, bosentan, imatinib, macitentan, N-acetylcysteine, nintedanib, pirfenidone, sildenafil, prednisone/azathioprine/N-acetylcysteine triple therapy, and vitamin K antagonist) compared with another of these interventions or with placebo. We focused on mortality and rates of severe adverse events (SAEs) because these outcomes were considered important to patients and data for them were widely available across RCTs.

Data were abstracted in duplicate, and authors of primary publications were contacted when required for missing or unclear information. Risk of bias (RoB) in individual studies was assessed independently and in duplicate using a tool modified from that recommended by the Cochrane Collaboration [11, 12]. For each included study, we judged each of the following items as ‘low RoB’, ‘probably low RoB’, ‘probably high RoB’, or ‘high RoB’: randomization sequence generation, concealment of randomization, blinding, incomplete data, selective reporting, and other bias (including lack of intention-to-treat analysis). The overall RoB rating for each study was the least favorable (highest-risk) of its ratings across these criteria.
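The overall rating rule above is a "weakest link" aggregation: one high-risk item determines the study's overall judgment. The Python sketch below illustrates that rule under stated assumptions; the ordinal coding of the four judgments and the item keys are hypothetical labels chosen for illustration, not part of the modified Cochrane tool itself.

```python
from enum import IntEnum
from typing import Mapping


class RoB(IntEnum):
    """Ordinal RoB judgments (hypothetical coding); larger = more serious risk of bias."""
    LOW = 0
    PROBABLY_LOW = 1
    PROBABLY_HIGH = 2
    HIGH = 3


# Hypothetical keys for the six items assessed in each study.
DOMAINS = (
    "sequence_generation",
    "concealment",
    "blinding",
    "incomplete_data",
    "selective_reporting",
    "other_bias",
)


def overall_rob(ratings: Mapping[str, RoB]) -> RoB:
    """Overall study rating = least favorable (highest-risk) rating across all items."""
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    return max(ratings[d] for d in DOMAINS)


# Hypothetical study: low RoB on every item except blinding.
study = {d: RoB.LOW for d in DOMAINS}
study["blinding"] = RoB.PROBABLY_HIGH
print(overall_rob(study).name)  # PROBABLY_HIGH
```

Using an ordered enum keeps the "lowest rating wins" rule a single `max` call and makes a missing item an explicit error rather than a silent pass.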

Heterogeneity in treatment effects was evaluated by estimating the between-study variance and, when at least two studies were available for a pairwise comparison, by Cochran's Q test and the I² statistic [13–15]. Within a Bayesian framework, we used a Markov chain Monte Carlo algorithm to fit a random-effects NMA, with the number of mortality or SAE events within studies modelled by a binomial distribution. Multiple treatment NMA allows direct and indirect evidence to be combined into a single overall point estimate. We also performed a post-hoc subgroup analysis excluding the two trials with only 6 months of follow-up, both of which compared sildenafil with placebo.
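For readers less familiar with these heterogeneity statistics, the following is a minimal frequentist sketch of Cochran's Q, I², and the DerSimonian-Laird estimate of between-study variance (τ²) for a single pairwise comparison on the log odds ratio scale. It is illustrative only: the function names, the 0.5 continuity correction, and the choice of the DerSimonian-Laird estimator are assumptions, and the NMA itself estimated between-study variance within the Bayesian model rather than with this formula.

```python
import math


def log_or_and_var(a, b, c, d):
    """Log odds ratio (treatment vs control) and its approximate variance.

    a, b: events and non-events in the treatment arm;
    c, d: events and non-events in the control arm.
    Adds 0.5 to every cell when any cell is zero (a common continuity correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def heterogeneity(effects, variances):
    """Cochran's Q, I^2 (%), and DerSimonian-Laird tau^2 for one pairwise comparison."""
    if len(effects) < 2:
        raise ValueError("need at least two studies")
    w = [1 / v for v in variances]                    # inverse-variance weights
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                     # truncated at zero
    return q, i2, tau2


# Two hypothetical studies with log ORs 0.0 and 2.0, each with variance 1.0.
print(heterogeneity([0.0, 2.0], [1.0, 1.0]))  # (2.0, 50.0, 1.0)
```

The requirement of at least two studies mirrors the condition stated above: with a single study there is no between-study variability to estimate.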

We report odds ratios (ORs) and their corresponding 95 % credible intervals (CrIs), the Bayesian analog of 95 % confidence intervals [16]. The reported ORs are relative effects of IPF treatments on mortality or SAEs in patients with IPF over (on average) 1 year. Vague (non-informative) priors were used for model parameters, and convergence was assessed using Brooks-Gelman-Rubin plots [17] as well as trace and time-series plots. Goodness of fit was evaluated using the mean residual deviance, and the surface under the cumulative ranking curve (SUCRA) was used to rank the treatments [18]. SUCRA is derived from cumulative ranking probabilities: an intervention that always ranks first has a SUCRA value of one, whereas one that always ranks last has a value of zero. We also generated a clustered ranking plot of the network based on cluster analysis of SUCRA values for the two outcomes (mortality and SAEs). This exploratory plot allows identification of clusters of treatments with similar effectiveness and safety profiles [19]. The Bayesian NMA was conducted using R statistical software.
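Under the standard definition, the SUCRA for a treatment among a competing treatments is the average of its cumulative ranking probabilities over ranks 1 to a - 1. A minimal Python sketch, assuming the rank probabilities have already been obtained from the MCMC output (the function name and input layout are hypothetical):

```python
def sucra(rank_probs):
    """SUCRA for each treatment from a matrix of rank probabilities.

    rank_probs[k][j] = probability that treatment k is ranked (j + 1)-th (j = 0 is best).
    Returns the mean of the cumulative rank probabilities over ranks 1..a-1.
    """
    a = len(rank_probs[0])
    scores = []
    for probs in rank_probs:
        if abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError("each row of rank probabilities must sum to 1")
        cumulative, total = 0.0, 0.0
        for p in probs[:-1]:  # the final cumulative value is always 1 and is excluded
            cumulative += p
            total += cumulative
        scores.append(total / (a - 1))
    return scores


# Always first, always second, always last:
print(sucra([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))  # [1.0, 0.5, 0.0]
```

The example reproduces the boundary cases described in the text: a treatment that always ranks first scores one, and one that always ranks last scores zero.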

The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach for NMA was used to assess the certainty in the evidence (quality of evidence) for specific comparisons, including direct, indirect, and final network estimates [20]. Our assessment addressed RoB (in individual studies), imprecision, inconsistency (heterogeneity in estimates of effect across studies), indirectness (related to the question or due to intransitivity), and publication bias [20]. Incoherence (disagreement between direct and indirect estimates) did not need to be assessed, because every estimate was based either only on direct evidence (interventions vs. placebo) or only on indirect evidence (all other comparisons). For direct comparisons, certainty started at ‘high’; for indirect comparisons, we lowered the starting certainty to ‘moderate’. Certainty in each indirect estimate was inferred by examining the network loops connecting the comparison and was set to the lowest certainty of the direct estimates contributing to it. Precision was judged from the credible interval around the point estimate of the indirect comparison. Publication bias could not be formally assessed with statistical methods because few studies contributed to each direct comparison. Although the potential for this bias is real given the small number of studies and the commercial interests involved, we did not consider this concern sufficient to further downgrade the certainty in the evidence.
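The rating rule for indirect estimates can be expressed as a small function. This is a sketch under stated assumptions: the numeric coding of the four GRADE levels is hypothetical, and modelling a serious imprecision judgment as a one-level downgrade is our simplification of the precision assessment described above, not a formula taken from the GRADE guidance.

```python
from enum import IntEnum


class Certainty(IntEnum):
    """GRADE certainty levels with a hypothetical ordinal coding."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def indirect_certainty(contributing_direct, serious_imprecision=False):
    """Certainty rating for an indirect estimate formed through a common comparator.

    contributing_direct: ratings of the direct estimates on the connecting path.
    Starts at MODERATE (one level below the HIGH start for direct evidence), is capped
    at the lowest contributing direct rating, and (assumption) drops one more level
    for serious imprecision, never below VERY_LOW.
    """
    if not contributing_direct:
        raise ValueError("need at least one contributing direct estimate")
    rating = min(Certainty.MODERATE, *contributing_direct)
    if serious_imprecision:
        rating = Certainty(max(Certainty.VERY_LOW, rating - 1))
    return rating


# Hypothetical: drug A vs placebo rated HIGH, drug B vs placebo rated LOW.
print(indirect_certainty([Certainty.HIGH, Certainty.LOW]).name)  # LOW
```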

Results

A total of 9,933 titles were identified in the primary search (Fig. 1) and were combined with 346 studies identified by screening titles included in the previous iteration of the IPF guidelines. Of these 10,279 references, 10,225 were judged ineligible on the basis of titles and abstracts, leaving 54 studies for full-text review; of these, 35 proved ineligible, leaving 19 eligible RCTs that were included in the final analysis [3–5, 21–35].
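The screening counts reported above reconcile as follows:

```latex
\begin{aligned}
9{,}933 + 346 &= 10{,}279 &&\text{(records screened by title and abstract)}\\
10{,}279 - 10{,}225 &= 54 &&\text{(full-text reviews)}\\
54 - 35 &= 19 &&\text{(RCTs included in the analysis)}
\end{aligned}
```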

Fig. 1 Flow chart of search results

Table 1 summarizes the characteristics of these 19 RCTs, which involved 5,694 adults. All trials examined patients diagnosed with IPF according to current international diagnostic criteria [2]. Most trials enrolled patients with mild or moderate impairment and used PFTs or other clinical parameters to exclude patients with severe functional impairment due to their lung disease.

Table 1 Study characteristics

Mortality

Table 2 shows the NMA results for mortality. The results demonstrate lower mortality associated with sildenafil compared with ambrisentan (NMA OR, 0.12; 95 % CrI, 0.01–0.78, moderate certainty in the evidence), triple therapy (NMA OR, 0.02; 95 % CrI, 0.01–0.30, moderate certainty in the evidence), and vitamin K antagonists (VKA) (NMA OR, 0.05; 95 % CrI, 0.01–0.37, moderate certainty in the evidence). Similarly, pirfenidone is associated with a mortality benefit compared with ambrisentan (NMA OR, 0.28; 95 % CrI, 0.07–0.93, moderate certainty in the evidence), triple therapy (NMA OR, 0.05; 95 % CrI, 0.01–0.44, moderate certainty in the evidence), and VKA (NMA OR, 0.10; 95 % CrI, 0.02–0.47, moderate certainty in the evidence). Nintedanib is associated with a mortality benefit only when compared with triple therapy (NMA OR, 0.05; 95 % CrI, 0.01–0.49, moderate certainty in the evidence) and VKA (NMA OR, 0.11; 95 % CrI, 0.02–0.54, moderate certainty in the evidence).
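Because every comparison between two active agents in this network is indirect through placebo, the underlying logic can be illustrated with the Bucher adjusted indirect comparison: log ORs subtract and their variances add. The sketch below is a frequentist analog offered for intuition only, not the Bayesian model used in this NMA (whose posterior medians need not equal a simple ratio of ORs); the function names and all input values are hypothetical. The additive variances also help explain why some indirect credible intervals reported here are very wide.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def se_from_ci(lower, upper):
    """Standard error of a log OR recovered from its 95% interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def bucher_indirect(or_ap, ci_ap, or_bp, ci_bp):
    """Indirect OR of A vs B via common comparator P (Bucher adjusted indirect comparison).

    or_ap, or_bp: ORs of A vs P and B vs P; ci_ap, ci_bp: their (lower, upper) 95% intervals.
    Returns (OR, lower, upper) for A vs B.
    """
    log_or = math.log(or_ap) - math.log(or_bp)               # effects subtract on the log scale
    se = math.hypot(se_from_ci(*ci_ap), se_from_ci(*ci_bp))  # variances add
    return (math.exp(log_or),
            math.exp(log_or - Z_95 * se),
            math.exp(log_or + Z_95 * se))


# Hypothetical inputs: A vs placebo OR 0.50 (0.25-1.00); B vs placebo OR 1.00 (0.50-2.00).
or_ab, lo, hi = bucher_indirect(0.50, (0.25, 1.00), 1.00, (0.50, 2.00))
print(round(or_ab, 2), round(lo, 2), round(hi, 2))  # 0.5 0.19 1.33
```

Each hypothetical direct interval spans a fourfold range, whereas the indirect interval spans roughly sevenfold, illustrating the loss of precision inherent in indirect comparisons.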

Table 2 Estimates of effects (with 95 % credible intervals) and confidence ratings for comparisons of therapeutic agents for the treatment of idiopathic pulmonary fibrosis (IPF) on the outcome mortality

We found no significant difference between sildenafil and pirfenidone (NMA OR, 0.44; 95 % CrI, 0.08–2.28, moderate certainty in the evidence) or nintedanib (NMA OR, 0.42; 95 % CrI, 0.07–2.13, moderate certainty in the evidence), or between pirfenidone and nintedanib (NMA OR, 0.95; 95 % CrI, 0.36–2.24, moderate certainty in the evidence). Triple therapy is significantly worse than most interventions: in addition to the comparisons listed above, it is associated with higher mortality than imatinib (NMA OR, 16.00; 95 % CrI, 1.43–730.7, moderate certainty in the evidence), N-acetylcysteine (NAC) monotherapy (NMA OR, 11.84; 95 % CrI, 1.19–480.3, moderate certainty in the evidence), and placebo (NMA OR, 12.52; 95 % CrI, 1.58–444.4, moderate certainty in the evidence). Likewise, in addition to the comparisons listed above, VKA is associated with higher mortality than imatinib (NMA OR, 7.92; 95 % CrI, 1.17–65.39, moderate certainty in the evidence), NAC monotherapy (NMA OR, 5.80; 95 % CrI, 1.08–38.11, moderate certainty in the evidence), bosentan (NMA OR, 6.46; 95 % CrI, 1.35–43.69, moderate certainty in the evidence), and placebo (NMA OR, 6.14; 95 % CrI, 1.49–35.13, moderate certainty in the evidence).

SUCRA analysis (Table 3) identified nintedanib, pirfenidone, and sildenafil as the three treatments with the highest probability of reducing mortality in IPF. In the subgroup analysis excluding the two sildenafil trials with only 6-month follow-up, nintedanib and pirfenidone were the two treatments with the highest probability of reducing mortality compared with the other included interventions.
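Rankings such as these are typically derived by ranking the treatments within each MCMC draw and tallying how often each treatment occupies each rank; the resulting rank probabilities feed the SUCRA calculation described in the Methods. A minimal NumPy sketch, assuming posterior draws of each treatment's log OR against a common reference are available as an array (the function name and the simulated draws are hypothetical):

```python
import numpy as np


def rank_probabilities(draws, lower_is_better=True):
    """Rank probabilities from posterior draws.

    draws: array (n_draws, n_treatments) of sampled log ORs vs a common reference
    (for mortality, lower is better). Returns an (n_treatments, n_treatments) matrix
    whose [k, j] entry is the share of draws in which treatment k ranks (j + 1)-th.
    """
    x = np.asarray(draws, dtype=float)
    if not lower_is_better:
        x = -x
    ranks = x.argsort(axis=1).argsort(axis=1)  # 0 = best within each draw
    n_trt = x.shape[1]
    return np.stack([(ranks == j).mean(axis=0) for j in range(n_trt)], axis=1)


# Hypothetical posterior draws for three treatments (log OR vs placebo).
rng = np.random.default_rng(0)
draws = rng.normal(loc=[-0.5, -0.4, 0.3], scale=0.2, size=(4000, 3))
probs = rank_probabilities(draws)
print(probs.round(2))
```

Because ranking is done draw by draw, posterior uncertainty carries through to the ranks: two treatments with overlapping posteriors, like the first two in the hypothetical example, split the top ranks between them.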

Table 3 Surface under the cumulative ranking curve (SUCRA) data for the outcomes of mortality and severe adverse events

Severe adverse events (SAEs)

Four of the 19 trials did not report SAEs and were therefore excluded from this analysis [25, 27, 29, 36]. Table 4 shows the NMA results for SAEs. Triple therapy showed a significant increase in SAEs compared with bosentan (NMA OR, 4.94; 95 % CrI, 1.52–17.70, low certainty in the evidence), imatinib (NMA OR, 4.35; 95 % CrI, 1.05–20.05, low certainty in the evidence), macitentan (NMA OR, 4.74; 95 % CrI, 1.18–20.63, low certainty in the evidence), nintedanib (NMA OR, 4.35; 95 % CrI, 1.36–15.47, low certainty in the evidence), pirfenidone (NMA OR, 4.17; 95 % CrI, 1.29–14.51, low certainty in the evidence), sildenafil (NMA OR, 4.91; 95 % CrI, 1.11–22.48, low certainty in the evidence), and placebo (NMA OR, 4.15; 95 % CrI, 1.43–12.88, low certainty in the evidence).

Table 4 Estimates of effects (with 95 % credible intervals) and confidence ratings for comparisons of therapeutic agents for the treatment of idiopathic pulmonary fibrosis (IPF) on the outcome severe adverse events (SAEs)

SUCRA analysis (Table 3) suggested that bosentan, macitentan, and sildenafil had the lowest risk of SAEs. Nintedanib and pirfenidone ranked fourth and sixth, respectively. VKA and triple therapy were the two lowest-ranked interventions, with the highest probability of causing SAEs. The subgroup analysis excluding the two sildenafil trials with only 6-month follow-up yielded very similar results.

SUCRA cluster

Figure 2 plots each treatment's SUCRA value for mortality (y-axis) against its SUCRA value for SAEs (x-axis). Cluster analysis divides the treatments into two distinct groupings. One cluster, comprising ambrisentan, triple therapy, and VKA, has lower SUCRA values for both outcomes than the other grouping.
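The grouping in Fig. 2 can be illustrated with a simple two-cluster k-means on (SUCRA for SAEs, SUCRA for mortality) pairs. This is a sketch under stated assumptions: the clustering algorithm used in [19] may differ (for example, a hierarchical method), k is fixed at 2 only because two groupings are reported, and the SUCRA values below are hypothetical and are not taken from Table 3.

```python
import numpy as np


def two_means(points, n_iter=100):
    """Lloyd's k-means with k = 2 on (SUCRA_SAE, SUCRA_mortality) pairs.

    Centroids start at the treatments with the lowest and highest combined SUCRA,
    so label 0 tends to collect the poorer-ranked treatments and label 1 the better.
    """
    x = np.asarray(points, dtype=float)
    totals = x.sum(axis=1)
    centroids = x[[totals.argmin(), totals.argmax()]].copy()
    labels = None
    for _ in range(n_iter):
        # squared Euclidean distance of every point to each centroid
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for k in range(2):
            members = x[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return labels, centroids


# Hypothetical SUCRA pairs (SAE, mortality) for six unnamed treatments T1-T6.
sucra_pairs = [(0.10, 0.10), (0.20, 0.15), (0.15, 0.20),
               (0.80, 0.70), (0.70, 0.85), (0.90, 0.80)]
labels, _ = two_means(sucra_pairs)
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

Seeding the centroids at the extremes of combined SUCRA makes the result deterministic, which suits an exploratory plot with few treatments.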

Fig. 2 Scatterplot of surface under the cumulative ranking curve (SUCRA) values for mortality (y-axis) against SUCRA values for severe adverse events (SAEs; x-axis). A higher SUCRA value for mortality indicates better survival, whereas a higher SUCRA value for SAEs indicates fewer treatment-associated events. Cluster analysis divides the treatments into two distinct groupings

Discussion

The results of this NMA highlight potentially important differences in mortality and SAEs among treatment interventions for IPF. Our findings suggest a possible mortality advantage of nintedanib, pirfenidone, and sildenafil over other treatments. When we focused on longer-term mortality data by excluding the two sildenafil trials with 6-month follow-up, nintedanib and pirfenidone showed a potential survival benefit compared with the other treatment interventions. No significant difference was seen when these two treatments were compared with each other.

Strengths of this systematic review and NMA include its restriction to RCTs that address a precise clinical question in well-defined IPF patients and its focus on outcomes that are important to patients. We conducted a comprehensive search and RoB assessment, both involving duplicate review and third-party adjudication when necessary. Using rigorous NMA methods [16], we incorporated indirect evidence to compare the efficacy and safety profiles of active agents investigated in patients with IPF, which allowed assessment of comparative efficacy between IPF treatment interventions and provided the best available estimates of effect. The GRADE approach also allowed us to report the certainty in the evidence for each treatment comparison and across the network.

The benefits of any intervention must be weighed against potential harms. Although both pirfenidone and nintedanib are associated with SAEs, primarily dermatologic manifestations and gastrointestinal disturbances, neither was significantly worse than any other intervention. Their SUCRA rankings suggest that, although they are not likely to be the best options for avoiding SAEs, they are not at the bottom of the rankings either. The balance between benefit and harm is illustrated in Fig. 2: treatments in the upper right of the graph, such as nintedanib, pirfenidone, and sildenafil, compare favorably with other active interventions for both mortality and SAE rates. The results further suggest that certain interventions for IPF, specifically triple therapy, VKA, and ambrisentan, are associated with an increased risk of SAEs with no demonstrated benefit.

Limitations of our review include the small number of studies relative to the number of comparisons considered, which resulted in low certainty in the estimates for many key comparisons. Although all included studies enrolled only patients with IPF, there was some heterogeneity across studies in disease severity (as assessed by PFTs and radiologic findings) and in duration of follow-up. To account for heterogeneity in treatment effects, we used random-effects models, and we performed a subgroup analysis to examine the impact of including trials with shorter follow-up. We were unable to perform an NMA for other patient-important outcomes, such as quality-of-life indices, 6-minute walk test, or acute exacerbation rate, because these outcomes were reported differently across the included studies and the primary data were relatively inaccessible. Applying the NMA model to the few studies that reported these outcomes would have produced very imprecise, non-informative results. Therefore, we may have missed minimal important differences in treatment effects on other patient-important outcomes [37].

Conclusions

This NMA provides the best available estimates of treatment effects on overall mortality for IPF interventions by combining all available evidence. It is the first analysis to provide comparative efficacy for patient-important outcomes of interventions in IPF. Results suggest greater benefits of nintedanib and pirfenidone compared with other treatments, with no significant difference between these two interventions. Ambrisentan, VKA, and triple therapy are associated with harm and have no demonstrated benefit. However, given the limitations and the low certainty in the evidence for most comparisons, these conclusions should be interpreted with caution, and clinical decision-making must be informed by future head-to-head RCTs that confirm or refute these findings.