Introduction

Model-based meta-analysis (MBMA) is a valuable tool that provides a quantitative framework for decision-making during drug development. Specifically, an MBMA typically integrates relevant summary-level patient data (e.g., efficacy and/or safety data) from treatment arms of identified randomized controlled trials (RCTs), applying pharmacological models (e.g., dose/exposure response), for overall assessment of efficacy and safety [1]. During drug development, comparative effectiveness is important in decision-making; however, conducting head-to-head trials to benchmark against competitors is impractical, expensive, and time-consuming. One advantage of an MBMA over conventional meta-analysis is that it can use less restrictive inclusion/exclusion criteria for study selection, since it characterizes the response of interest as a parametric function of time, integrating information from RCTs with different designs. MBMA therefore allows the direct comparison of treatment effects in silico, even in the absence of real-life head-to-head trials, while accounting for RCT heterogeneity (e.g., patient population characteristics or trial features) through quantification of inter-study variability (ISV) and inter-arm variability (IAV) [2]. Several longitudinal MBMA models have been developed across different therapeutic areas, including rheumatoid arthritis [3, 4], psoriasis [5], osteoporosis [6] and chronic obstructive pulmonary disease (COPD) [7], amongst others [2].

COPD is an inflammatory lung disease characterized by airflow obstruction that is not fully reversible. According to the World Health Organization estimate, COPD is the third leading cause of death worldwide, causing 3.23 million deaths in 2019 [8]. The Global Initiative for Chronic Obstructive Lung Disease (GOLD) [9] strategy recommends the use of short- or long-acting β2 agonists (SABA and LABA, respectively), short- or long-acting anticholinergics (SAMA and LAAC, respectively), or combined therapy with LABA/LAAC with or without inhaled corticosteroids (ICS). Some of the most commonly used endpoints to assess disease progression, and hence drug effect, in COPD clinical trials include forced expiratory volume in one second (FEV1) and exacerbation rate, alongside patient-reported outcomes.

A legacy MBMA in COPD patients, which relates patient and trial characteristics and dosing to FEV1 and exacerbation rate summary-level data, has been developed [7, 10]. This analysis included RCTs published up until 2013 with bronchodilators and anti-inflammatories given either as mono- (82%), dual- (17%), or triple-therapy (1%) combinations. However, more recent studies [11, 12] have focused on comparing the efficacy of inhaled triple-therapy (with the addition of ICS). For example, a phase 3 trial [11] with more than 10,000 patients prospectively identified that once-daily triple-therapy [umeclidinium (UMEC)/vilanterol (VI)/fluticasone furoate (FF)] was associated with a greater reduction in exacerbation rate than either of the dual-therapies UMEC/VI or VI/FF. Thus, the legacy MBMA model can be updated to include the new trial data published since 2013 and be used to assess the comparative effectiveness of new and established COPD maintenance treatments.

This analysis aims to (i) evaluate the application and predictability of a published longitudinal MBMA COPD model for FEV1 [7] and exacerbation rate [10], (ii) update the legacy MBMA with new information (e.g., new drugs and additional dual and triple combination studies), and therefore to address more contemporary questions using a more well-informed model, and (iii) perform a comparative effectiveness analysis across all drugs included in the analysis.

Methods

Data

The literature search, study selection, data extraction, processing and analysis followed the methods described for the legacy MBMA [7]. The augmented dataset, defined as the combination of the newly collected data (post-2013) and the legacy data (pre-2013), was used to perform the analysis.

Literature search and study selection

An automated literature search was conducted in November 2020 using OVID MEDLINE and Embase databases with a restriction on English language publications. Searching criteria comprised studies published between July 1, 2013 and November 24, 2020 using keywords such as “FEV1”, “COPD”, and related terms as well as the generic and brand names of established or under development long-acting bronchodilators and anti-inflammatory compounds used in the treatment of COPD. After removing duplicates, abstracts of potentially relevant articles were screened. Subsequently, full-text versions of the relevant identified articles were reviewed independently by C.L-P, SY, CA and MB.

Key selection criteria for the reviewed articles included (i) single- or double-blind multiple-dosing COPD maintenance trials (with the exception of the open-label tiotropium Spiriva arms); (ii) a minimum of 30 and 50 patients for cross-over and parallel-group designs, respectively; and (iii) reported absolute morning trough FEV1 values. Change from baseline (CFB), placebo or comparator arm FEV1 observations were included only if they could be back-calculated to absolute values. For studies where the CFB in trough FEV1 could be transformed to an absolute value (absolute FEV1 = FEV1 baseline + CFB) but the FEV1 baseline value was measured post-short-acting bronchodilator (post-SABD), a correction (PostBDcorrection) was applied to the transformed absolute morning trough FEV1 by estimating a correction factor (θcorrection) (Eq. 1). Pre-dose FEV1 observations with respect to study drug but measured post-SABD were also included in the data for analysis. For the predictions of exacerbation rate, the mean annual rate of moderate or severe exacerbations per patient was used as the outcome of interest for analysis.

$$\text{PostBD}_{\text{correction}} = \text{absolute FEV}_{1\,i,j} \cdot \left(1 - \theta_{\text{correction}}\right)$$
(1)
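The back-calculation and the correction term in Eq. 1 can be illustrated with a minimal Python sketch. The function names and the baseline/CFB inputs are hypothetical and chosen only for illustration; the value 0.89 for θcorrection is the estimate reported in the Results.

```python
def absolute_fev1(baseline_l: float, cfb_l: float) -> float:
    """Back-calculate absolute trough FEV1 (L) as baseline + change from baseline."""
    return baseline_l + cfb_l


def post_bd_correction(absolute_fev1_l: float, theta_correction: float) -> float:
    """Eq. 1: correction term (L) for arms whose baseline was measured post-SABD."""
    return absolute_fev1_l * (1.0 - theta_correction)


if __name__ == "__main__":
    # Illustrative inputs (not taken from any study in the dataset).
    fev1 = absolute_fev1(baseline_l=0.90, cfb_l=0.10)
    print(post_bd_correction(fev1, theta_correction=0.89))
```

With a transformed absolute FEV1 of 1 L and θcorrection = 0.89, the term evaluates to about 0.11 L, which matches the worked example given in the Results.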

Data extraction, processing, and analysis

Study characteristics data such as study size, inclusion criteria (e.g., exacerbation history and disease severity), treatment information and population demographic characteristics were extracted from the selected papers and collected using Microsoft Excel software. Relevant information that was not provided in the article was obtained from ClinicalTrials.gov (https://clinicaltrials.gov/), or databases from the different companies such as GSK (https://www.gsk-studyregister.com/en/), Boehringer Ingelheim (https://www.mystudywindow.com/), Novartis (https://www.novctrd.com/#/) and AstraZeneca (https://astrazenecagrouptrials.pharmacm.com/ST/Submission/Search). Data management and graphical exploration were performed in R software (The R Foundation for Statistical Computing) version 3.5.2 [13] using R packages [i.e. Xpose4 (http://xpose.sourceforge.net, version 4.6.1)] [14, 15]. Data analysis and modelling were performed in NONMEM software (ICON Development Solutions, Ellicott City, Maryland) version 7.5 together with an Intel FORTRAN compiler and Perl-speaks-NONMEM (PsN) version 5.2.6 [16].

Missing covariate values, such as mean age and any unknown fraction of patients who had a medication history and/or received background treatment during the run-in and/or study period, were imputed as described in the legacy MBMA [7]. Specifically, the multiple linear regression models reported in [7] were used in the analysis.

Model-based longitudinal meta-analysis

The same structural model components and variability from the legacy models were considered in this analysis to assess the predictive performance of the published MBMA [7] and exacerbation rate model [10]. Key structural components of the legacy MBMA model [7] used in this analysis are described below.

Structural meta-model for FEV1

The observed absolute morning trough FEV1 (L) for the jth arm of the ith study over time is described in Eq. 2. Sub-models were used to describe the model-predicted untreated study arm baseline FEV1 (B) (i.e., the pre-dose baseline at randomization), long-term disease progression (DP), placebo effect (PBO), and drug effects (E) of the background and study drug treatments (X). The residual unexplained variability (RUV) was assumed to follow a normal distribution, N(0, σ²), and was weighted by the inverse of the square root of the number of patients in the study arm (N).

$$\text{FEV}_{1\,i,j}(t) = \text{B}_{i,j}(\text{COV}) + \text{DP}_{i,j}\left(t, \text{B}_{i,j}\right) + \text{PBO}_{i}(t) + \text{E}_{i,j}\left(t, \text{X}, \text{B}_{i,j}\right) + \text{PostBD}_{\text{correction}} + \frac{\epsilon}{\sqrt{\text{N}_{i,j}}}$$
(2)

Covariate (COV) effects such as disease severity, exacerbation history (according to study protocol) and mean age at enrollment were included on B (Eq. 3).

$$\text{B}_{i,j} = \text{TVB} \cdot \left(1 + \theta_{\text{COV}} \cdot \left(\text{COV}_{i,j} - \text{COV}_{\text{med}}\right)\right) \cdot \exp\left(\eta_{\text{B}i} + \frac{\kappa_{i,j}}{\sqrt{\frac{\text{N}_{i,j}}{200}}}\right)$$
(3)

Here, TVB is the estimated typical baseline value, θCOV is the estimated coefficient of the covariate, COVi,j is the observed mean covariate value in the study arm, and COVmed is the median of the mean covariate values across all study arms reported in the legacy MBMA: 63.6 years for age; two/four for the lowest/highest disease severity class; and one/zero for studies that do/do not require patients to have an exacerbation history to be enrolled. Random effects on baseline, ηB and κ, were included to describe inter-study variability (ISV) and inter-arm variability (IAV), respectively. IAV was weighted by the inverse of the square root of N normalized to 200 patients, as used previously [7], which is close to the typical number of patients per study arm (Eq. 3).
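As a rough illustration of Eq. 3 for a single covariate, the sketch below computes the arm-level baseline in Python. The function name and the covariate coefficient used in the example are hypothetical; TVB = 1.17 is the typical value reported in the Results and 63.6 years is the centering value for age stated above.

```python
import math


def arm_baseline(tvb, theta_cov, cov, cov_med, eta_b=0.0, kappa=0.0, n_patients=200):
    """Eq. 3: model-predicted untreated baseline FEV1 (L) for one study arm."""
    covariate_effect = 1.0 + theta_cov * (cov - cov_med)
    # IAV weight: equals 1 for a 200-patient arm and shrinks for larger arms.
    iav_weight = 1.0 / math.sqrt(n_patients / 200.0)
    return tvb * covariate_effect * math.exp(eta_b + kappa * iav_weight)


if __name__ == "__main__":
    # theta_cov = -0.01 per year of age is an assumed value for illustration only.
    print(arm_baseline(tvb=1.17, theta_cov=-0.01, cov=73.6, cov_med=63.6))
```

With both random effects at zero and the covariate at its centering value, the function returns TVB, which is a convenient sanity check on the parameterization.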

Disease progression (DP) (Eq. 4) and placebo (PBO) (Eq. 5) models were described by linear and mixture models, respectively, as described in the legacy MBMA [7]. TVDPslope is the estimated typical value for the DP slope for a typical 1.2 L baseline, and ηDPi is the random effect for ISV on DP. The mixture model was used to describe two subgroups of studies: those with a gradual PBO effect onset following an Emax model or studies with an immediate PBO effect onset. The estimated PT50 parameter indicates the time when the half-maximum PBO effect is reached in studies with a gradual onset, whereas it was fixed to 0.0001 weeks in studies with an immediate onset. The proportion of studies belonging to either subgroup was determined by using the $MIXTURE functionality in NONMEM. TVPBOmax is the estimated typical value of the maximum PBO effect, and ηPBO is the random effect for ISV on the maximum PBO effect.

$$\text{DP}_{i,j} = \text{TVDP}_{\text{slope}} \cdot \frac{\text{B}_{i,j}}{1.2} \cdot t \cdot \exp\left(\eta_{\text{DP}i}\right)$$
(4)
$$\text{PBO}_{i}(t) = \left(\text{TVPBO}_{\text{max}} + \eta_{\text{PBO}i}\right) \cdot \frac{t}{\text{PT}_{50} + t}$$
(5)
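Eqs. 4 and 5 can be sketched as two small Python functions. All names and numeric inputs in the example are assumptions for illustration: the sign and time unit of the slope and the value of TVPBOmax are not taken from the source, and the mixture assignment itself (handled by $MIXTURE in NONMEM) is not reproduced here, only the two PT50 cases.

```python
import math

IMMEDIATE_PT50_WEEKS = 0.0001  # fixed PT50 for the immediate-onset subgroup


def disease_progression(tvdp_slope, baseline_l, t, eta_dp=0.0):
    """Eq. 4: linear disease progression, scaled to a typical 1.2 L baseline.

    `t` must be in the same time unit as the slope.
    """
    return tvdp_slope * (baseline_l / 1.2) * t * math.exp(eta_dp)


def placebo_effect(tvpbo_max, t_weeks, pt50_weeks, eta_pbo=0.0):
    """Eq. 5: Emax-type time course of the placebo effect."""
    return (tvpbo_max + eta_pbo) * t_weeks / (pt50_weeks + t_weeks)


if __name__ == "__main__":
    # Assumed values: slope of -0.032 L/year, maximum placebo effect of 0.04 L.
    print(disease_progression(tvdp_slope=-0.032, baseline_l=1.0, t=1.0))
    print(placebo_effect(0.04, t_weeks=12, pt50_weeks=20))                    # gradual onset
    print(placebo_effect(0.04, t_weeks=12, pt50_weeks=IMMEDIATE_PT50_WEEKS))  # immediate onset
```

Fixing PT50 to a very small value makes the Emax fraction essentially one at any practical time point, which is how the same equation covers the immediate-onset subgroup.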

For each compound available in the dataset, an individual drug effect (Ex) was determined. If data were available, dose-response information for a compound was described using an Emax model (Eq. 6), where Dxi,j is the dose of compound x in the jth study arm of the ith study, and ED50,x is the estimated dose resulting in half-maximum efficacy. The maximum efficacy (Emax,x) was calculated as shown in Eq. 7 from the estimated efficacy (Effref,x) at the reference dose (Dref,x) and ED50,x. For compounds where ED50,x was unidentifiable, the compound was assumed to have the same efficacy at all dose levels present in the dataset. For some compounds of the same class, some parameters were constrained to be the same. Specifically, arformoterol and formoterol share the same Emax, as do tiotropium (blinded) and tiotropium Respimat. The ED50 parameter for arformoterol was constrained to be half the ED50 value of formoterol, whereas the ED50 for tiotropium (blinded) was constrained to be equal to the ED50 of tiotropium (open-label).

$$\text{E}_{x\,i,j} = \frac{\text{E}_{\text{max},x} \cdot \text{D}_{x\,i,j}}{\text{D}_{x\,i,j} + \text{ED}_{50,x}}$$
(6)
$$\text{E}_{\text{max},x} = \text{Eff}_{\text{ref},x} \cdot \frac{\left(\text{D}_{\text{ref},x} + \text{ED}_{50,x}\right)}{\text{D}_{\text{ref},x}}$$
(7)
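The reference-dose parameterization in Eqs. 6 and 7 can be checked numerically with the short Python sketch below. The function names and the dose, reference dose and ED50 values are hypothetical; only the algebraic form comes from the equations above.

```python
def emax_from_reference(eff_ref, d_ref, ed50):
    """Eq. 7: maximum effect implied by the efficacy at the reference dose."""
    return eff_ref * (d_ref + ed50) / d_ref


def drug_effect(dose, eff_ref, d_ref, ed50):
    """Eq. 6: Emax dose-response, parameterized through the reference-dose efficacy."""
    e_max = emax_from_reference(eff_ref, d_ref, ed50)
    return e_max * dose / (dose + ed50)


if __name__ == "__main__":
    # Assumed values: reference dose 5 (arbitrary units), ED50 2, Eff_ref 0.1 L.
    for dose in (1, 2, 5, 10):
        print(dose, drug_effect(dose, eff_ref=0.1, d_ref=5, ed50=2))
```

By construction the predicted effect at the reference dose equals Effref,x, so the estimated parameter is directly interpretable as the efficacy of the marketed or reference regimen rather than as an extrapolated maximum.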

Treatment effects for the COPD medication received as background treatment during run-in and post-randomization, as well as a treatment class-dependent time course of effect onset were included in the model [7]. In addition, the effect of Bi,j (the model-predicted untreated study arm baseline) and ISV on the overall effect of bronchodilators (EBD) and anti-inflammatory (EAI) treatments were considered using a sub-model described in Eq. 8, where ηBD/AI i is the random effect for ISV on the bronchodilators/anti-inflammatory treatments, and BEBD/AI i,j is the interacting effect of the Bi,j described in Eq. 9. A step function (STEPB) was used to describe this effect with Bi,j below 1.2 L (more severe scenario) resulting in an estimated reduction in the effect of drug treatment (θBBD/AI) [7].

$$\text{E}_{\text{BD/AI}\,i,j}(t) = \text{E}_{\text{BD/AI}\,i,j}(t) \cdot \left(1 + \eta_{\text{BD/AI}\,i}\right) \cdot \text{BE}_{\text{BD/AI}\,i,j}$$
(8)
$$\text{BE}_{\text{BD/AI}\,i,j} = \left(1 - \text{STEP}_{\text{B}}\right) \cdot \left(1 + \theta_{\text{B}}^{\text{BD/AI}} \cdot \left(\text{B}_{i,j} - 1.2\right)\right) + \text{STEP}_{\text{B}}$$
(9)
$$\text{STEP}_{\text{B}} = \begin{cases} 1 & \text{if } \text{B}_{i,j} \ge 1.2\ \text{L} \\ 0 & \text{if } \text{B}_{i,j} < 1.2\ \text{L} \end{cases}$$
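The step-function logic of Eqs. 8 and 9 is sketched below in Python. The function names are hypothetical. The θB values used in the example (0.27 for bronchodilators, 0.64 for anti-inflammatories) are not reported estimates; they are back-calculated here, as an assumption, from the 5.4% and 12.8% reductions that the Results report for a 1 L baseline.

```python
def baseline_effect_multiplier(baseline_l, theta_b, threshold_l=1.2):
    """Eq. 9: multiplier on the class drug effect given the untreated baseline (L)."""
    step_b = 1.0 if baseline_l >= threshold_l else 0.0
    return (1.0 - step_b) * (1.0 + theta_b * (baseline_l - threshold_l)) + step_b


def adjusted_class_effect(effect_l, baseline_l, theta_b, eta=0.0):
    """Eq. 8: class effect scaled by the study-level random effect and baseline."""
    return effect_l * (1.0 + eta) * baseline_effect_multiplier(baseline_l, theta_b)


if __name__ == "__main__":
    # Assumed, back-calculated theta_B values (see lead-in).
    print(baseline_effect_multiplier(1.0, theta_b=0.27))  # bronchodilators
    print(baseline_effect_multiplier(1.0, theta_b=0.64))  # anti-inflammatories
    print(baseline_effect_multiplier(1.4, theta_b=0.27))  # at or above 1.2 L: no change
```

At or above 1.2 L the multiplier is exactly one, so only arms with a more severe baseline have their drug effect scaled down.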

An empirical interaction model allowing an infra-additive interaction (i.e., the combined effect of both drugs is less than the sum of the effects of either drug alone), determined by the long-acting bronchodilator interaction parameter in the legacy MBMA [7], was used to describe the interaction between bronchodilators of different drug classes (e.g., combinations of LABA and LAAC compounds), whereas all other maintenance treatment combinations were assumed to have a fully additive effect. This means that additive or synergistic interactions for other plausible combinations (e.g., ICS/LABA) were not contemplated in the model. The interaction between SABD and long-acting bronchodilators for those FEV1 observations measured post-SABD was considered by estimating a fractional reduction in the overall bronchodilator effect [7].

MBMA prediction of exacerbation rate

A link between the MBMA-predicted placebo-adjusted bronchodilator and anti-inflammatory drug effects on FEV1 (ΔΔFEV1BD and ΔΔFEV1AI, respectively) and the annual rate of moderate-severe exacerbations at week 12 for the jth arm in the ith study (ERpredi,j) has been previously described [10, 17]. ERpredi,j (Eq. 10) is modeled as the product of the pre-treatment exacerbation rate (ERPBOi,j) (Eq. 11 and Eq. 12) and the typical exacerbation rate ratio between treatment and placebo groups (TVRi,j) (Eq. 13).

$$\text{ER}_{\text{pred}_{i,j}} = \text{ER}_{\text{PBO}_{i,j}} \cdot \text{TVR}_{i,j}$$
(10)
$$\text{TVER}_{\text{PBO}_{i,j}} = \begin{cases} \theta_{\text{ERPBO}_{i,j}} \cdot \exp\left(\left(\text{ppFEV1}_{i,j} - 41\right) \cdot \theta_{\text{ER}} + \left(\text{ICS}_{\text{p}_{i,j}} - 61\right) \cdot \theta_{\text{HICS}}\right) & \text{if } \text{E}_{\text{HIST}} \text{ is required} \\ \theta_{\text{ERPBO}_{i,j}} \cdot \left(1 + \theta_{\text{E}_{\text{HIST}}}\right) \cdot \exp\left(\left(\text{ppFEV1}_{i,j} - 49\right) \cdot \theta_{\text{ER}} + \left(\text{ICS}_{\text{p}_{i,j}} - 51\right) \cdot \theta_{\text{HICS}}\right) & \text{if } \text{E}_{\text{HIST}} \text{ is not required} \end{cases}$$
(11)
$$\text{ER}_{\text{PBO}_{i,j}} = \text{TVER}_{\text{PBO}_{i,j}} \cdot \exp\left(\eta_{\text{ER}}\right)$$
(12)
$$\text{TVR}_{i,j} = \exp\left(\theta_{\text{BD}} \cdot \Delta\Delta\text{FEV}_{1,\text{BD}} + \text{ICS}_{\text{naivef}_{i,j}} \cdot \theta_{\text{BD}} \cdot \Delta\Delta\text{FEV}_{1,\text{AI}}\right) \cdot \left(1 - \text{ICS}_{\text{expf}_{i,j}} \cdot \Delta\Delta\text{FEV}_{1,\text{AI}}^{\theta_{\text{AIexp}}}\right)$$
(13)

TVERPBOi,j is the typical value of ERPBOi,j, where θERPBOi,j is the estimated typical rate for a study that did or did not require a history of exacerbations (EHIST) and a washout of ICS. θER is the estimated fractional change in ERPBOi,j per 1 percentage point increase in %predicted FEV1 (ppFEV1i,j); θEHIST is the estimated fractional change in ERPBOi,j if a history of exacerbations is not required; and θHICS is the estimated fractional change in ERPBOi,j per 1 percentage point increase in the percentage of patients with a history of ICS usage (ICSpi,j). ηER is the random effect for ISV on ERPBOi,j.

The fraction of ICS-naïve patients (ICSnaivefi,j) and fraction of ICS-experienced patients (ICSexpfi,j) were predictors for TVRi,j (Eq. 13), whereas θBD represents the estimated fractional change in TVRi,j for each unit (L) of drug effect from a direct bronchodilator, and θAIexp is the estimated power function for TVRi,j based on unit (L) of drug effect from an anti-inflammatory drug in patients with medical history of ICS.
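Eqs. 10 to 13 chain together as in the following minimal Python sketch. Function names and every parameter value in the example are hypothetical placeholders, not estimates from the legacy or updated model. The sketch assumes ΔΔFEV1AI is non-negative and θAIexp is positive, so that the power term is well defined.

```python
import math


def typical_placebo_rate(theta_erpbo, pp_fev1, ics_p, theta_er, theta_hics,
                         hist_required, theta_ehist=0.0):
    """Eq. 11: typical pre-treatment exacerbation rate for one study arm."""
    if hist_required:
        return theta_erpbo * math.exp(
            (pp_fev1 - 41) * theta_er + (ics_p - 61) * theta_hics)
    return theta_erpbo * (1 + theta_ehist) * math.exp(
        (pp_fev1 - 49) * theta_er + (ics_p - 51) * theta_hics)


def placebo_rate(tver_pbo, eta_er=0.0):
    """Eq. 12: add inter-study variability on the placebo rate."""
    return tver_pbo * math.exp(eta_er)


def rate_ratio(dd_fev1_bd, dd_fev1_ai, ics_naive_f, ics_exp_f, theta_bd, theta_ai_exp):
    """Eq. 13: typical treatment/placebo exacerbation rate ratio (FEV1 effects in L)."""
    exponent = theta_bd * dd_fev1_bd + ics_naive_f * theta_bd * dd_fev1_ai
    ai_term = 1.0 - ics_exp_f * dd_fev1_ai ** theta_ai_exp
    return math.exp(exponent) * ai_term


def predicted_rate(er_pbo, tvr):
    """Eq. 10: predicted annual exacerbation rate."""
    return er_pbo * tvr


if __name__ == "__main__":
    # All numbers below are assumed for illustration only.
    tver = typical_placebo_rate(0.9, pp_fev1=45, ics_p=61, theta_er=-0.01,
                                theta_hics=0.005, hist_required=True)
    tvr = rate_ratio(dd_fev1_bd=0.10, dd_fev1_ai=0.03, ics_naive_f=0.39,
                     ics_exp_f=0.61, theta_bd=-2.0, theta_ai_exp=0.5)
    print(predicted_rate(placebo_rate(tver), tvr))
```

When both FEV1 effects are zero the ratio is one, so the predicted rate collapses to the placebo rate.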

Stochastic error (ɛ) due to limited sample size was incorporated in the model (Eq. 14), where ɛ follows a normal distribution, N(0, σ²), and Ni,j is the number of subjects in the jth arm of the ith study, normalized to 500 patients, which is close to the typical number of patients per treatment arm. The treatment duration in weeks (TRTdur) was also considered in the RUV using an estimated power function (θTRT).

$$\log\left(\text{ER}_{\text{obs}_{i,j}}\right) = \log\left(\text{ER}_{\text{pred}_{i,j}}\right) + \frac{1}{\sqrt{\frac{\text{N}_{i,j}}{500}}} \cdot \left(\frac{\text{TRT}_{\text{dur}}}{52}\right)^{\theta_{\text{TRT}}} \cdot \epsilon_{i,j}$$
(14)
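The weighting of the residual error in Eq. 14 can be made concrete with a small Python sketch. The function name is hypothetical and the θTRT value in the example is an assumption chosen only to show the direction of the effect.

```python
import math


def residual_scale(n_patients, trt_dur_weeks, theta_trt):
    """Eq. 14: multiplier applied to epsilon on the log exacerbation-rate scale."""
    size_weight = 1.0 / math.sqrt(n_patients / 500.0)
    duration_weight = (trt_dur_weeks / 52.0) ** theta_trt
    return size_weight * duration_weight


if __name__ == "__main__":
    # theta_trt = -0.5 is an assumed value: shorter trials get noisier rates.
    for n, weeks in ((500, 52), (2000, 52), (500, 12)):
        print(n, weeks, residual_scale(n, weeks, theta_trt=-0.5))
```

A 500-patient arm treated for 52 weeks has a multiplier of exactly one, so σ can be read as the residual standard deviation for that reference arm.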

Models’ predictive performance

The predictive performance of both published models for FEV1 [7] and exacerbation rate [10] using the augmented dataset was assessed visually using goodness-of-fit plots such as observed FEV1 vs. population/individual predictions, conditional weighted residuals (CWRES) vs. time and individual weighted residuals (iWRES) vs. individual predictions. To produce these plots, the parameters were kept at their published values by setting the maximum number of objective function evaluations to zero (MAXEVAL = 0). At this stage, changes to the models were considered if trends in the goodness-of-fit plots indicated poor predictive performance [18]. When covariates were included, the decision was based on parameter plausibility and uncertainty, goodness-of-fit plots, and the objective function value (OFV): the difference in OFV between two nested models approximately follows a chi-square (χ2) distribution and can be tested for significance (critical value for 1 degree of freedom at α = 0.05: 3.84).
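The OFV-based significance test can be reproduced with the Python standard library, as in the sketch below. The function name is hypothetical, and only the closed-form chi-square tail probabilities for 1 and 2 degrees of freedom are implemented; this is an illustration of the likelihood ratio test, not the workflow used in NONMEM or PsN.

```python
import math


def lrt_p_value(delta_ofv, df=1):
    """P-value for an OFV drop between nested models (chi-square, df = 1 or 2)."""
    if delta_ofv < 0:
        raise ValueError("delta_ofv must be non-negative")
    if df == 1:
        # For df = 1, P(chi2 > x) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(delta_ofv / 2.0))
    if df == 2:
        # For df = 2, P(chi2 > x) = exp(-x / 2).
        return math.exp(-delta_ofv / 2.0)
    raise ValueError("only df = 1 or df = 2 are implemented in this sketch")


if __name__ == "__main__":
    print(lrt_p_value(3.84, df=1))  # close to the 0.05 threshold
    print(lrt_p_value(10.0, df=1))
```

A drop in OFV larger than 3.84 for one added parameter therefore corresponds to p < 0.05.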

Extension to new drugs

The published MBMA was extended with clinical trial data that were not available before July 2013. This adds information on new drugs and drug combinations not present in the previous database. The same treatment effect models described in Eq. 6 and Eq. 7 were applied for the new drugs, and a new set of parameter estimates was obtained. For ICS, LABA and LAAC compounds, a time-course of effect onset was tested as described previously [7] and retained in the model if a reduction in OFV was observed (regardless of statistical significance). In the published model, data from fluticasone propionate and FF were handled as a single drug “fluticasone” given either twice a day (b.i.d.) or once a day (q.d.), respectively. The efficacy parameter for fluticasone q.d. was estimated as a relative reference efficacy of fluticasone q.d. compared to b.i.d. In this analysis, they were considered as different drugs and a different set of parameters was estimated for each drug.

Comparative effectiveness

The comparative effectiveness for FEV1 across bronchodilators and anti-inflammatory compounds was assessed as described previously [17]. The drug effect parameters were re-parameterized as relative effects (Reffect) of two drugs for all comparisons of interest (Eq. 15).

$$\text{R}_{\text{effect}} = \frac{\text{E}_{\text{drug1}}}{\text{E}_{\text{drug2}}}$$
(15)

Confidence intervals (CI) around Reffect were obtained using the log-likelihood profiling tool (llp) in PsN. An Reffect larger than one means that drug 1 is superior to drug 2; conversely, an Reffect smaller than one means that drug 1 is inferior to drug 2. If the CI for Reffect includes one, superiority or inferiority cannot be established.
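The decision rule for Reffect can be written as a short Python sketch. The function names and the example numbers are hypothetical; in the analysis the confidence limits come from PsN's llp tool, which is not reproduced here.

```python
def relative_effect(effect_drug1, effect_drug2):
    """Eq. 15: ratio of the two drug effects."""
    return effect_drug1 / effect_drug2


def classify_relative_effect(ci_lower, ci_upper):
    """Interpret a confidence interval for R_effect against the null value of one."""
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if ci_lower > 1.0:
        return "drug 1 superior"
    if ci_upper < 1.0:
        return "drug 1 inferior"
    return "not established"


if __name__ == "__main__":
    # Illustrative values only.
    print(relative_effect(0.150, 0.100))
    print(classify_relative_effect(1.2, 1.9))
    print(classify_relative_effect(0.8, 1.3))
```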

Results

Data

The post-2013 dataset includes a total of 132 references comprising 156 studies (Table S1 in Supplementary material). Combined with the pre-2013 data, there is a total of 298 studies including 250,543 patients who contributed 4,137 mean morning trough FEV1 observations for analysis. The augmented data include a mean of 274 patients per study arm (number of arms: 914). Fifty-two (17%) studies with a total of 99,296 patients reported annual exacerbation rate for each study arm, contributing a total of 135 observations for analysis. Study characteristics and patient demographics for the pre-2013, post-2013 and augmented datasets are shown in Table 1.

A total of 23 compounds were given across 914 study arms as mono- (71%), dual- (25%) or triple-therapy (4%). Compounds and their dosing regimens are shown in Table 2.

Table 1 Characteristics of studies included in the analysis
Table 2 Compounds included in the analysis with their drug class and dosing regimen

Model predictive performance and extension to new drugs

Goodness-of-fit plots obtained from the legacy MBMA [7] are displayed in Fig. 1. Based on these plots, the current model seems to predict the post-2013 data reasonably well. Therefore, no changes in the structural, statistical and covariate models were performed.

Fig. 1 Goodness-of-fit plots for the published MBMA model using the combined dataset and published MBMA estimates: (a) observed FEV1 vs. population predictions; (b) observed FEV1 vs. individual predictions; (c) absolute individual weighted residuals vs. individual predictions; and (d) conditional weighted residuals vs. time

Four new compounds (olodaterol, revefenacin, batefenterol and FF) were included in the analysis. New parameter estimates for the augmented data are presented in Table S2 in Supplementary material and the predictive performance of the post-2013 model is shown in Fig. 2.

Fig. 2 Goodness-of-fit plots for the post-2013 MBMA model using the combined dataset and including new drugs: (a) observed FEV1 vs. population predictions; (b) observed FEV1 vs. individual predictions; (c) absolute individual weighted residuals vs. individual predictions; and (d) conditional weighted residuals vs. time

A correction factor (Eq. 1) with a typical value (relative standard error, RSE%) of 0.89 (0.7) was estimated for six studies in which absolute trough FEV1 data were obtained from CFB with the baseline FEV1 measured post-SABD. A positive estimate for θcorrection will increase the model-predicted FEV1 to appropriately fit the data with augmented absolute FEV1 (studies with baseline measurement obtained post-SABD). For example, a predicted transformed absolute FEV1 value of 1 L is increased by 0.11 L. As shown in Table 3, the typical estimated efficacies (95% CI) at the reference dose (Effref) for olodaterol, revefenacin, batefenterol and FF are 89 mL (82.5–96.0), 144 mL (128.4–159.3), 190 mL (165.3–214.8) and 43.6 mL (33.4–53.7), respectively. The typical value (RSE%) of B is 1.17 (1.0), with a coefficient of variation (CV) for ISV and IAV of 10% and 2.1%, respectively, for a typical study size of 200 patients. The typical rate of disease progression is 32 mL per year (RSE: 9.6%) with an ISV CV of 54%. The half-maximum placebo effect was reached after 20 weeks (compared to 11 weeks previously reported) for those studies with gradual onset, whereas 45% of the studies were estimated to have an immediate placebo effect (compared to 48% previously reported).

Table 3 Estimates of FEV1 effects for all compounds based on an untreated baseline FEV1 of 1.2 L

Treatment effect was reduced in more severe patients with a Bi,j less than 1.2 L (Eq. 9). Such reduction is directly correlated with the untreated study arm FEV1 baseline [7]. For example, a study arm baseline of 1 L resulted in a reduction of 5.4% and 12.8% in the effect of long-acting bronchodilators and anti-inflammatory treatments, respectively. A dose-response relationship was identified for fourteen compounds (Fig. 3). The addition of an onset effect was not supported for any of the new drugs included in the analysis, with a difference in OFV of 13 for FF and 0 for both olodaterol and revefenacin.

Fig. 3 Dose-response relationship for fourteen compounds (given as mono-therapy). Reference doses (black dotted line); estimated ED50 (vertical blue dotted line) and Emax (horizontal blue dotted line) parameters and FEV1 response reported in studies between 1996 and 2013 (pink) and between 2013 and 2020 (blue) are shown

MBMA prediction of exacerbation rate

Goodness-of-fit plots for the published exacerbation model [10] (parameters shown in Table 4) are displayed in Fig. 4. Based on the observations vs. population predictions, the legacy model overpredicts the post-2013 mean annual exacerbation rate data. Therefore, changes to the model using the augmented data were performed as follows. In the legacy model, different centering values were used for covariates (ppFEV1 and ICSp) to assess the effect of history of exacerbation (θEHIST) on TVERPBOi,j (Eq. 11); however, this may force θEHIST to be different than zero. Therefore, in this analysis, the same centering values for covariates were considered. This gave an estimate of -0.12 for θEHIST compared to -0.30 (case when covariates were centered separately). Additionally, removing θEHIST did not result in a statistically significant increase in the OFV (1.11 points) meaning that EHIST did not have a meaningful impact in the model. Based on this, the θEHIST parameter was removed from the model (Eq. 16) and then the covariate effect was tested.

Table 4 Parameter estimates for the exacerbation model
Fig. 4 Goodness-of-fit plots for the exacerbation rate model using the combined dataset and published model: (a) mean annual exacerbation rate vs. population predictions; (b) mean annual exacerbation rate vs. individual predictions; (c) absolute individual weighted residuals vs. individual predictions; and (d) conditional weighted residuals vs. arm

As an attempt to explain the differences seen between the pre-2013 and post-2013 data sets and thus improve the predictive performance of the model, covariates such as the type of therapy (TTYPE; e.g., mono-, dual- or triple-therapy) and year when the study started (Studyyear) were tested on the pre-treatment placebo rate (TVERPBOi,j) as shown in Eq. 17 and Eq. 18, respectively.

$$\text{TVER}_{\text{PBO}_{i,j}} = \theta_{\text{ERPBO}_{i,j}} \cdot \exp\left(\left(\text{ppFEV1}_{i,j} - 41\right) \cdot \theta_{\text{ER}} + \left(\text{ICS}_{\text{p}_{i,j}} - 61\right) \cdot \theta_{\text{HICS}}\right)$$
(16)
$$\text{ER}_{\text{PBO}_{i,j}} = \begin{cases} \left(\text{TVER}_{\text{PBO}_{i,j}} + \theta_{\text{dual}}\right) \cdot \exp\left(\eta_{\text{ER}}\right) & \text{if } \text{TTYPE} = \text{dual} \\ \left(\text{TVER}_{\text{PBO}_{i,j}} + \theta_{\text{triple}}\right) \cdot \exp\left(\eta_{\text{ER}}\right) & \text{if } \text{TTYPE} = \text{triple} \end{cases}$$
(17)
$$\text{ER}_{\text{PBO}_{i,j}} = \text{TVER}_{\text{PBO}_{i,j}} \cdot \left(\frac{\text{Study}_{\text{year}}}{2013}\right)^{\theta_{\text{Study}_{\text{year}}}} \cdot \exp\left(\eta_{\text{ER}}\right)$$
(18)

Inclusion of TTYPE and Studyyear resulted in a decrease in OFV of 4 and 23 points, respectively. Model parameter estimates and goodness-of-fit-plots for the post-2013 model including Studyyear are shown in Table 4 and Fig. 5, respectively. A typical value (RSE%) of placebo exacerbation rate is 0.91 (7.22) with a CV for ISV of 27%.
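The study-year covariate in Eq. 18 can be sketched in Python as follows. The function name and the value of the exponent are hypothetical (the estimate is given in Table 4, which is not reproduced in this text); the typical placebo rate of 0.91 is the value quoted above.

```python
import math


def placebo_rate_with_study_year(tver_pbo, study_year, theta_year, eta_er=0.0):
    """Eq. 18: placebo exacerbation rate adjusted for the study start year."""
    return tver_pbo * (study_year / 2013.0) ** theta_year * math.exp(eta_er)


if __name__ == "__main__":
    # theta_year = -100 is an assumed value used only to show the shape of the effect.
    for year in (2000, 2013, 2016):
        print(year, placebo_rate_with_study_year(0.91, year, theta_year=-100.0))
```

Because the ratio Studyyear/2013 is always very close to one, the exponent has to be large in magnitude for the covariate to change the predicted rate noticeably. For a study starting in 2013 the factor is exactly one.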

Fig. 5 Goodness-of-fit plots for the post-2013 exacerbation rate model using the combined dataset: (a) mean annual exacerbation rate vs. population predictions; (b) mean annual exacerbation rate vs. individual predictions; (c) absolute individual weighted residuals vs. individual predictions; and (d) conditional weighted residuals vs. arm

The relationship between predicted mean annual exacerbation rate and ΔΔFEV1 for moderate and severe disease in different study years is displayed in Fig. 6. For a typical study from 2016 in which 61% of patients had a history of ICS use, the model predicted that a reduction in exacerbation rate of at least 20% can be achieved with a ΔΔFEV1 of at least 47 mL in patients with moderate and severe disease (Fig. 6).

Fig. 6

Relationship between predicted mean annual exacerbation rate and placebo-adjusted drug effect on FEV1 for different study years in patients with moderate and severe disease. For predictions, a percent predicted FEV1 (ppFEV1) of 70 and 40 was assumed for moderate and severe patients, respectively, with 61% of patients required to wash out from ICS (Eq. 16). Dots are observed data for moderate (left panel) and severe (right panel) patients; lines are the median annual exacerbation rate for a given year; the ribbon represents the continuous interval for studies between 1992 and 2016; ΔΔFEV1 is the placebo-adjusted change from baseline in FEV1. ISV and IAV are not included in the predictions

Comparative effectiveness

The relative effects for anti-inflammatory drugs and bronchodilators, using each drug in turn as the reference, are shown in Figs. 7 and 8, respectively. For most anti-inflammatories, superiority or inferiority cannot be established. However, roflumilast is superior to most of the drugs, except beclomethasone and mometasone. Among bronchodilators, batefenterol is superior to most of the drugs, except revefenacin, for which superiority/inferiority cannot be established (Fig. 8). Furthermore, umeclidinium is superior to all bronchodilators except batefenterol (which is superior to umeclidinium), and revefenacin and glycopyrronium, for which superiority/inferiority cannot be established. Vilanterol is superior to formoterol, salmeterol and olodaterol only.
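The superior / inferior / cannot-be-established wording above corresponds to reading a 95% CI against a no-difference value. The sketch below illustrates that reading only. It assumes the relative effect is expressed so that larger values favour the first drug and that the no-difference value is supplied by the user (0 for a difference, 1 for a ratio). The function name and the example intervals are hypothetical and are not values from Figs. 7 or 8.

```python
def classify_relative_effect(ci_low, ci_high, null_value=0.0):
    """Label a comparison from its 95% CI relative to the no-difference value."""
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_low > null_value:
        return "superior"
    if ci_high < null_value:
        return "inferior"
    return "not established"


# HYPOTHETICAL intervals for illustration only (not taken from the figures).
examples = {
    "drug A vs drug B": (12.0, 48.0),
    "drug A vs drug C": (-15.0, 22.0),
    "drug A vs drug D": (-40.0, -5.0),
}
labels = {pair: classify_relative_effect(lo, hi) for pair, (lo, hi) in examples.items()}
```

An interval that contains the no-difference value is labelled "not established", which matches the cases in the text where neither superiority nor inferiority can be concluded.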

Fig. 7

Comparative effectiveness of anti-inflammatory drugs. Point estimates and 95% CIs for the relative effect, obtained using the llp option in PsN, are displayed for all drugs. The plot is stratified by drug 2

Fig. 8

Comparative effectiveness of bronchodilator drugs. Point estimates and 95% CIs for the relative effect, obtained using the llp option in PsN, are displayed for all drugs. The plot is stratified by drug 2

Discussion

This study assessed the predictive performance of a model-based longitudinal meta-analysis for FEV1 [7] in patients with COPD following inhaled treatment with anti-inflammatories and bronchodilators, and its link to mean annual exacerbation rate [10], using data augmented with additional RCTs published between 2013 and 2020. This at least doubled the number of observations, adding more than 2000 morning trough FEV1 observations (data from 134960 randomized patients in 156 studies) for analysis, and showed that the legacy model [7] can predict these data well. Furthermore, the model was updated to incorporate additional drugs as well as drug combinations, with new estimates of FEV1 effects. The parameter estimates based on the augmented data (post-2013 model) are generally in line with the legacy MBMA estimates for drugs in the same class (Table S2 in supplementary material). Although the population model predictions adequately described the observed data (Fig. 2a), the post-2013 model cannot reliably predict at the individual level, as evidenced by the 20% shrinkage (standard deviation%) of the residual error in the FEV1 vs. individual prediction plot. To some extent, this can be explained by the sparse data per study in the MBMA (median [range] observations per study arm of 4 [1–13]).

Notably, the new typical estimate of efficacy at the reference dose for tiotropium (Respimat) was lower than the previous estimate (120 mL vs. 134 mL), as were those for aclidinium b.i.d. (98 mL vs. 120 mL) and vilanterol (114 mL vs. 139 mL). This could be explained by the infra-additive LABA + LAAC interaction considered in the model, which results in a reduced efficacy of long-acting bronchodilators [19], and by the higher percentage of patients receiving LABA and LAAC as background medication during the study period in the post-2013 data (Table 1). In addition, the post-2013 model predicts a reduction of 5.4% in the effect of bronchodilator treatment for more severe patients (study arm baseline FEV1 < 1.2 L), in line with the legacy MBMA (4.7%). However, a smaller reduction in the effect of anti-inflammatory treatments was estimated in the post-2013 model (12.5% vs. 28%).

In the post-2013 model, the typical rate of disease progression is 32 mL/year, which is in line with a previous report (33 mL/year) [20]. Faster disease progression has been observed in current smokers compared to former smokers (mean rate of decline in FEV1 at least 21 mL/year greater in smokers) [20]. However, smoking status was not identified as a significant covariate on disease progression. This could be because aggregate covariate data span a narrower range of values than individual covariate data, which makes the covariate analysis difficult. Additionally, longer studies might be required to capture disease progression, as has been previously pointed out [7]. Benefits of combining individual patient-level data and aggregate data for analysis include the ability to (i) characterize patient-level relationships; (ii) better describe the effect of covariates and the correlation among outcomes; and (iii) make the model suitable for predictions or simulations of individual outcomes.

A dose-response relationship was identified for fourteen out of twenty-three compounds (Fig. 3). For indacaterol, the new estimate of the typical value of ED50 is significantly smaller (with a larger uncertainty) than the published estimate (5.37 μg vs. 24.3 μg) (Table S2 in supplementary material). For this drug, it has been shown that baseline FEV1, as a marker of disease severity, influences the dose-response, with less severe COPD patients requiring lower doses to achieve optimal effect [21]. For study treatment arms allocated to indacaterol, there is a mean difference of 13 mL in FEV1 at baseline between the pre-2013 (mean FEV1 of 1.351 L) and post-2013 datasets (mean FEV1 of 1.338 L), meaning that in the post-2013 data, patients receiving indacaterol had slightly less severe COPD than patients in the pre-2013 data. However, such a difference might not be deemed clinically relevant. Large phase 3 and 4 studies usually assess a clinical dose instead of a wide range of doses, which may contribute to the relatively high uncertainty seen in the ED50 parameter estimates for some drugs. The dose-response for umeclidinium has already been characterized [22]; however, in this study, the ED50 parameter for umeclidinium was unidentifiable, possibly due to the lack of data at the lower dose range (only one study with approximately 150 patients in total [22]).

In this analysis, the inclusion of onset of effect for the new drugs was not supported. Bronchodilators such as olodaterol and revefenacin have a fast onset of action (5 and 45 min post-dose for olodaterol [23] and revefenacin [24], respectively). To describe their effect onset, earlier FEV1 time points would have been required.

Comparative effectiveness is an important decision-making component during clinical drug development. As shown in this analysis as well as previously [17], MBMA is a useful tool to use the available data efficiently and to compare efficacy across different drugs, even in the absence of real-life head-to-head trials. A comparative effectiveness analysis was performed using the post-2013 MBMA model as described before [17]. Roflumilast, a PDE4 inhibitor, was superior to most of the other anti-inflammatories based on pulmonary response, except for beclomethasone and mometasone, for which superiority/inferiority could not be determined. A comparison between inhaled beclomethasone and roflumilast in patients with persistent asthma showed a comparable effect of the two drugs in improving pulmonary function [25]. Superiority or inferiority of the ICS FF relative to most anti-inflammatories could not be established when given as monotherapy. Differences among bronchodilators are more evident than among anti-inflammatory drugs. Formoterol and salmeterol were inferior to most of the other long-acting bronchodilators, which is in agreement with the previous analysis [17].

Different bronchodilators and combinations have been compared in head-to-head trials. For example, umeclidinium was non-inferior to once-daily glycopyrronium but superior to tiotropium in patients with COPD based on trough FEV1 at day 85 [26, 27]. This is in line with our findings as shown in Fig. 8. Olodaterol was inferior to most of the other bronchodilators in this analysis. In a direct comparison between tiotropium/olodaterol and umeclidinium/vilanterol, superiority of umeclidinium/vilanterol was observed for the primary endpoint of trough FEV1 at week 8 [28]; however, non-inferiority could be established when umeclidinium/vilanterol was compared with indacaterol/glycopyrronium (endpoint of trough FEV1 at week 12) [29]. Comparative effectiveness between different combinations was not assessed in this study.

The exacerbation model used in this analysis [10] predicts a higher exacerbation rate with more severe disease, as reflected by ppFEV1, and with a higher percentage of patients receiving ICS prior to randomization, both indicative of COPD disease severity. The same relationship between disease severity (ppFEV1), use of ICS prior to randomization and exacerbation rate was shown in a model that describes exacerbation rates as a function of treatment duration in patients receiving roflumilast [30]. Currently, history of exacerbations remains the most important predictor of future exacerbations [31]. The need to be treated with ICS prior to randomization may indicate a previous exacerbation history. This potential correlation between use of ICS and an exacerbation event could explain why the model did not support the addition of exacerbation history as a covariate.

The published estimates for the exacerbation model [10] (Table 4) tend to overpredict the post-2013 mean annual exacerbation rate data. The effect of study year on baseline was the only statistically significant covariate. This covariate effect may be explained by (i) an improvement in disease management over time and/or (ii) an increase in the number of marketed therapies, which include dual and triple combinations (14% and 6% more studies compared to pre-2013 data, respectively). However, the authors acknowledge that study year might be a confounding factor, masking other plausible explanations for the differences between the pre-2013 and post-2013 data (e.g., measurement error). Furthermore, study year would not capture significant changes in exacerbation rate if simulations for future comparison of drugs are performed (e.g., exacerbation rate may be similar between 2016 and 2020). Future studies focusing on the assessment of other potential predictors of exacerbation, for instance dyspnea, cough and sputum, exercise capacity and/or peripheral blood eosinophil count (PBE), may provide additional insight. This seems to be important considering the worse survival outcome associated with severe or frequent exacerbations [32]. Current evidence suggests the use of PBE when deciding on ICS use [9]; however, the link between PBE counts and future exacerbation risk is inconclusive and even controversial in COPD. In this study, only 6% of the trials that reported annual exacerbation rate had PBE data, and therefore it was not tested in the model.

This MBMA could serve as a tool to make quantitative decisions during drug development, specifically for trial design selection and optimization. For instance, clinical trial simulations can help to improve our understanding of first-in-patient trial design performance to inform Phase 2 decision making [33, 34], or to assess the probability of detecting a drug effect as a function of sample size [35]. Applications of MBMA to support drug development decision-making have already been pointed out [36]. Furthermore, the influence of varying inclusion criteria on the trial outcome in terms of effect size and power can also be quantified. This can be of interest in light of the stringent inclusion criteria for RCTs, which make study findings less generalizable to a broader population [37].

Conclusion

The addition of 7 years' worth of new data to the legacy COPD MBMA enabled a more robust model with improved predictive performance. This study shows that the legacy MBMA model is consistent and can be considered a general model to predict FEV1 when using bronchodilators and anti-inflammatories in COPD. Furthermore, the comparative effectiveness analysis performed in this study demonstrated the usefulness of the MBMA, with results, where available, being in line with real-life head-to-head trials. An updated exacerbation model has been presented in which study year seems to be an important covariate to explain potential improvements in disease management over time. Future studies should focus on integrating aggregate and individual patient-level data into the MBMA to make the analysis suitable for predicting individual outcomes.