Background

Chronic obstructive pulmonary disease (COPD) is a chronic, progressive, heterogeneous disease. The manifestation of disease progression varies over time and between patients. Despite this, most clinical trials in COPD focus on a single primary outcome such as forced expiratory volume in 1 s (FEV1), exacerbation frequency or, less frequently, mortality.

The impact of interventions on disease progression has been measured by the annual rate of decline in FEV1 over several years [1]. However, there are limitations to this approach. Individuals with a slower rate of decline may dilute any observable treatment benefit in rapidly progressing subgroups. Moreover, individuals with a rapid decline may discontinue studies early, underestimating the true mean rate of decline in the control arm [2].

Primary analyses of clinical trials typically report group mean results, which can be insufficient to detect clinically important changes at the individual patient level. Furthermore, the focus on only one dimension of COPD may misrepresent real improvements that are meaningful to patients.

Measuring clinically important deterioration (CID) in terms of the most impactful events at the individual patient level might provide a significant benefit in studying the progression and effects of COPD in clinical trials. The three events included in this composite endpoint – trough FEV1, St. George’s Respiratory Questionnaire (SGRQ) score and moderate/severe exacerbation – have been previously used by Singh et al. [3], Anzueto et al. [4] and Greulich et al. [5], and were selected because they are commonly used in clinical trials and are known to have an impact on patients with COPD.

To explore the composite endpoint of CID further, we conducted a post hoc analysis of the 4-year UPLIFT study. The objectives of this analysis were to test the validity of CID when only including FEV1 and SGRQ events that were confirmed at a subsequent visit, to assess whether CID predicts future outcomes, and to explore other elements of CID.

Methods

This post hoc analysis assessed time to first CID, defined as the time to the first occurrence of at least one of the following: a decrease in trough FEV1 of ≥100 mL from baseline, an increase in SGRQ total score of ≥4 units from baseline, or a moderate/severe exacerbation (the same components as suggested by Singh et al. [3]). Changes in FEV1 and SGRQ score were always calculated from baseline. Changes in FEV1 were assessed using pre-bronchodilation values, in line with previous studies assessing CID [6,7,8] and reflecting real-world clinical practice for FEV1 monitoring. A decrease in trough FEV1 of ≥100 mL is considered to be the minimum clinically important change perceived by patients [9, 10] and is within the defined range suggested by the American Thoracic Society/European Respiratory Society task force [11], whereas an increase in SGRQ total score of ≥4 units is considered the minimum clinically important change in quality of life [12].
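As an illustration only, the three component thresholds described above could be checked at a single assessment as in the following minimal sketch. The function and variable names are hypothetical and are not taken from the UPLIFT analysis code; the sketch covers only the threshold comparison, not the confirmation rule.

```python
# Thresholds quoted in the text; the constant names are illustrative.
FEV1_DECLINE_ML = 100      # decrease in trough FEV1 from baseline (mL)
SGRQ_INCREASE_UNITS = 4    # increase in SGRQ total score from baseline (units)


def cid_components(baseline_fev1_ml, fev1_ml, baseline_sgrq, sgrq, had_exacerbation):
    """Flag which CID components meet their threshold at one assessment.

    All changes are calculated from baseline, as stated in the Methods.
    """
    return {
        "fev1": (baseline_fev1_ml - fev1_ml) >= FEV1_DECLINE_ML,
        "sgrq": (sgrq - baseline_sgrq) >= SGRQ_INCREASE_UNITS,
        "exacerbation": bool(had_exacerbation),
    }


def meets_cid(components):
    """CID requires at least one of the three components."""
    return any(components.values())


# Hypothetical patient: FEV1 fell by 120 mL, SGRQ rose by 2 units, no exacerbation.
example = cid_components(1200, 1080, 45, 47, False)
```

In this hypothetical case only the FEV1 component is flagged, which is sufficient for the assessment to count towards CID before confirmation is considered.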

Unlike for the composite endpoint published by Singh et al. [3], we only included confirmed FEV1 and SGRQ deteriorations, i.e. events that were present during at least two consecutive assessments (5 or 6 months apart). This excluded short-term fluctuations in the disease, which could provide an unreliable indication of CID. If no further assessment was available, but the patient discontinued study medication or died, the event was also considered as confirmed. Confirmed events were not required for exacerbations of COPD.
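The confirmation rule can be sketched as follows. This is a simplified illustration that assumes one flag per scheduled assessment, in visit order, with no missing visits between assessments; the function name and arguments are hypothetical.

```python
def first_confirmed_event(flags, ended_after_last=False):
    """Return the index of the assessment at which a confirmed deterioration starts.

    flags: booleans per scheduled assessment (True = threshold met versus baseline).
    ended_after_last: True if the patient discontinued study medication or died
        with no further assessment available after the last one.
    Returns None when no deterioration is confirmed.
    """
    for i, met in enumerate(flags):
        if not met:
            continue
        has_next = i + 1 < len(flags)
        # Confirmed when the deterioration is present at two consecutive assessments.
        if has_next and flags[i + 1]:
            return i
        # Also confirmed when no further assessment exists because the patient
        # discontinued study medication or died.
        if not has_next and ended_after_last:
            return i
    return None
```

For example, a deterioration seen at the second assessment but not the third is treated as a short-term fluctuation, and the search continues with later assessments.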

We have used the term “sustained” to refer to deteriorations that were then maintained at almost every subsequent visit.

Study design

Study design details have been previously reported [13] and are briefly summarised below. UPLIFT (ClinicalTrials.gov: NCT00144339) was a 4-year, randomised, double-blind, parallel-group study comparing tiotropium 18 μg, administered once daily via the HandiHaler®, with matching placebo [14]. The UPLIFT study was conducted in 37 countries [14]. Patients were aged ≥40 years, with a smoking history of ≥10 pack-years and moderate-to-very severe COPD (Global Initiative for Chronic Obstructive Lung Disease [GOLD] 2–4 [15]). For further details, see the Supplementary Methods. The protocol was approved by the ethics committee at each centre, and all patients provided written, informed consent.

Spirometric testing was performed at randomisation, at the Day 30 visit and at visits every 6 months up to Month 48. SGRQ was assessed at randomisation and every 6 months up to Month 48. Exacerbations and associated hospital admissions were recorded on case report forms at every visit. The two primary endpoints were pre- and post-bronchodilation yearly rate of decline in mean FEV1.

Statistical analysis

For time-to-event endpoints, hazard ratios (HRs), 95% confidence intervals (CIs) and P values were calculated using a Cox proportional hazards model. Patients without CID events were censored at the treatment stop date.
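To make the censoring rule and the hazard ratio concrete, the sketch below derives a (time, event) pair per patient and fits a Cox model with a single binary covariate by Newton-Raphson. It is a teaching sketch, not the software used for the UPLIFT analysis: the function names and toy data are invented, ties are handled with the Breslow approximation, and ignoring events recorded after the treatment stop date is an assumption.

```python
import math


def time_to_first_cid(cid_days, treatment_stop_day):
    """Return (time, event_observed); censor at treatment stop if no CID occurred.

    Assumption: events recorded after the treatment stop date are not counted.
    """
    on_treatment = [d for d in cid_days if d <= treatment_stop_day]
    if on_treatment:
        return min(on_treatment), True
    return treatment_stop_day, False


def cox_hazard_ratio(times, events, x, iterations=25):
    """Hazard ratio for one covariate from the Cox partial likelihood."""
    beta = 0.0
    for _ in range(iterations):
        grad, info = 0.0, 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue  # censored patients contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # patient j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            grad += x_i - mean               # score contribution
            info += s2 / s0 - mean * mean    # observed information contribution
        if info == 0:
            break
        step = grad / info
        beta += step
        if abs(step) < 1e-10:
            break
    return math.exp(beta)


# Toy data (invented): group 1 tends to have earlier events than group 0.
times = [1, 3, 4, 7, 2, 5, 6, 8]
events = [True] * 8
group = [1, 1, 1, 1, 0, 0, 0, 0]
hr = cox_hazard_ratio(times, events, group)
```

With these toy data the estimated hazard ratio for group 1 versus group 0 is greater than 1. A real analysis would use an established survival package, which also provides confidence intervals and P values.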

To assess the association of CID with future outcomes, patients experiencing a CID event within the first 6 months were compared with those not experiencing the event. For this analysis, the time to first moderate/severe exacerbation was calculated from Month 6 (180 days) to the first subsequent event or treatment discontinuation. Time to death was calculated from Month 6 (180 days) to the date of death or the end of the vital status follow-up (Day 1470).
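The landmark approach described above can be sketched for the exacerbation endpoint as follows. The names are hypothetical, and two details are assumptions not stated in the text: patients who stopped treatment on or before Day 180 are excluded from the comparison, and a CID on Day 180 itself counts as occurring within the first 6 months.

```python
LANDMARK_DAY = 180  # Month 6, as defined in the text


def landmark_record(first_cid_day, exacerbation_days, treatment_stop_day):
    """Return (cid_by_landmark, time_from_landmark, event_observed).

    Returns None for patients no longer on treatment at the landmark
    (assumption: such patients are excluded from the comparison).
    """
    if treatment_stop_day <= LANDMARK_DAY:
        return None
    cid_by_landmark = first_cid_day is not None and first_cid_day <= LANDMARK_DAY
    # Only exacerbations after the landmark and on treatment count as outcomes.
    later = [d for d in exacerbation_days if LANDMARK_DAY < d <= treatment_stop_day]
    if later:
        return cid_by_landmark, min(later) - LANDMARK_DAY, True
    # No subsequent event: censor at treatment discontinuation.
    return cid_by_landmark, treatment_stop_day - LANDMARK_DAY, False
```

Starting the clock at the landmark ensures that the grouping variable (CID status by Month 6) is fully known before follow-up for the outcome begins.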

Results

Patient dispositions have been reported previously [14]. Overall, 5652 patients received treatment (2811 tiotropium; 2841 placebo) and had baseline measurements for both FEV1 and SGRQ. GOLD stage at baseline (based on post-bronchodilator FEV1) was available for 5589 patients (GOLD 2: 1293 placebo, 1310 tiotropium; GOLD 3: 1266 placebo, 1239 tiotropium; GOLD 4: 250 placebo, 231 tiotropium).

Incidence of CID

Most patients in the total population (83.9%) experienced at least one CID during the study (Table 1). Exacerbations were more frequent than FEV1 or SGRQ decline (Table 1).

Table 1 Incidence of CID and risk of first CID occurrence in total population and by GOLD stage

The contribution of exacerbations to the composite endpoint became more pronounced whereas the contribution of FEV1 became less pronounced as COPD severity (GOLD stage) increased in the total population (Table 1).

Time to first event for each component is shown in e-Figure 1.

Overall, about half of patients experienced at least two of the three events qualifying as CID, whereas fewer patients experienced all three events (Fig. 1a). A similar proportion of patients in each GOLD group experienced at least two CID events (Fig. 1b–d). The incidence of all three CID events was also similar for GOLD 2 and 3 patients, whereas few GOLD 4 patients experienced all three CID events (Fig. 1b–d).
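As a small illustration of how time to the first, second and third qualifying event might be derived per patient, the sketch below orders the first-occurrence day of each component. The dictionary layout and function name are hypothetical, and component events are assumed to have already been confirmed where required.

```python
def time_to_kth_component(first_days, k):
    """Day on which the k-th distinct CID component first occurred.

    first_days maps each component to the day of its first (confirmed)
    occurrence, or None if that component never occurred.
    Returns None if fewer than k components occurred.
    """
    days = sorted(d for d in first_days.values() if d is not None)
    return days[k - 1] if len(days) >= k else None


# Hypothetical patient: exacerbation first, then FEV1 decline, no SGRQ deterioration.
patient = {"fev1": 360, "sgrq": None, "exacerbation": 120}
```

For this hypothetical patient, the time to first event is Day 120, the time to two of three events is Day 360, and the time to three of three events is undefined, so the patient would be censored for that endpoint.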

Fig. 1

Kaplan–Meier estimates for the time to CID in the overall population and GOLD subgroups. Kaplan–Meier estimates of time to first event, two of three events, or three of three events of the components of the composite endpoint (trough FEV1 decline ≥100 mL, SGRQ score deterioration ≥4 units, or moderate/severe exacerbation) in (a) the overall population, (b) GOLD 2 patients, (c) GOLD 3 patients, and (d) GOLD 4 patients. - indicates either that this value could not be assessed (median was not estimable) or is not applicable (HR only displayed for time to event analysis). CI: confidence interval; FEV1: forced expiratory volume in 1 s; GOLD: Global Initiative for Chronic Obstructive Lung Disease; HR: hazard ratio; SGRQ: St. George’s Respiratory Questionnaire

Overall, most confirmed events were sustained at subsequent visits. Confirmed trough FEV1 decline was sustained at 12–48 months after the initial event in 74.6–81.6% of patients (Table 2). Confirmed SGRQ deterioration was also sustained at 12–42 months after the initial event in 72.3–78.1% of patients (Table 2). This pattern was comparable across the GOLD subgroups (e-Table 1), although patient numbers were low for the GOLD 4 subgroup.

Table 2 Patients with FEV1 decline or SGRQ deterioration in the total population

For unconfirmed events (reported at one timepoint), the proportion of patients whose FEV1 decline or SGRQ deterioration was sustained was lower: 51.6–71.9% of patients still had the FEV1 decline 6–48 months after first decline, and 52.5–65.5% still had SGRQ deterioration (e-Table 2).

In addition, in patients who had confirmed events, mean FEV1 remained at least 193 mL worse than baseline for the rest of the trial (Table 2). For unconfirmed events, mean FEV1 in patients with an event ranged from 95 mL worse than baseline at Month 6 to 142 mL worse than baseline at Month 24 and 213 mL at Month 48. In patients with SGRQ deterioration, mean increase was > 10 units for the rest of the trial for confirmed events, but ranged from 4.7 to 8.3 units for unconfirmed events.

Relative timing of events

The pattern and timing of clinically relevant events were highly variable for individual patients. Among patients who experienced both confirmed FEV1 decline and SGRQ deterioration, it was unusual for both events to occur at the same assessment (Table 3). The time from FEV1 decline to subsequent SGRQ deterioration was slightly longer than the time from SGRQ deterioration to subsequent FEV1 decline (Fig. 2).

Table 3 Timing of FEV1 decline and SGRQ deterioration in the overall population and GOLD subgroups

Fig. 2

Kaplan–Meier estimates of time to first subsequent SGRQ deterioration or first subsequent FEV1 deterioration. Kaplan–Meier estimates of median time from FEV1 decline ≥100 mL to SGRQ score deterioration ≥4 units, and median time from SGRQ score deterioration ≥4 units to FEV1 decline ≥100 mL in the overall population. CI: confidence interval; FEV1: forced expiratory volume in 1 s; NE: not evaluable; SGRQ: St. George’s Respiratory Questionnaire

For patients who experienced both FEV1 decline and SGRQ deterioration, those with less spirometric obstruction appeared more likely to experience confirmed FEV1 decline prior to confirmed SGRQ deterioration (GOLD 2: 50.1%; GOLD 3: 41.6%; GOLD 4: 37.0%) (Table 3).

Exacerbations demonstrated a greater contribution to the composite endpoint in more severe patients. Patients with more severe COPD were more likely to experience an exacerbation prior to experiencing FEV1 decline or SGRQ deterioration (Table 3).

Response to treatment

The time to first CID event and the time to first occurrence of the individual components were sensitive to therapeutic intervention (Table 4). Time to first CID, two CID events and all three CID events was longer with tiotropium than with placebo (Table 4 and Fig. 1a). This trend was observed in the GOLD 2 and 3 subgroups, but less so in GOLD 4 patients (Fig. 1b–d).

Table 4 Treatment comparison of time to first CID in the overall population and by GOLD stage

Risk of future exacerbations and death

Patients who had CID events by Month 6 were more likely to experience a moderate or severe exacerbation (HR 1.79; 95% CI 1.67, 1.92), a severe exacerbation (HR 1.67; 95% CI 1.49, 1.86) or death (HR 1.21; 95% CI 1.06, 1.39) (Table 5). The increase in the risk of exacerbations was qualitatively similar for GOLD 2–4 subgroups (Table 5).

Table 5 Risk of exacerbation or death from Month 6 onwards by CID status at Month 6 in the overall population and by GOLD stage

When the composite endpoint was broken down into its component events, the HRs for future exacerbations were smaller for FEV1 decline and SGRQ deterioration by Month 6 than for the composite endpoint in the overall population, and among GOLD 2 and GOLD 3 COPD patients (Table 5). Exacerbations within 6 months had higher HRs for any exacerbation and for severe exacerbations than the composite endpoint.

For unconfirmed events, the HRs for long-term outcomes were lower than for confirmed events (Table 5 and e-Table 3).

Investigating future events by CID status at Month 12 showed similar results (e-Table 4).

Mortality analysis with CID

Additional analyses using time to composite event or time to one of the component events as a time-varying covariate were performed. The HR for death for patients with a CID event versus patients without an event was 1.69 (95% CI 1.42, 2.01) (e-Table 5).

Using a stepwise Cox regression model to adjust for important baseline predictors of mortality had little effect on the predictive performance of the composite (e-Table 5). When all three components were included as separate predictors, all were associated with increased mortality risk (e-Table 5).

To validate these findings, the results in e-Tables 4, 6 and 7 are presented for the placebo and tiotropium arms separately. The HRs are slightly higher in the tiotropium arm, which may be related to the larger number of events in the placebo arm before Month 6. The results in e-Tables 6 and 7 are similar between arms and confirm the results in the total population.

Discussion

Composite endpoints have only recently been introduced in post hoc analyses of COPD clinical trials [3,4,5,6, 16]. Here, we conducted a post hoc analysis of the UPLIFT study. This analysis demonstrated the importance of using confirmed events in CID analysis and that CID predicts future outcomes. It also confirmed that the components of this composite endpoint behaved differently based on the baseline FEV1 of the individual patient. These data suggest that sustained decline in trough FEV1, sustained deterioration in SGRQ score of ≥4 units and a moderate/severe exacerbation are appropriate components of a composite endpoint for the assessment of CID in patients enrolled in COPD clinical trials. Earlier analyses of the UPLIFT trial have focused on exacerbations or a composite endpoint of the more severe events (exacerbations, respiratory failure, death and trial withdrawal due to worsening COPD), which do not provide an in-depth view of the impact of COPD on patient symptoms or quality of life [16, 17]. In the current analysis we focus on a composite endpoint of validated clinically important criteria (FEV1, SGRQ and exacerbations) to provide a more complete assessment of the impact on patients.

The individual components of the composite endpoint comprise characteristics of COPD that impact patient well-being, are clinically relevant events for the patient and predict future outcomes [15]. Although there are other parameters that could be included in such an endpoint, the components included are relatively easy to include in clinical trials and have established minimum clinically important differences.

Most deteriorations in FEV1 and SGRQ that were confirmed at a second visit were maintained for the rest of the 4-year UPLIFT study. Some publications of composite endpoints in COPD have not required confirmation at a subsequent visit [3, 16]. We believe that counting only confirmed FEV1 and SGRQ deteriorations improves the reliability of the composite endpoint, as it excludes short-term variation and inconsistent measurements. This is supported by the lower proportion of patients with unconfirmed events whose FEV1 or SGRQ deterioration was sustained at subsequent timepoints, and by the lower HRs for long-term outcomes with unconfirmed events compared with confirmed events.

Our analysis demonstrated that the components of the composite endpoint rarely occur at the same time in an individual patient. Most patients experience decline of trough FEV1, deterioration of SGRQ score and moderate/severe exacerbations on an individualised time scale. This supports the value of individual components in a composite endpoint. The stepwise regression data also show that each component is independently associated with increased mortality risk. The composite endpoint is also sensitive to pharmacological treatment, which is consistent with the findings of Singh et al., who observed a reduction in first CID with umeclidinium/vilanterol versus placebo in a post hoc study of the same composite endpoint [3]. Other post hoc analyses have used slightly different composite endpoints: FEV1, SGRQ and Transition Dyspnea Index focal score [6]; and FEV1 or Transition Dyspnea Index, an increase in SGRQ, and a moderate-to-severe COPD exacerbation [4].

In all the publications that included FEV1, the strongest driver of CID in each of the analysis populations was lung function [3,4,5,6]. In contrast to these previous results, the most commonly reported endpoint in our study was exacerbations, perhaps because the UPLIFT study was 4 years long compared with the shorter (maximum 26 weeks) duration of the previous studies [3]. Our analysis showed a high overall frequency of CID for both treatment arms, which is expected due to the long study duration.

Lastly, we have shown that patients considered to have a CID early in the UPLIFT study (within the first 6 months) had worse outcomes for the 42-month remainder of the study; this was also confirmed in an analysis using CID as a time-varying covariate. These outcomes support results from previous analyses of the shorter TORCH and ECLIPSE studies. The 4-year length of our study provided valuable information on sustained CID and the relationship between clinically important events that could not be ascertained in clinical trials of shorter duration.

The study had limitations. Relatively few patients with GOLD 4 lung function impairment were enrolled. Additionally, GOLD 4 patients have a lower baseline FEV1 than GOLD 2 or 3 patients; as such, declines in FEV1 of ≥100 mL were less common, and would be expected to be more debilitating, in these patients. This should be considered in future studies, where percentage declines may be considered as an alternative definition of clinically significant decline. The composite endpoint includes SGRQ and moderate/severe exacerbations, which could be seen as subjective; this could introduce some variability in the results. Finally, this was a post hoc analysis, although the large population and long follow-up time allowed a satisfactory number of events to be observed.

Conclusions

We believe these results indicate that CID is a promising composite endpoint for assessing disease activity in COPD clinical trials and may be a useful outcome that helps clinicians interpret the implications of trial results for individual patient management. Prospective studies are required to determine whether patients who experience disease progression (i.e. those who experience CID) at an increased rate can be identified earlier. By stratifying patients based on time to CID in a clinical trial database, it may be possible to identify characteristics associated with poor longer-term outcomes, which could help to identify which patients require further treatment earlier. The composite endpoint may also serve to reduce patient numbers in clinical trials, as large numbers of patients are required to generate enough statistical power to detect a single outcome within patients with moderate COPD [18]. The length of trials may also be reduced, thereby limiting challenges such as patient discontinuation and cost that are prohibitive in trials of increased duration. Prospective studies are also needed on the use of this concept to understand the sensitivity and efficacy of current and potential therapies.