Key Points

A number of new treatments for asthma and chronic obstructive pulmonary disease (COPD) were recently released, with several more on the horizon. This has sparked interest in the comparative effectiveness among the available inhaled therapies.

The natural histories of asthma and COPD presentation and progression present many unique challenges for comparative effectiveness research. Comorbidities, disease heterogeneity, and poor treatment adherence are just a few of the problems that can introduce bias into the analysis if not effectively addressed in the study design.

1 Introduction

The goal of comparative effectiveness research (CER) is to measure the real-life benefits and risks of treatments. The Institute of Medicine defines CER as “the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition or to improve delivery of care. The purpose of CER is to assist consumers, clinicians, purchasers, and policymakers to make informed decisions that will improve healthcare at both the individual and population levels” [1]. Proof of whether or not a treatment can work is known as efficacy, while the benefit of that treatment in routine clinical practice is known as effectiveness. Efficacy is usually established by randomized clinical trials (RCTs), which are considered to be the ‘gold standard’ for proof of treatment benefit, but RCTs do have important limitations, including highly selected study populations and artificial clinical conditions. Effectiveness studies are needed to demonstrate that treatments still have the intended benefits when they are used in broader unselected patient populations and routine clinical practice. CER attempts to capture the differences in clinical benefits among similar treatments when used in the general population.

The availability of well-established inhaled combination corticosteroid and long-acting β-agonist (ICS/LABA) products [budesonide/formoterol (BFC) and fluticasone/salmeterol (FSC)] in the management of asthma and chronic obstructive pulmonary disease (COPD) creates an opportunity to examine the benefits, as well as limitations, of CER in the therapeutic area of chronic respiratory disease. Both of these combination inhalers have proven efficacy in both asthma and COPD (a third combination, mometasone/formoterol, is approved only for asthma), and data from RCTs and meta-analyses suggest that these products perform similarly under controlled RCT conditions [2–7]. The role of ICS/LABA combination therapy for patients with persistent asthma is well established, while the role of combination therapy in COPD is not as clear (Fig. 1). The only head-to-head comparisons of ICS/LABA combinations in asthma found that any differences in efficacy between them were slight, and in their primary endpoints, not statistically significant [6]. In contrast, some recently reported real-world, comparative effectiveness studies have suggested that there are differences between ICS/LABA combination treatments in a variety of clinical outcomes. These discrepancies are intriguing, but there are several unique features of asthma and COPD disease pathophysiology, progression, and management that need to be considered when interpreting CER studies, as well as very important limitations in study design. In this review, we examine the clinical trajectory of ICS/LABA use in asthma and COPD and how study design problems during different time periods may result in significant biases in CER. We present a current review of published CER studies that have directly compared BFC with FSC, and examine how they may or may not have dealt with these study design issues. Finally, we summarize the practical, clinical implications of these CER studies as well as the knowledge gaps that remain, and look at what lessons can be learned for the development of new therapeutic options.

Fig. 1

Role of ICS/LABA combination therapy in (a) asthma and (b) COPD [8]. Asthma guidelines reproduced with permission from the National Asthma Education and Prevention Program [8]. Refer to the original document for full guideline notes (http://www.nhlbi.nih.gov/guidelines/asthma/09_sec4_lt_12.pdf, accessed 17 March 2014). ACP American College of Physicians, ACCP American College of Chest Physicians, ATS American Thoracic Society, COPD chronic obstructive pulmonary disease, EIB exercise-induced bronchospasm, ERS European Respiratory Society, FEV1 forced expiratory volume in 1 second, ICS inhaled corticosteroid, LABA long-acting inhaled β2-agonist, LTRA leukotriene receptor antagonist, SABA short-acting inhaled β2-agonist

2 Literature Review

We conducted a literature review using both PubMed and Thomson Reuters Web of Science databases for studies comparing ICS/LABA treatments, spanning the time period 1 January 1997 through 2 October 2013. Multiple searches were conducted. Search terms included fluticasone, salmeterol, budesonide, and formoterol. Initially the search was broad and also included terms for mometasone furoate and beclomethasone formoterol. A total of 330 unique citations were identified from the broad search process. The abstracts for these were reviewed, and based on that information the following were excluded from further review: 124 review articles, 118 clinical trial studies, 35 non-relevant studies (e.g. generic studies, inhaler mechanism effectiveness, only ICS treatment, or simulation study), and 14 non-study documents (book chapters, viewpoints, or editorials). We then reviewed the remaining 39 articles in detail. Of these, 25 articles did not present comparative effectiveness findings for BFC and FSC or duplicated ones that did, and 14 articles described observational comparative effectiveness studies of BFC and FSC: 9 for asthma and 5 for COPD. Highlights of these studies are summarized in Tables 1, 2, 3 and 4 [9–22].

Table 1 The index date—study design characteristics
Table 2 The baseline period—patient characteristics prior to the index date
Table 3 The follow-up period—treatment comparisons, adjustments for adherence, and tests of assumptions
Table 4 Study results, conclusions, and analysis limitations

3 The Trajectory of Asthma and Chronic Obstructive Pulmonary Disease (COPD) Treatment

A major difference between RCTs and CER is that the index date, or ‘day one’ of treatment, in an RCT for an asthma or COPD medication typically starts when the patient is at a stable baseline, at least several weeks after their last exacerbation. In real-life, asthma or COPD patients usually start taking medications when they are sick. In studies that have examined the natural history of COPD in the time before and after the first diagnosis of COPD, the diagnosis typically occurs during an exacerbation, respiratory infection, or non-respiratory acute medical event, and respiratory medications are often dispensed along with antibiotics and other drugs during that time [23, 24]. Asthma diagnoses are also more likely to be made at times when patients are symptomatic [25, 26]. Therefore, ICS/LABA treatment in the real-world is usually prescribed when asthma and COPD patients are unstable and often have other complicating problems such as infections or cardiovascular complications. When contemplating a CER study in asthma and COPD, it is useful to partition the time periods into the time of the initial treatment with a new medication (the index date), the time before the new medication was dispensed (the baseline period), and the time after the medication was dispensed (the follow-up period). Each of these periods is associated with unique factors that result in selection biases, measurement errors, or potential confounding that are likely to affect clinical outcomes. In the following sections, we will examine these features by each time period in detail, and examine how these potential problems have been handled in the CER studies for ICS/LABA treatment to date.

4 The Index Date

In CER of ICS/LABA combinations, the index date is commonly the date of the first prescription fill for the new treatment after the asthma or COPD diagnosis is established. The majority of the studies in Table 1 are longitudinal cohort analyses with index dates based on this definition. The index date is a logical choice as the start of the analysis period because it is safe to assume that the patient first started using the ICS/LABA inhaler on or near that date, and whatever benefits and side effects are attributable to the treatment will begin to accrue at that time. However, that assumption may not hold in clinics where free drug samples are dispensed, where there may be a gap of weeks to months between the office visit when treatment was started and the first ICS/LABA prescription fill, resulting in a bias in follow-up time. Therefore, one needs to be familiar with the policies about sampling in the health systems one is studying and the likelihood that patients are dispensed medications that do not appear in the database.
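The index date definition above can be illustrated with a minimal sketch. The function name, the data layout, and the choice to count a fill on the diagnosis date itself are assumptions made for illustration; they are not taken from any of the reviewed studies.

```python
from datetime import date

def index_date(diagnosis_date, fill_dates):
    """Return the first ICS/LABA fill on or after the diagnosis date, or None.

    Assumption: a fill on the diagnosis date itself counts as eligible.
    """
    eligible = [d for d in fill_dates if d >= diagnosis_date]
    return min(eligible) if eligible else None

# Hypothetical pharmacy claims for one patient.
fills = [date(2012, 1, 10), date(2012, 3, 5), date(2012, 4, 2)]

# The January fill precedes the diagnosis, so it is ignored under this definition.
print(index_date(date(2012, 2, 1), fills))  # 2012-03-05
```

Medication started from free samples never appears in `fills`, so the date returned here may be weeks to months later than the true start of treatment.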

Unfortunately, studies rarely describe what was happening to the patient on the index date that caused them to be treated. These circumstances are important because they are likely to affect the patient’s outcome during follow-up. For example, if the patient was experiencing an asthma or COPD exacerbation at the time the ICS/LABA was prescribed, their risk for another exacerbation during the next few months will be substantially higher than for someone who had not had a recent exacerbation [27, 28]. If the ICS/LABA prescription was provided during an emergency department visit, it is likely that the patient will have different adherence with long-term treatment than someone who is prescribed one during a scheduled office visit [29]. If the prescription is written by a pulmonologist, then it is more likely that there will have been a pulmonary function test or other testing confirming the diagnosis than if written by a primary care provider. If the index date is at or near the time of a hospitalization, then associated comorbidities such as heart disease or pneumonia may be more likely to affect subsequent outcomes [30].

One way to deal with the problem of heterogeneous clinical presentations is to match patients based on the clinical setting where they started ICS/LABA treatment, such as a clinic visit, ED visit, or hospitalization. Another way is to match patients based on the specialty of the prescribing physician, such as pulmonary or allergy specialists versus primary care provider. In studies where COPD or asthma exacerbations are an outcome of interest, it is very important to match on exacerbations that occurred on or near the index date because one of the best predictors of future exacerbations is the history of prior exacerbations [27, 31, 32].

One also needs to be aware that physicians may not be free to choose which ICS/LABA product the patient will get; the choice is often made by pharmacy benefit managers who negotiate a lower price for one ICS/LABA product or the other, which then becomes the more readily available product on the formulary [33]. In that circumstance, the fact that a patient is not using the less expensive product is an indication that there is something unusual about the patient that may affect their outcomes. When comparing COPD or asthma patients among several health systems, one must be aware of those with a formulary that heavily favors one product because differences among heavily biased cohorts may be more likely to reflect group clinical characteristics than treatment effects.

4.1 Index Dates in Cross-Sectional Studies

A few of the CER studies in asthma were based in respiratory clinics and are purely cross-sectional, meaning that all patients at the initiation of the research have already been taking their medications for weeks to years before the first day of the study (Table 1). The index date for these cross-sectional studies is usually the date of enrollment, and because of the variable length of medication use at the start, lead time bias is inevitable. Clinic-based studies are necessary when the endpoints of interest are asthma symptom control and health-related well-being, which are measured using standardized questionnaires. These projects also tended to be more similar to RCTs in that they excluded persons who had a recent exacerbation, they excluded those with COPD or other serious chronic diseases, and they included a relatively homogeneous selected population of persons who most likely had severe asthma at one time and were thus referred to a specialist’s office. While reducing confounding influences, these criteria also limit the generalizability of the results. Unlike RCTs, these projects are obviously not blinded or randomized, making them highly susceptible to the biases that affect any observational study.

4.2 Misclassification Errors in Asthma and COPD Comparative Effectiveness Research (CER)

Misclassification errors (misdiagnoses) are common among persons labeled with either asthma or COPD. One primary care-based clinical study found that almost half of their patients with a physician diagnosis of COPD did not have airflow obstruction when tested by spirometry [34]. In addition, in a study of 496 randomly selected patients diagnosed with asthma from eight cities in Canada, 150 (30 %) did not have asthma after a complete evaluation including methacholine challenge testing [35]. For asthma, there can also be difficulties in accurately distinguishing between levels of disease severity [36–38]. To decrease the risk of disease misclassification, some database studies require multiple outpatient COPD or asthma diagnoses over a specified time period, or at least one diagnosis during a hospitalization. Hospitalization diagnoses are assumed to be more valid because patients are more likely to have had a more thorough assessment than during an office visit, although that is not always a safe assumption [39].
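A claims-based case definition of the kind described above might be sketched as follows. The thresholds (two outpatient diagnoses on distinct dates within 365 days) and the record layout are illustrative assumptions; individual studies specify their own values.

```python
from datetime import date

def meets_case_definition(claims, min_outpatient=2, window_days=365):
    """Apply a hypothetical case definition to (date, setting) diagnosis claims.

    A patient qualifies with at least one inpatient diagnosis, or with
    min_outpatient outpatient diagnoses on distinct dates within window_days.
    """
    if any(setting == "inpatient" for _, setting in claims):
        return True
    # Distinct outpatient diagnosis dates, in chronological order.
    outpatient = sorted({d for d, setting in claims if setting == "outpatient"})
    for i in range(len(outpatient) - min_outpatient + 1):
        span = (outpatient[i + min_outpatient - 1] - outpatient[i]).days
        if span <= window_days:
            return True
    return False

two_visits = [(date(2012, 1, 10), "outpatient"), (date(2012, 6, 1), "outpatient")]
print(meets_case_definition(two_visits))  # True
```

Tightening the definition reduces false positives but also drops patients with mild or recently diagnosed disease, so the thresholds are themselves candidates for sensitivity analysis.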

4.3 Inclusion and Exclusion Criteria

In asthma or COPD CER studies there is a tendency to use the same inclusion and exclusion criteria used in asthma or COPD RCTs. This is often driven by a desire to demonstrate effectiveness in CER that is similar to efficacy in RCTs. However, when doctors in the real-world prescribe asthma or COPD medications, they are not limited to the strict inclusion and exclusion criteria that are used for RCTs. Like it or not, physicians will immediately begin testing the boundaries unexplored by RCTs as soon as a drug is on the market. Relative effectiveness of treatment in the general population should reflect treatment in the general population. If there is a desire to replicate RCT results, then secondary analyses that systematically examine the impact of various inclusion and exclusion criteria can serve that purpose and additionally may yield unique and useful information.

Exclusion criteria deserve special consideration in CER studies because that is where the unintended effects of new treatments are likely to be found. Drug studies attempting to prove efficacy in RCTs typically exclude unstable patients, those with severe comorbidities, or persons who are not likely to be compliant with therapy, because these patients increase safety monitoring concerns while also reducing the power to observe clinical benefits. However, excluding these patients also reduces the power of RCTs to capture negative side effects, particularly those that are more likely in the excluded populations. One of the unique advantages of CER is that the risk or benefits of treatment for these excluded, unstudied patients can now be captured and described.

It is common for COPD patients to have other respiratory diagnoses, especially asthma. In a cohort of 42,565 COPD patients treated in managed care systems in the US, 27 % also had an asthma diagnosis within the last year [40]. The impact on measured outcomes of having a combined asthma and COPD diagnosis is variable depending on the outcome, but most studies have found increased risk for exacerbations [41]. Furthermore, most observational database studies do find that a combined diagnosis of COPD and asthma does affect treatment, particularly the selection of ICS/LABA combinations versus inhaled anti-muscarinic treatments [42]. Unfortunately, most CER studies in COPD to date have simply excluded patients with any diagnosis of asthma. Lung cancer is common in older COPD cohorts (4–7 % in cross-sectional cohorts) [40], and because of its very high case fatality rate, it has a significant impact in longitudinal studies [43]. Depending on the objectives of the study, investigators may need to make a systematic effort to capture specific respiratory comorbidities.

Another common way to reduce selection bias and create balanced ICS/LABA cohorts similar to RCT populations is to use case-control methods. It is possible to match on just a few major demographic and clinical factors [17], but most studies have used propensity score matching (PSM) techniques which allow more comprehensive matching using a broader array of clinical variables (Table 1). The obvious benefit of matching is that known confounding factors are more likely to be balanced among the comparison groups, but PSM still has a number of very important limitations [44]. First, matching limits the study cohorts to the size of the smaller treatment population. If one of the treatment groups is substantially smaller than the other, then most of the persons in the larger group are dropped from the analysis, which leads to sample selection bias and may seriously compromise the generalizability of the results. Second, PSM will help balance the selected variables, but any unselected or unmeasured variables that affect outcomes may continue to be unbalanced. This is very important in observational studies where attitudes about treatment and adherence with therapy are often not measurable. Finally, when one matches on a clinical parameter, then one’s ability to measure how that factor affects the outcomes is compromised [45]. For example, if men are more likely than women to benefit from the reduction in COPD or asthma exacerbations attributable to ICS/LABA treatment, then matching by sex will bias the risk estimates for exacerbations among men towards the null, and eliminate the possibility of examining interactive effects between ICS/LABA treatment, sex, and exacerbation risk.
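The first limitation, loss of patients from the larger group, can be seen in a small sketch of 1:1 matching. This assumes the propensity scores have already been estimated elsewhere and uses a simple greedy nearest-neighbor rule with a caliper of 0.05; the algorithm, the caliper, and the patient identifiers are illustrative assumptions, not the method of any reviewed study.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on a precomputed propensity score.

    treated and controls map patient id -> score. Each control is used at
    most once, and pairs farther apart than the caliper are not formed.
    """
    available = dict(controls)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs

# Hypothetical scores: a small treatment group and a larger comparison group.
smaller_group = {"b1": 0.30, "b2": 0.55}
larger_group = {"f1": 0.28, "f2": 0.33, "f3": 0.52, "f4": 0.80, "f5": 0.10}

pairs = greedy_match(smaller_group, larger_group)
print(pairs)                            # [('b1', 'f1'), ('b2', 'f3')]
print(len(larger_group) - len(pairs))   # 3 patients dropped from the larger group
```

In this toy example three of five patients in the larger group never enter the analysis, and the two who do are balanced only on whatever variables went into the score.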

5 The Baseline Period

Utilization data collected during the baseline period is often used to establish the diagnosis of asthma or COPD, capture items that are associated with disease severity, and identify comorbid diseases and conditions (Table 2). The duration of the baseline period is typically 12 months, which helps to ensure that the treatment dispensed on the index date is a new treatment. Importantly, a 12-month baseline also allows for the seasonal variability in asthma and COPD exacerbations. Within the baseline period, the timing of events may be important, in particular those occurring just prior to the index date. For example, an emergency department visit with a discharge diagnosis of bronchitis that occurred 1 week prior to hospitalization for asthma is most likely a very different syndrome than that of a patient who was seen in a pulmonologist’s office 6 months before the index date with a chronic cough.
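As a minimal sketch of how a 12-month baseline period can be used to confirm that the index treatment is new, the following assumes a 365-day look-back ending the day before the index date. The function names and the exact window boundaries are assumptions for illustration.

```python
from datetime import date, timedelta

def baseline_window(index, days=365):
    """Return (start, end) of the baseline period, ending the day before index."""
    return index - timedelta(days=days), index - timedelta(days=1)

def is_new_user(index, icslaba_fill_dates, days=365):
    """True if no ICS/LABA fill falls inside the baseline period."""
    start, end = baseline_window(index, days)
    return not any(start <= d <= end for d in icslaba_fill_dates)

index = date(2013, 6, 1)
print(baseline_window(index))                       # 2012-06-01 through 2013-05-31
print(is_new_user(index, [date(2013, 6, 1)]))       # True: only the index fill
print(is_new_user(index, [date(2012, 12, 15)]))     # False: prior fill in baseline
```

The same window can be reused to count baseline hospitalizations, emergency department visits, and rescue medication fills, with the timing of each event relative to the index date retained rather than collapsed into a single count.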

The approach to dealing with confounders is one of the most important differences between RCTs and CER. In randomized trials, treatment groups have randomly and equally distributed known (e.g. age, comorbidities, sex) and unknown (e.g. preferences and attitudes toward treatment) confounding influences, leaving treatment as the major difference between groups that may impact outcomes. The challenge of CER is to adequately adjust for the many confounding factors that exist in these retrospective treatment populations. In asthma and COPD, comorbid illnesses such as cardiovascular disease or psychiatric illness are examples of known and very complex confounders. It is important to keep in mind that even if it were possible to fairly balance every known confounding factor retrospectively, unknown factors will remain unbalanced and could become sources of residual confounding.

To capture baseline comorbidities that are likely to affect subsequent treatment choices or outcomes, it is best to use one of the standardized classification systems that have been adapted for use in electronic databases containing International Classification of Diseases, Ninth Revision (ICD-9) or ICD, Tenth Revision (ICD-10) codes [46]. The Charlson Comorbidity Index, and database versions such as the Charlson–Deyo Index, are popular for identifying and weighting prognostically significant comorbidities, although the validity of the original weighting scheme is limited due to the improvement in some disease outcomes since the first version was published over 20 years ago [47]. The Elixhauser system captures a broader range of comorbidities, and software for database versions is publicly available on the US Agency for Healthcare Research and Quality website [48]. However, there are comorbidities of special interest in asthma and COPD, such as obstructive sleep apnea, cor pulmonale, allergic rhinitis, vascular disease, gastroesophageal reflux, and cardiac arrhythmias, that are missed by some of these standardized systems. Therefore, it is usually necessary to supplement standardized comorbidity classification tools with the other respiratory diseases and conditions associated with COPD and asthma that are likely to affect outcomes.

The baseline period may also be used to estimate the severity of asthma or COPD. It is reasonable to expect that patients with unstable asthma or COPD will have more hospitalizations and unscheduled clinic visits. Emergency department visits may also be indicators for severity; however, many patients, even in managed care systems, may use emergency departments as their primary care clinic, and thus emergency department visits not leading to hospitalization tend to be more like clinic visits than hospital stays [27]. Patients with more severe disease may also have more documentation of respiratory symptoms in the form of diagnosis codes; for example, dyspnea (ICD-9 codes 786.05/786.09) or wheezing (ICD-9 code 786.07) [49]. Oxygen use, or codes for hypoxemia or respiratory failure, are also evidence of severe lung disease.

Medication use during the baseline period is also a useful indicator of respiratory disease severity. Increased rescue medication use (short-acting β-agonists and/or ipratropium bromide) in the baseline period is associated with increased COPD exacerbations before and after the index date [20]. Increased use of rescue medication is also a marker for poor asthma control [50]. A thorough description of both the total number and the types of medications used at baseline (e.g. oral steroids, xanthines, leukotriene inhibitors) is useful for characterizing the study populations.

6 The Follow-up Period

The follow-up period has unique challenges (Table 3). The biggest challenge is dealing with the poor treatment adherence among asthma and COPD patients. As previously noted, the index date is commonly associated with an exacerbation of asthma or COPD. Fortunately, the natural history of most COPD exacerbations is that the symptoms will resolve and the patient will return to their usual health status prior to the exacerbation [51]. But as asthma and COPD symptoms decrease, the benefits of maintenance medications such as ICS/LABAs become less obvious to patients, and they quite often conclude that they do not need them anymore. Experience varies, but approximately 25 % of persons initially dispensed an ICS/LABA will not get it refilled, and by 6 months more than half will have discontinued it [29]. Although adherence with combined inhalers tends to be better than for use of the individual component medications, studies of ICS/LABA combinations show that continuity of treatment in either asthma or COPD is very poor [52–54]. Even in RCTs such as the TORCH study where patients were very closely monitored, 44 % of those on placebo and 34 % on full treatment stopped the medication before the end of the study, resulting in a bias that affected the results [55].

The conventional treatment comparison method for RCTs is the intent-to-treat analysis, wherein patients are kept in their original treatment groups even if they have stopped taking the medicine. Many respiratory CER studies also simply follow the intent-to-treat design, even though the adherence with treatment is far worse than that of RCTs and a large proportion of patients labeled as treated with BFC or FSC were dispensed only one inhaler for the entire follow-up period. This obviously creates misclassification errors, and importantly makes it far more likely that any observed differences between treatment groups are biased by factors associated with selection of treatment than by any effect of the treatment itself.

While patients who are non-compliant create misclassification errors and other biases, the patients who are compliant with ICS/LABA treatment create another problem—bias by indication. Patients who have more severe asthma and COPD are more likely to experience chronic symptoms, have higher risk of recurrent exacerbations, and thus use more respiratory medications. Without effectively addressing the problem of bias by indication, ICS/LABA combinations can appear to increase the risk of asthma and COPD exacerbations in the general populations, when in fact the treatment is simply acting as a marker for more severe disease.

Although treatment adherence is a fundamental problem for CER studies, most ICS/LABA studies to date have not been very thorough in capturing or describing how medications were actually used, or adjusting for adherence in the analysis (Table 3). There are notable exceptions; for example, the project by Suissa et al. [13] specifically addressed the adherence problem by using both intent-to-treat analyses and analyses while on-treatment only. Adjusting for treatment compliance during follow-up is a very difficult problem that is the subject of ongoing research. Because of the problem of bias by indication, simply making treatment a time-dependent variable is more likely to introduce bias than adjust for it [56]. Micro-simulation models and marginal structural models are new techniques that address the complex competing problems of disease severity and treatment adherence, but have only recently been applied to COPD [57].

Exacerbations are a popular endpoint for CER because they have significant impact on other disease outcomes such as quality of life, healthcare costs, hospitalizations, disability, and mortality [58]. However, defining exacerbations is controversial, especially in retrospective studies. The most common definitions for exacerbations in CER are based on utilization, which is a very practical approach for studies where utilization and cost are major endpoints [59]. Outpatient exacerbations are usually defined as outpatient visits associated with prescription fills for oral corticosteroids (OCSs) and/or antibiotics, standard treatments for acute asthma and COPD exacerbations. Occasional use of oral steroids can be a reasonable indicator of an acute exacerbation, but use of only antibiotics is less clear, especially now that chronic use of azithromycin to prevent COPD exacerbations is growing in popularity. If exacerbations are the primary endpoint in CER, chart abstraction may be needed to validate the sensitivity and specificity of exacerbation definitions.

When utilization is used to define exacerbations, one also has to decide how closely the outpatient or hospital visit must be associated with the prescription fill to reasonably conclude that they are related. Most have required that fills occur within 3 days of a respiratory-related visit. Exacerbations of asthma and COPD are known to linger from a few days to several weeks, and in one prospective clinical study of exacerbations, the median duration was 12 days [60]. It is important to designate the expected duration of exacerbations so that follow-up visits for one exacerbation do not get counted as multiple exacerbations. There are no methods that have near perfect sensitivity or specificity for capturing COPD or asthma exacerbations, but the utilization method has demonstrated useful reliability across many patient populations and study designs.
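A utilization-based exacerbation count along these lines might be sketched as follows. The 3-day fill window follows the text, while the 14-day episode duration, the decision to accept fills either before or after the visit, and the function name are assumptions made for illustration.

```python
from datetime import date

def count_exacerbations(visit_dates, ocs_fill_dates, fill_window=3, episode_days=14):
    """Count exacerbation episodes from respiratory visits and OCS fills.

    A visit qualifies if an OCS fill occurs within fill_window days of it
    (assumed here to mean before or after). Qualifying visits that fall
    within episode_days of the start of an episode are treated as follow-up
    for that episode rather than as a new exacerbation.
    """
    qualifying = sorted(
        v for v in visit_dates
        if any(abs((f - v).days) <= fill_window for f in ocs_fill_dates)
    )
    episodes = 0
    episode_start = None
    for v in qualifying:
        if episode_start is None or (v - episode_start).days > episode_days:
            episodes += 1
            episode_start = v
    return episodes

visits = [date(2012, 1, 5), date(2012, 1, 12), date(2012, 3, 1), date(2012, 4, 10)]
ocs_fills = [date(2012, 1, 6), date(2012, 1, 12), date(2012, 3, 2)]

print(count_exacerbations(visits, ocs_fills))                  # 2
print(count_exacerbations(visits, ocs_fills, episode_days=5))  # 3
```

The two January visits collapse into one episode under a 14-day duration but count separately under a 5-day duration, which shows how directly the assumed duration drives the outcome.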

Misclassification errors and extreme outliers can be a problem in the follow-up period, particularly for persons labeled with asthma who have vocal cord dysfunction or conversion disorders, or COPD patients who have hospitalizations that last for weeks. Some misdiagnosed asthma patients can have multiple hospitalizations, including events that result in intubation and intensive care unit stays. Pneumonia is often the cause of acute exacerbations of COPD that result in hospitalization, and hospitalizations for COPD and pneumonia are substantially longer and more costly than those without pneumonia [61]. Just a few outliers or misclassified patients can substantially skew clinical outcomes, especially costs, and bias CER of ICS/LABA combinations [58]. Projects using administrative databases need to be aware of the possibility of extreme outliers and develop decision rules a priori for exclusion of extraordinary cases that bias results.

Confounding by events associated with comorbidities is also a problem in the follow-up period. In a nationwide, population-based study of comorbidities associated with COPD, Baty et al. [30] identified a few with the most prognostic significance among COPD patients who had hospitalizations related to COPD (N = 160,317 patients). Lung cancers, lymphatic neoplasms, obesity-hypoventilation syndrome, pseudomonas pneumonia, and secondary polycythemia were associated with reduced time to rehospitalization, while asthma was associated with longer time to rehospitalization. The large number of comorbidities associated with COPD as compared to age- and sex-matched controls suggests that a CER project is very likely to find random differences in comorbidities between treated COPD groups by chance alone.

6.1 Sensitivity Analyses

Sensitivity analyses are secondary analyses that are intended to examine assumptions about important covariates, the impact of specific inclusion/exclusion criteria, and the differences among various analysis methods. Sensitivity analyses are especially important in studies that use multivariable analyses, where the effects of interactions among covariates can be obscured. Inherent in any study are a multitude of decisions about classifications and methods. To the extent possible, decisions that have a greater likelihood of affecting findings should be investigated through sensitivity analyses. For example, decisions about exacerbation event duration, assumptions about treatment adherence, or definitions of exacerbations that could vary by the number of days between a doctor’s visit with a primary respiratory diagnosis and an outpatient claim for an OCS, may need to be tested across a range of alternate values to see what impact these assumptions may have had on the final results.
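One simple form of such a sensitivity analysis is to recompute an outcome across a grid of alternate values. The sketch below varies the number of days allowed between a respiratory visit and an OCS fill; the grid of 1, 3, 7, and 14 days and the hypothetical claims are assumptions chosen only to show the pattern.

```python
from datetime import date

def qualifying_visits(visit_dates, ocs_fill_dates, fill_window):
    """Visits with an OCS fill within fill_window days (before or after)."""
    return [
        v for v in visit_dates
        if any(abs((f - v).days) <= fill_window for f in ocs_fill_dates)
    ]

def sweep_fill_window(visit_dates, ocs_fill_dates, windows=(1, 3, 7, 14)):
    """Count qualifying visits for each candidate fill window."""
    return {
        w: len(qualifying_visits(visit_dates, ocs_fill_dates, w))
        for w in windows
    }

visits = [date(2012, 1, 5), date(2012, 2, 1), date(2012, 3, 1)]
ocs_fills = [date(2012, 1, 6), date(2012, 2, 6), date(2012, 3, 11)]

print(sweep_fill_window(visits, ocs_fills))  # {1: 1, 3: 1, 7: 2, 14: 3}
```

If the comparison between treatment groups changes direction or loses significance somewhere along such a grid, the finding depends on the definition rather than on the treatment.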

7 Interpretation of Results and Conclusions in CER

An important aspect of CER studies in asthma and COPD is that they often have multiple outcome measures of interest (Table 1) which increases the probability that there will be at least one ‘statistically significant’ difference between treatments by chance alone (Table 4). Another aspect to consider is that the database studies often include thousands of patients, so their size alone gives them power to detect very small differences between the treatment groups. However, a small difference between treatment groups with a p-value well below 0.05, or even 0.001, does not prove that the difference is either clinically significant or due to a treatment effect. Absolute standardized differences can provide better information than p-values about effect size, and are starting to appear more in published CER studies. Statistical tests help determine the degree to which differences can be attributed to random errors, but study design factors are more important determinants of establishing causality [62]. One must always keep in mind that retrospective analyses, because of their high likelihood of bias, are not reliable proof of causal associations. However, that is not to say that the results are not valuable. An analysis that explores the sources of variability and imprecision among outcomes can reveal very useful insights into the clinical factors that help determine the success and failures of treatments in real life.
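Two of the points above lend themselves to a short numerical sketch. The first function gives the probability of at least one false-positive result across several outcomes, under the simplifying assumption that the tests are independent; the second computes an absolute standardized difference for a continuous variable using the usual pooled standard deviation. The function names and example values are illustrative.

```python
import math

def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one false positive among n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def standardized_difference(mean1, sd1, mean2, sd2):
    """Absolute standardized difference between two group means."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return abs(mean1 - mean2) / pooled_sd

# Ten outcome measures, each tested at the 0.05 level.
print(round(familywise_error(10), 3))  # 0.401

# Hypothetical mean ages of 65 and 64 years, both with SD 10.
print(round(standardized_difference(65.0, 10.0, 64.0, 10.0), 3))  # 0.1
```

Unlike a p-value, the standardized difference does not shrink as the cohort grows, so it describes the size of an imbalance or effect rather than the size of the database.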

Determining a clinically important and significant difference between treatment groups in CER, as opposed to a statistically significant difference, is a subjective assessment. In most retrospective comparisons from observational databases, odds ratios of less than 2.0 should be regarded with caution [63]. A helpful way to understand the clinical relevance of a difference in outcome between two treatments is to present a comprehensive assessment of all demographic and clinical factors associated with an outcome in the study population, and then compare the magnitude of the treatment-related difference with that of the other clinical factors affecting the outcome [64]. For example, a 72 % difference in pneumonia incidence between two COPD treatment groups with a p-value <0.001 may sound like an important finding [22]. However, if the incidence of pneumonia more than doubles with COPD severity, various comorbidities, and advanced age, then a 72 % difference could be within the range expected from selection biases, misclassification errors, and unmeasured confounders [65].
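One simple way to make this comparison concrete is to rank the treatment-related effect alongside the effects of the other factors on a common scale. The sketch below does this by the absolute value of the log relative risk, so that protective and harmful effects of equal strength rank equally. All factor names and relative risks are hypothetical placeholders, apart from the value of 1.72, which simply restates the 72 % difference used in the example above.

```python
import math


def rank_by_effect_magnitude(relative_risks):
    """Sort factors from strongest to weakest by |log(relative risk)|."""
    return sorted(
        relative_risks.items(),
        key=lambda item: abs(math.log(item[1])),
        reverse=True,
    )


# Hypothetical relative risks for pneumonia (illustration only).
factors = {
    "treatment A vs. B": 1.72,
    "severe vs. mild COPD": 2.3,
    "major comorbidity present": 2.5,
    "advanced age": 2.1,
}

ranked = rank_by_effect_magnitude(factors)
for name, rr in ranked:
    print(f"{name:<28} RR = {rr:.2f}")
```

With these placeholder values the treatment contrast ranks last, which is the situation described above: a difference that looks large in isolation can be smaller than the effects of the factors most likely to be imbalanced or mismeasured between groups.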

8 Conclusions

Naturally, there is considerable interest in which treatment is better when there are choices within the same class. However, because of the limitations in retrospective study designs and databases, retrospective analyses of population databases and observational cohorts can rarely be conclusive about the causal associations between treatments and outcomes. In our compulsion to see which treatment is better, we often lose sight of the fact that both treatments are substantially improving outcomes as compared to those not treated, and the relative difference between treatments is within the range of random variation, not only statistically but also in terms of the robustness of the study design. The intense focus on proving which treatment is ‘better’ can be a distraction from understanding their roles and relationships with the other determinants of overall health outcomes for asthma and COPD patients.

Given the many potential sources of bias that we have identified in CER studies of asthma and COPD, one might wonder if these studies are at all reliable or interpretable. However, if one embraces these sources of variation instead of hiding or ignoring them, and directly addresses them in CER studies, then a truer depiction of how asthma and COPD patients are managed in real life can be recognized. Both reproducibility, referring to method or measurement precision, and repeatability, referring to agreement across similar populations, are important concepts for both RCTs and CER studies. Indeed, much of the value of CER lies in the ability to conduct multiple studies with equivalent methods efficiently and with minimal resources. Additional CER is needed to understand how these treatments are being used in patients in the general population who do not fit the restrictive inclusion and exclusion criteria of RCTs. There is also a need for studies that compare various analysis approaches and statistical methods to deal with some of the study design limitations we have discussed, particularly in treatment adherence and bias by indication.