Selection of covariates for adjustment in randomized trials is still frequently based on observed baseline imbalances between the study groups [1], even though this strategy is flawed and therefore not recommended [2–4]. For example, relatively small imbalances (indicated by large P values) in strong prognostic factors may still result in bias when such variables are omitted from the adjustment model [3].

In observational studies, the selection of covariates for adjustment should not be based on baseline imbalances either [5, 6]. Nevertheless, this practice is likely even more common in observational studies than in trials [7], since adjustment for confounding is known to be an important issue in observational designs. As in trials, a strong prognostic factor for the outcome that is only weakly associated with the exposure may not be selected for adjustment, even though its omission may result in confounding. Conversely, adjusting for variables that are related to the exposure under study but are not true confounders may introduce bias rather than remove it. Examples include so-called M-bias, Z-bias, and adjustment for intermediates in the causal chain [8, 9]. Hence, baseline imbalances should not guide the selection of covariates for adjustment in observational research.

Using observational data on the effects of long-acting beta-agonist use on mortality risk in patients with obstructive pulmonary disease, we illustrate here that even ‘perfect’ balance of prognostic characteristics between study groups does not justify omitting such variables from the adjustment for confounding. Before turning to this clinical example, we first demonstrate the invalidity of this selection strategy with a numerical example on hypothetical data.

Numerical example

Suppose an observational study on the effects of a certain exposure was conducted among 20,000 subjects. Two variables (e.g., age and gender) were considered potential confounding variables because both were known risk factors for the outcome of interest. Age (dichotomized at, e.g., 50 years) was imbalanced between the exposure groups: 75% of the exposed were of old age, compared with 25% of the unexposed. Gender, however, was equally distributed between the exposure groups, since both groups included 50% females (Table 1).

Table 1 Characteristics of a hypothetical study population of 20,000 subjects

The incidence of the outcome (e.g., mortality) was 13.5% among the exposed and 19.5% among the unexposed, resulting in an estimated risk ratio (RR) of 0.69. Since gender was balanced between the exposure groups, stratification by gender was not expected to produce a difference between the crude (i.e., unadjusted) RR and the gender-adjusted RR. Indeed, after adjustment for gender the RR equaled the crude RR (RR = 0.69).
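Why stratifying by a balanced variable leaves the crude RR unchanged can be seen from the Mantel-Haenszel (MH) risk ratio, used here as one common pooled estimator (an assumption: the text does not name the estimator behind the stratified RRs). In stratum i, let a_i and b_i be the numbers of cases among the n_{1i} exposed and n_{0i} unexposed subjects, with N_i = n_{1i} + n_{0i}. A balanced stratifier means that every stratum has the same exposed fraction p, so n_{1i} = pN_i and n_{0i} = (1 - p)N_i:

```latex
\mathrm{RR}_{\mathrm{MH}}
  = \frac{\sum_i a_i\, n_{0i}/N_i}{\sum_i b_i\, n_{1i}/N_i}
  = \frac{(1-p)\sum_i a_i}{p\sum_i b_i}
  = \frac{A/n_1}{B/n_0}
  = \mathrm{RR}_{\mathrm{crude}},
\qquad A = \sum_i a_i,\; B = \sum_i b_i,\; n_1 = pN,\; n_0 = (1-p)N .
```

Gender satisfies this condition (50% females in both exposure groups), so the gender-stratified MH RR reproduces the crude RR of 0.135/0.195 ≈ 0.69. The argument no longer holds within age strata, where the exposed fraction differs between men and women.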

Age, in contrast, was clearly unevenly distributed between the exposure groups. Stratification by age controlled for confounding by age and changed the risk ratio to RR = 0.44. Moreover, in these hypothetical data old age and female gender were related, such that women tended to be older (odds ratio = 6 within each exposure group). As a consequence, the gender distribution that was initially balanced between the exposure groups became unbalanced after stratification by age: among young subjects, the proportions of females among the exposed and unexposed were 20 and 40%, respectively, and among old subjects 60 and 80%, respectively. Hence, because of the relation between age and gender, stratification by age resulted in an uneven distribution of gender between the exposure groups within age strata.
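The percentages above can be checked by rebuilding the hypothetical cohort cell by cell. This is a minimal sketch that assumes 10,000 subjects per exposure group (the text gives only the total of 20,000); none of the proportions or odds ratios it computes depend on that choice. The helper names `share_female` and `or_old_female` are hypothetical.

```python
from fractions import Fraction

N_PER_GROUP = 10_000  # assumption: equal group sizes; only the total is given
P_OLD = {1: Fraction(75, 100), 0: Fraction(25, 100)}  # exposed / unexposed
# Proportion female by (exposed, old), as stated in the text.
P_FEMALE = {(1, 0): Fraction(20, 100), (0, 0): Fraction(40, 100),
            (1, 1): Fraction(60, 100), (0, 1): Fraction(80, 100)}

# (exposed, old, female) -> number of subjects
counts = {}
for exposed in (1, 0):
    for old in (0, 1):
        n_age = N_PER_GROUP * (P_OLD[exposed] if old else 1 - P_OLD[exposed])
        n_female = n_age * P_FEMALE[(exposed, old)]
        counts[(exposed, old, 1)] = n_female
        counts[(exposed, old, 0)] = n_age - n_female


def share_female(exposed, old=None):
    """Proportion female in one exposure group, optionally within an age stratum."""
    cells = {k: n for k, n in counts.items()
             if k[0] == exposed and (old is None or k[1] == old)}
    return sum(n for k, n in cells.items() if k[2] == 1) / sum(cells.values())


def or_old_female(exposed):
    """Odds ratio between old age and female gender within one exposure group."""
    def c(old, female):
        return counts[(exposed, old, female)]
    return (c(1, 1) * c(0, 0)) / (c(1, 0) * c(0, 1))


print("overall:", share_female(1), share_female(0))    # 1/2 1/2
print("young:", share_female(1, 0), share_female(0, 0))  # 1/5 2/5
print("old:", share_female(1, 1), share_female(0, 1))    # 3/5 4/5
print("OR(old, female):", or_old_female(1), or_old_female(0))  # 6 6
```

Exact fractions are used so that the balance check (1/2 versus 1/2) is not blurred by floating-point rounding.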

As a result, gender is likely to act as a confounding variable within the strata of young and old subjects. Indeed, additional stratification by gender changed the risk ratio once more: RR = 0.50 (age- and gender-adjusted) versus RR = 0.44 (age-adjusted). Table 2 gives the cell counts of the two-by-two tables for the exposure-outcome association within each age-gender stratum. By merging these tables, the steps described above can be replicated in detail.
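The merging step can be sketched in code. Because the cell counts of Table 2 are not reproduced here, this sketch reuses the cohort structure of Table 1 (assuming 10,000 subjects per exposure group) and attaches an invented risk model in which exposure has no effect at all, so its numbers differ from those in the text. Tables are pooled with the Mantel-Haenszel risk ratio, which is an assumption about the estimator; `risk`, `merge_tables`, and `mh_rr` are hypothetical names.

```python
from fractions import Fraction

# Cohort structure of the numerical example, assuming 10,000 exposed and
# 10,000 unexposed subjects: (exposed, old, female) -> number of subjects.
COUNTS = {(1, 0, 0): 2000, (1, 0, 1): 500, (1, 1, 0): 3000, (1, 1, 1): 4500,
          (0, 0, 0): 4500, (0, 0, 1): 3000, (0, 1, 0): 500, (0, 1, 1): 2000}


def risk(old, female):
    """Invented risk model (not Table 2): exposure has NO effect; old age
    triples and female gender doubles a 5% baseline risk."""
    return Fraction(5, 100) * (3 if old else 1) * (2 if female else 1)


def merge_tables(keep):
    """Merge the age-gender cells into one 2x2 table per stratum of the
    variables in `keep` (a subset of {"age", "gender"}). Each table is
    [exposed cases, exposed n, unexposed cases, unexposed n]."""
    tables = {}
    for (exposed, old, female), n in COUNTS.items():
        stratum = (old if "age" in keep else None,
                   female if "gender" in keep else None)
        table = tables.setdefault(stratum, [Fraction(0), 0, Fraction(0), 0])
        col = 0 if exposed else 2
        table[col] += n * risk(old, female)  # expected number of cases
        table[col + 1] += n
    return list(tables.values())


def mh_rr(tables):
    """Mantel-Haenszel risk ratio pooled over 2x2 tables."""
    num = sum(a * n0 / (n1 + n0) for a, n1, b, n0 in tables)
    den = sum(b * n1 / (n1 + n0) for a, n1, b, n0 in tables)
    return num / den


for label, keep in [("crude", ()), ("gender", ("gender",)),
                    ("age", ("age",)), ("age + gender", ("age", "gender"))]:
    print(f"{label}: RR = {float(mh_rr(merge_tables(keep))):.3f}")
# prints RR = 1.625, 1.625, 0.882 and 1.000 for the four models
```

Under this invented model the crude and gender-adjusted RRs coincide (gender is balanced), age adjustment alone still leaves a biased RR, and only joint adjustment for age and gender recovers the true null effect, which mirrors the pattern described above.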

Table 2 Association between exposure and outcome within age-gender strata in a hypothetical study

Clinical example

It has been suggested that inhaled beta-agonist therapy for obstructive pulmonary diseases (i.e., asthma and COPD) increases the risk of major cardiovascular events [10]. To study the effect of ever versus never use of inhaled long-acting beta-agonists (LABA) on all-cause mortality, we used a sample from the Netherlands University Medical Center Utrecht General Practitioner Research Network covering the period 1995–2005. Subjects were included in the cohort when a diagnosis of asthma [ICPC code R96] or COPD [ICPC code R95] was recorded in the electronic database. Ever versus never exposure to LABA was based on ATC coding [ATC R03AC12, R03AC13, R03AK06, or R03AK07]. The relation between LABA use and mortality was analyzed using a Poisson regression model with robust standard errors to estimate risk ratios [11]. Potential confounding variables were age, gender, and cardiovascular co-morbidity, because these are known risk factors for myocardial infarction. For this example, age was arbitrarily dichotomized at 50 years: those older than 50 years were considered ‘old’, the others ‘young’. Cardiovascular co-morbidity was considered present when a subject was treated with a cardiovascular drug (antithrombotic drugs [ATC B01], cardiac therapy [ATC C01], diuretics [ATC C03], beta-blockers [ATC C07], or agents acting on the renin-angiotensin system [ATC C09]).
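The cohort definitions above reduce to simple code-list lookups. The sketch below assumes a hypothetical record layout (one dict per patient holding lists of ICPC and ATC code strings, age in years, and a gender flag); only the code lists and the age cut-off come from the text. Matching cardiovascular drugs by ATC prefix is likewise an assumption about how the groups B01, C01, C03, C07, and C09 were applied.

```python
# Code lists from the text; the record layout is a hypothetical assumption.
INCLUSION_ICPC = {"R95", "R96"}                          # COPD, asthma
LABA_ATC = {"R03AC12", "R03AC13", "R03AK06", "R03AK07"}  # ever-use of LABA
CV_ATC_PREFIXES = ("B01", "C01", "C03", "C07", "C09")    # cardiovascular drugs


def age_group(age_years):
    """Older than 50 years is 'old'; 50 or younger is 'young'."""
    return "old" if age_years > 50 else "young"


def classify(patient):
    """Return the analysis variables for one patient record, or None when
    the patient has no asthma or COPD diagnosis."""
    if not any(code in INCLUSION_ICPC for code in patient["icpc"]):
        return None
    atc = patient["atc"]
    return {
        "laba_ever": any(code in LABA_ATC for code in atc),
        # str.startswith accepts a tuple of prefixes
        "cv_comorbidity": any(code.startswith(CV_ATC_PREFIXES) for code in atc),
        "age_group": age_group(patient["age"]),
        "female": patient["female"],
    }


example = {"icpc": ["R96"], "atc": ["R03AC12", "C07AB02"], "age": 50, "female": True}
print(classify(example))
# {'laba_ever': True, 'cv_comorbidity': True, 'age_group': 'young', 'female': True}
```

A real extraction would also need the observation window (1995–2005) and date handling, which are omitted here.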

Among the 2,394 asthma and COPD patients included in the analyses, LABA ever-users were considerably older than never-users (Table 3). The groups did not differ, however, with respect to cardiovascular co-morbidity status (P = 0.99) or gender (P = 0.98). Consequently, adjustment for cardiovascular co-morbidity status or for gender did not change the observed risk ratio (RR) for mortality: unadjusted RR 1.19 (95% CI 0.93–1.51); RR 1.19 (95% CI 0.94–1.50) after adjustment for cardiovascular co-morbidity status; and RR 1.19 (95% CI 0.94–1.51) after adjustment for gender. Adjustment for age, however, changed the RR considerably: RR 0.95 (95% CI 0.76–1.19). In this clinical example, old age and the presence of cardiovascular co-morbidity were related (odds ratio = 11). As a result, within age strata, cardiovascular co-morbidity was no longer balanced between LABA ever-users and never-users. For example, among subjects of old age, the proportions with cardiovascular co-morbidity were 33.6 and 42.0% in ever-users and never-users, respectively (P = 0.002). Because of these imbalances, additional adjustment for cardiovascular co-morbidity status indeed changed the risk ratio: RR 1.01 (95% CI 0.80–1.26). The stratum-specific RRs were approximately similar across strata (Table 4).
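The risk ratios above were estimated by Poisson regression with robust standard errors [11]. A minimal pure-Python sketch of that technique follows: Newton-Raphson for a log-link Poisson model on grouped binary data, with the HC0-type sandwich variance A^-1 B A^-1 and no small-sample correction. The grouped input format, the helper names, and the Wald interval exp(β ± 1.96 SE) are assumptions; statistical packages may apply other variance corrections, so this is not the authors' implementation, and the data in the usage part are invented.

```python
import math


def _solve(a, b):
    """Solve the small linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def modified_poisson(groups, max_iter=50, tol=1e-10):
    """Log-link Poisson regression with a robust (sandwich) variance.

    groups: list of (x, n, cases); x is the covariate vector with a leading 1
    for the intercept, n the number of subjects sharing x, and cases the
    number of them with the binary outcome. Returns (beta, robust SEs);
    exp(beta[j]) estimates a risk ratio.
    """
    p = len(groups[0][0])

    def pieces(beta):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # A: model-based information
        meat = [[0.0] * p for _ in range(p)]  # B: empirical score variance
        for x, n, cases in groups:
            mu = math.exp(sum(bj * xj for bj, xj in zip(beta, x)))
            # sum over the n subjects of (y - mu)^2 for a 0/1 outcome y
            sq = cases * (1 - mu) ** 2 + (n - cases) * mu ** 2
            for j in range(p):
                score[j] += x[j] * (cases - n * mu)
                for k in range(p):
                    info[j][k] += n * mu * x[j] * x[k]
                    meat[j][k] += sq * x[j] * x[k]
        return score, info, meat

    total_n = sum(n for _, n, _ in groups)
    total_cases = sum(c for _, _, c in groups)
    beta = [math.log(total_cases / total_n)] + [0.0] * (p - 1)
    for _ in range(max_iter):  # Newton-Raphson iterations
        score, info, _ = pieces(beta)
        step = _solve(info, score)
        beta = [bj + s for bj, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break

    _, info, meat = pieces(beta)
    inv_cols = [_solve(info, [float(i == j) for i in range(p)]) for j in range(p)]
    inv = [[inv_cols[j][i] for j in range(p)] for i in range(p)]
    cov = _matmul(_matmul(inv, meat), inv)  # A^-1 B A^-1
    return beta, [math.sqrt(cov[j][j]) for j in range(p)]


# Invented grouped data: x = [intercept, ever-exposed, old], n, cases.
# Risk is 10% in the young and 30% in the old, with no exposure effect.
data = [([1.0, 1.0, 0.0], 2500, 250), ([1.0, 0.0, 0.0], 7500, 750),
        ([1.0, 1.0, 1.0], 7500, 2250), ([1.0, 0.0, 1.0], 2500, 750)]
crude_beta, _ = modified_poisson([([1.0, x[1]], n, c) for x, n, c in data])
adj_beta, adj_se = modified_poisson(data)
lo, hi = (math.exp(adj_beta[1] + z * adj_se[1]) for z in (-1.96, 1.96))
print(f"crude RR {math.exp(crude_beta[1]):.3f}")  # 1.667
print(f"age-adjusted RR {math.exp(adj_beta[1]):.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

With only the exposure in the model, the sketch returns the crude RR and the familiar standard error sqrt(1/a - 1/n1 + 1/b - 1/n0) of its logarithm; covariates such as age group or co-morbidity enter as additional 0/1 columns of `x`.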

Table 3 Distribution of patient characteristics by ever versus never long-acting beta-agonist (LABA) use
Table 4 Association between ever versus never long-acting beta-agonist (LABA) use and mortality, stratified by age and co-morbidity status

Since old age was also related to female gender (odds ratio = 1.3), the groups of LABA users were no longer comparable with respect to gender either after stratification by age (e.g., among young subjects, the proportions of females in ever-users and never-users were 40.5 and 46.5%, respectively; P = 0.04). Consequently, additional adjustment for gender changed the risk ratio once more: RR 0.98 (95% CI 0.79–1.23).


In observational studies, variables for a model that adjusts for confounding are often selected on the basis of known associations with the outcome under study (i.e., the variables are known risk factors for the outcome) and observed associations with the exposure of interest [7]. Potential confounding variables that are unevenly distributed between the exposure groups are then selected for (multivariable) adjustment, whereas evenly distributed ones are omitted from the adjustment model. Both the hypothetical and the clinical example show that this approach is incorrect and can result in relevant residual confounding.

The observation that a variable is equally distributed between exposure groups indicates that it is marginally (i.e., unconditionally on other variables) independent of the exposure under study. If, however, two variables are marginally independent and both are related to a third variable, they will in general be dependent conditional on that third variable [12]. This means that although exposure and gender (hypothetical example), and LABA use and cardiovascular co-morbidity status or gender (clinical example), were marginally independent, they were dependent conditional on age, because both were related to age.
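With the proportions of females given for the numerical example, this marginal independence and conditional dependence of exposure (E) and female gender (F) can be written out as odds ratios; the notation is ours and the values are plain arithmetic on the stated percentages:

```latex
\mathrm{OR}_{E,F} = \frac{0.50/0.50}{0.50/0.50} = 1,
\qquad
\mathrm{OR}_{E,F \mid \text{young}} = \frac{0.20/0.80}{0.40/0.60} = 0.375,
\qquad
\mathrm{OR}_{E,F \mid \text{old}} = \frac{0.60/0.40}{0.80/0.20} = 0.375 .
```

Exposure and gender are thus unrelated overall but associated within each age stratum, which is why gender starts to confound once age has been adjusted for.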

The amount of (residual) confounding by the initially balanced confounding variable after adjustment for age alone likely depends on the strength of that variable's association with age, as well as on the strength of its association with the outcome. In both examples these associations were substantial. Obviously, if age is not related to the initially balanced confounding variable, stratification by age will not produce an uneven distribution of that variable within age strata, and hence there will be no residual confounding by it. In the clinical example, two initially balanced confounding variables became imbalanced after stratification by age. In practice, the number of initially balanced confounders could be even larger, and residual confounding from omitting all of them from the adjustment model may become substantial, especially when they are strong risk factors for the outcome. Likewise, in a randomized trial, adjusting only for imbalanced baseline covariates may actually induce bias by creating imbalance in other baseline covariates that are strong risk factors for the outcome.
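The dependence on the strength of the association with the outcome can be explored numerically. This sketch uses the cohort structure of the numerical example (assuming 10,000 subjects per exposure group) together with an invented risk model in which exposure has no effect, old age triples a 5% baseline risk, and female gender multiplies it by a hypothetical parameter `gender_rr`. It computes the Mantel-Haenszel RR stratified by age only, so any departure from 1 is residual confounding by gender.

```python
from fractions import Fraction

# Cohort structure of the numerical example, assuming 10,000 subjects per
# exposure group: (exposed, old, female) -> number of subjects.
COUNTS = {(1, 0, 0): 2000, (1, 0, 1): 500, (1, 1, 0): 3000, (1, 1, 1): 4500,
          (0, 0, 0): 4500, (0, 0, 1): 3000, (0, 1, 0): 500, (0, 1, 1): 2000}


def age_adjusted_rr(gender_rr):
    """Mantel-Haenszel RR stratified by age only, under an invented risk model
    with no exposure effect: risk = 5% x 3 (if old) x gender_rr (if female)."""
    num = den = Fraction(0)
    for stratum_old in (0, 1):
        cases = {0: Fraction(0), 1: Fraction(0)}  # keyed by exposed (0/1)
        size = {0: 0, 1: 0}
        for (exposed, old, female), n in COUNTS.items():
            if old != stratum_old:
                continue
            risk = Fraction(5, 100) * (3 if old else 1) * (gender_rr if female else 1)
            cases[exposed] += n * risk
            size[exposed] += n
        total = size[0] + size[1]
        num += cases[1] * size[0] / total
        den += cases[0] * size[1] / total
    return num / den


for g in (1, 2, 3, 4):
    rr = float(age_adjusted_rr(Fraction(g)))
    print(f"gender RR {g}: age-adjusted RR = {rr:.3f}")
# prints 1.000, 0.882, 0.833 and 0.806
```

In this invented setting the age-adjusted RR works out to 5(1 + g)/(3 + 7g) for g = `gender_rr`: it equals the true value of 1 only when gender does not affect the outcome, and it moves further from 1 as the gender effect grows.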

In epidemiology textbooks, a confounding variable is defined as a variable that is a risk factor for the outcome under study and is also related to the exposure of interest [13, 14]. Furthermore, an intermediate in the causal chain is by definition not a confounding variable. Thus, whether a variable is a confounder depends on the outcome and exposure under study, and hence on the clinical research question. However, it also depends on the stage of the analysis: in the examples presented here, gender and co-morbidity status did not confound the crude association, but they were confounding variables for the age-adjusted association.

Different strategies for selecting confounding variables have been proposed. A frequently applied strategy is based on a change-in-estimate criterion (e.g., a 10% change in the OR), but variables may then be falsely identified as confounding variables because of non-collapsibility [15]. Statistical tests of whether a variable is associated with the exposure, the outcome, or both are typically insensitive in small datasets, although raising the significance level can reduce this problem [16]. However, even ‘perfect’ balance of prognostic characteristics between exposure groups can result in confounding, as our examples show. Based on prior knowledge, common causes of both exposure and outcome (or causes of either the exposure or the outcome [17]) may be identified. This obviously relies on available knowledge, but in any case established risk factors for the outcome will be selected. Even if these variables are not related to the exposure, adjustment for such risk factors will likely increase statistical power [18]. Hence, selection of confounding variables for adjustment starts with identifying risk factors for the outcome.
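The non-collapsibility pitfall of the change-in-estimate criterion can be made concrete with two invented strata in which a risk factor Z is perfectly balanced between the exposure groups (so it does not confound the crude association) but strongly affects the outcome. Pooling with Mantel-Haenszel estimators and the 10% threshold are illustrative choices, published change-in-estimate rules differ in detail, and `collapse`, `mh_or`, `mh_rr`, and `relative_change` are hypothetical helpers.

```python
from fractions import Fraction

# Invented strata: (exposed cases, exposed n, unexposed cases, unexposed n).
# Z is balanced (100 exposed and 100 unexposed per stratum) but raises risk.
STRATA = [(80, 100, 50, 100),  # Z = 0
          (50, 100, 20, 100)]  # Z = 1


def collapse(strata):
    """Crude analysis: sum the stratum tables cell by cell into one table."""
    return [tuple(sum(cells) for cells in zip(*strata))]


def mh_or(strata):
    """Mantel-Haenszel odds ratio."""
    num = sum(Fraction(a * (n0 - c), n1 + n0) for a, n1, c, n0 in strata)
    den = sum(Fraction((n1 - a) * c, n1 + n0) for a, n1, c, n0 in strata)
    return num / den


def mh_rr(strata):
    """Mantel-Haenszel risk ratio."""
    num = sum(Fraction(a * n0, n1 + n0) for a, n1, c, n0 in strata)
    den = sum(Fraction(c * n1, n1 + n0) for a, n1, c, n0 in strata)
    return num / den


def relative_change(adjusted, crude):
    return abs(adjusted - crude) / crude


for name, measure in [("OR", mh_or), ("RR", mh_rr)]:
    crude, adjusted = measure(collapse(STRATA)), measure(STRATA)
    change = relative_change(adjusted, crude)
    print(f"{name}: crude {float(crude):.3f}, Z-adjusted {float(adjusted):.3f}, "
          f"change {float(change):.3f}, flagged: {change > Fraction(1, 10)}")
# OR: crude 3.449, Z-adjusted 4.000, change 0.160, flagged: True
# RR: crude 1.857, Z-adjusted 1.857, change 0.000, flagged: False
```

Both stratum-specific ORs equal 4 while the crude OR is about 3.45, so a 10% rule would flag Z as a confounder even though Z is balanced; the RR, which is collapsible in this setting, does not change.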

In conclusion, a risk factor for the outcome under study that is evenly distributed between exposure groups can still be a confounding variable. Hence, observed balance of important prognostic variables between the exposure groups in a baseline table should not lead to omitting such variables from the model used to adjust for confounding.