Introduction

A meta-analysis of randomized clinical trials (RCTs) synthesizes evidence to inform clinical guidelines and policies [1, 2]. RCTs attempt to reduce bias and establish two or more similar groups at baseline, so that any subsequent differences between groups can be attributed to the allocated interventions. An imbalance in prognostic factors (e.g., age or condition severity) between groups may occur at baseline within a trial by chance [1, 3, 4]. Imbalance may also arise for nonrandom reasons. Insecure allocation concealment (e.g., investigators influencing the allocation of participants to groups) may bias the results [2, 5, 6]. Deviating from the intention-to-treat principle (such as by using a “per-protocol” analysis) and reporting data for only a subset of the randomized participants may also distort the results of RCTs [7, 8].

Meta-analysis of baseline values may indicate evidence of bias at baseline for important prognostic factors, which may threaten the validity of the results [1, 2, 3, 9]. In a meta-analysis of calcium supplements for weight loss, the authors found that participants allocated to the treatment groups had a body mass that was 1.73 kg (95% confidence interval [CI] 2.97 to 0.5) lower than that of participants allocated to the control groups [1]. When this baseline difference was statistically controlled in an updated meta-analysis and meta-regression, the effect of calcium supplements was no longer statistically significant, suggesting that the result from the original meta-analysis was largely due to the baseline imbalance in body mass [1]. Similarly, in a meta-analysis of oseltamivir for treating flu, the author found that fewer participants who tested positive for influenza were assigned to the treatment group than to the control group (relative risk = 0.95, 95% CI 0.91 to 0.99), suggesting bias in treatment allocation [9]. Baseline imbalances have also been identified in other studies [2, 3, 10].

Recently, we observed baseline imbalance in our meta-analysis examining the antihypertensive effects of isometric exercise compared to nonexercise controls in adults with elevated blood pressure [11]. The pooled baseline systolic blood pressure (SBP) was 4.78 mmHg higher (95% CI 4.03 to 5.52) in the exercise group than in the control group, and the pooled baseline diastolic blood pressure (DBP) was 5.48 mmHg higher (95% CI 5.10 to 5.68) in the exercise group than in the control group. This may indicate bias across the studies and call into question the validity of the estimated treatment effects, given that the pooled exercise group was, on average, substantially different from the control group in the outcome of interest at baseline. These differences were largely driven by one comparatively large study (n = 400) with a large baseline imbalance between the intervention and control groups [12]. When we removed this one study, the baseline imbalance and heterogeneity were largely attenuated (SBP 0.64 mmHg, 95% CI −0.58 to 1.85; DBP 1.23 mmHg, 95% CI 0.13 to 2.34).

Therefore, this meta-epidemiological study examined baseline imbalance in comparisons of various types of exercise and antihypertensive medicines. It also examined whether sample size and/or the risk of selection bias were associated with potential baseline imbalances.

Methods

We preregistered the protocol for this study on the Open Science Framework (osf.io/dgu9b). We obtained the datasets of 391 RCTs (197 exercise trials and 194 antihypertensive medicine trials) used in a network meta-analysis of different modes of exercise and classes of antihypertensive medicines that was published in a leading sports medicine journal and has had substantial impact in its field [13].

One author (MAW) identified the manuscripts of each included RCT. Two authors (MAW and one of HJH, BS, YLG, or SRGD) independently extracted the number of participants and mean and standard deviation (SD) values at baseline for three outcomes: SBP, DBP, and age. We only extracted data for the groups included in the network meta-analysis. We preferentially extracted baseline data from all randomized participants in each RCT (typically outlined in Table 1 of a study), followed by data for participants who were analyzed. Discrepancies between authors were resolved via discussion and arbitration with a third author (MDJ) if needed. Where necessary, we transformed values reported in other forms to the mean and SD [14]; these data were most commonly presented as the median and range/interquartile range. We extracted data from figures using WebPlotDigitizer [15].
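As a rough illustration of the kind of transformation described above, the following sketch converts a median with interquartile range, or a median with range, to an approximate mean and SD. The specific approximations are assumptions chosen for illustration (quartile averaging with IQR ≈ 1.35 SD for roughly normal data, and a range/4 rule of thumb for moderate sample sizes); they are not necessarily the formulas given in [14], and the function names are hypothetical.

```python
def mean_sd_from_median_iqr(q1, median, q3):
    """Approximate mean and SD from quartiles.

    Assumption: data are roughly normal, so the IQR spans about 1.35 SDs.
    """
    mean = (q1 + median + q3) / 3.0
    sd = (q3 - q1) / 1.35
    return mean, sd


def mean_sd_from_median_range(minimum, median, maximum):
    """Approximate mean and SD from the median and range.

    Assumption: moderate sample size, where range/4 is a common
    rule-of-thumb estimate of the SD.
    """
    mean = (minimum + 2.0 * median + maximum) / 4.0
    sd = (maximum - minimum) / 4.0
    return mean, sd


if __name__ == "__main__":
    # Hypothetical SBP summary: median 130 mmHg, IQR 120 to 140 mmHg
    print(mean_sd_from_median_iqr(120.0, 130.0, 140.0))
```

Both approximations degrade for skewed distributions, so values obtained this way are best treated as estimates.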

Table 1 Results for systolic blood pressure at baseline from 190 exercise studies

Because our study focused on the level of exercise mode (e.g., endurance, resistance, isometric or combined exercise) or medicine type, we combined intervention groups within RCTs that used the same exercise mode or medicine type (e.g., groups that examined high-intensity and low-intensity endurance exercise or different dosages of the same medicine) [14].
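A minimal sketch of how two intervention groups from the same RCT could be merged into one is shown below. It uses the standard formula for combining the sample sizes, means, and SDs of two groups; we assume, without confirmation from the source, that this is the approach referred to in [14]. The function name and the example values are hypothetical.

```python
import math


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Combine two groups into a single group (n, mean, SD).

    The combined variance includes the within-group variances plus a
    term for the separation between the two group means.
    """
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    variance = (
        (n1 - 1) * sd1 ** 2
        + (n2 - 1) * sd2 ** 2
        + (n1 * n2 / n) * (m1 - m2) ** 2
    ) / (n - 1)
    return n, mean, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical high- and low-intensity endurance groups (baseline SBP)
    print(combine_groups(20, 135.0, 12.0, 22, 133.0, 11.0))
```

For trials with more than two eligible groups, the same function can be applied repeatedly, merging one additional group at a time.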

Two authors (MAW and one of HJH, BS, YLG, or SRGD) independently appraised each study for risk of bias in allocation concealment using guidance from the Cochrane Risk of bias tool [16]. We noted studies where data were not available for all participants at baseline (such as studies that only presented baseline data for a per-protocol analysis). Discrepancies between authors were resolved via discussion and arbitration with a third author (MDJ). We did not contact the authors of the studies to request any data due to the large quantity of articles included in this study, as well as the substantial difference in publication dates among articles investigating exercise (mean = 2008, range 1976 to 2018) and medicines (mean = 1994, range 1968 to 2009). The mean difference in publication dates was 14 years (95% CI 13 to 17). Given the age of the medicine articles, many had no email contact available.

We noted four studies in the endurance vs. control comparison that provided duplicate baseline data. We removed the duplicate studies, leaving only the baseline data from the original study publications.

Statistical analysis

We examined baseline imbalance for the outcomes of SBP, DBP, and age. We selected SBP and DBP because we observed baseline imbalance in these variables in our previous review, and they are often the primary outcomes in the management of hypertension [11]. We also selected age because it is a commonly reported variable, and an imbalance in this variable has been noted in previous research from other fields [2, 17]. Given that SBP, DBP, and age are related to other indicators of cardiometabolic health (e.g., body mass index), our outcome choices were likely to capture potential imbalances in other outcomes [18, 19]. The reduced number of outcomes also reduced our risk of type 1 error.

We examined baseline imbalance in the following comparisons:

  • Endurance exercise vs. control groups.

  • Resistance exercise vs. control groups.

  • Isometric exercise vs. control groups.

  • Combination exercise vs. control groups.

  • Diuretic vs. control groups.

  • Calcium channel blocker (CCB) vs. control groups.

  • Beta-blocker vs. control groups.

  • Angiotensin II receptor blocker (ARB) vs. control groups.

  • Angiotensin-converting enzyme (ACE) inhibitor vs. control groups.

Data permitting, we also analyzed the following comparisons between two interventions:

  • Endurance exercise vs. resistance exercise.

  • Endurance exercise vs. isometric exercise.

  • Endurance exercise vs. combination exercise.

  • Resistance exercise vs. combination exercise.

  • Isometric exercise vs. combination exercise.

  • Diuretics vs. ACE inhibitors.

  • Diuretics vs. ARBs.

  • ACE inhibitors vs. ARBs.

We used a fixed-effect meta-analysis to compare groups, which is the appropriate model in this circumstance because, under adequate randomization, the true baseline difference between groups is zero in every trial [1, 2, 3, 17, 20]. We considered groups to be different at baseline if the 95% CI of the mean difference did not include zero. We quantified heterogeneity with the Cochran Q test and the I2 statistic and considered I2 > 30% to indicate substantial inconsistency [2, 17]. We conducted univariate meta-regression to examine the potential moderators of total sample size (continuous variable), risk of selection bias (Low risk or Unclear/High risk), and whether baseline data were available for all randomized participants (Yes or Unclear/No). We did not investigate the impact of moderators on absolute imbalance (which ignores the direction of the imbalance), as this may disguise random variation that is expected in baseline data and artificially induce false imbalances.
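The pooling and heterogeneity steps described above can be sketched as follows. This is an illustrative inverse-variance implementation under stated assumptions (a normal approximation with a 1.96 multiplier for the 95% CI, and I2 computed as (Q − df)/Q truncated at zero); it is not the software used in this study, and the function name, return keys, and example data are hypothetical.

```python
import math


def fixed_effect_baseline_md(trials, i2_threshold=30.0):
    """Fixed-effect pooled mean difference at baseline.

    trials: list of (n1, mean1, sd1, n2, mean2, sd2) tuples, where
    group 1 is the intervention group and group 2 the comparator.
    """
    weights, diffs = [], []
    for n1, m1, sd1, n2, m2, sd2 in trials:
        variance = sd1 ** 2 / n1 + sd2 ** 2 / n2  # variance of the mean difference
        weights.append(1.0 / variance)            # inverse-variance weight
        diffs.append(m1 - m2)

    sum_w = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, diffs)) / sum_w
    se = math.sqrt(1.0 / sum_w)
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)

    # Cochran Q and I2 (as a percentage, truncated at zero)
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, diffs))
    df = len(trials) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "pooled_md": pooled,
        "ci": ci,
        "q": q,
        "i2": i2,
        "imbalanced": ci[0] > 0 or ci[1] < 0,  # 95% CI excludes zero
        "substantial_inconsistency": i2 > i2_threshold,
    }


if __name__ == "__main__":
    # Hypothetical baseline SBP data for three trials
    example = [
        (30, 141.0, 12.0, 30, 140.0, 11.0),
        (25, 138.0, 10.0, 24, 139.5, 10.5),
        (50, 145.0, 14.0, 52, 144.0, 13.0),
    ]
    print(fixed_effect_baseline_md(example))
```

Keeping the sign of each trial's difference, rather than its absolute value, preserves the direction of any imbalance, in line with the rationale given above.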

We performed a preplanned sensitivity analysis on the exercise trials that examined participants with a mean baseline SBP ≥ 140 mmHg (the definition of hypertension in the network meta-analysis). This was not required for antihypertensive medicines because all participants had hypertension.

Results

Exercise

Of 193 RCTs of exercise, 190 were included. All studies used a parallel group design, but three RCTs compared two doses of the same exercise type and therefore were not analyzed. We extracted data from figures for six studies. Seven studies had a low risk of bias for allocation concealment, four studies had a high risk of bias, and the remaining 179 studies had an unclear risk of bias. Ninety-four studies reported baseline data for all randomized participants.

All participants

There were no baseline imbalances in any comparisons for SBP (Table 1). There was substantial inconsistency in the resistance exercise vs. control groups comparison (I2 = 33.0%; Supplementary Fig. 1), as well as some evidence of inconsistency in the endurance exercise vs. control groups comparison (I2 = 14.4%) and in the resistance exercise vs. combined exercise comparison (I2 = 22.0%). No moderator analyses were statistically significant.

There were no baseline imbalances in any comparisons for DBP (Table 2). There was substantial inconsistency in the endurance exercise vs. control groups (I2 = 30.3%; Supplementary Fig. 2), resistance exercise vs. control groups (I2 = 41.0%; Supplementary Fig. 3), and resistance exercise vs. combined exercise (I2 = 35.4%; Supplementary Fig. 4) comparisons. Sample size was a significant moderator in the endurance exercise vs. control groups comparison (β = 0.00 (95% CI 0.00 to 0.01), p < 0.01; increasing sample size associated with higher DBP in endurance exercise) and the resistance exercise vs. control groups comparison (β = −0.06 (95% CI −0.11 to −0.01), p = 0.01; increasing sample size associated with higher DBP in control groups). Data from all participants at baseline were a significant moderator in the endurance exercise vs. control groups comparison (studies not reporting data for all participants at baseline had a significantly higher DBP in endurance exercise compared to studies reporting all data; difference between subgroups = 1.30 mmHg (95% CI 1.84 to 0.77), p < 0.01) and the resistance exercise vs. combined exercise comparison (studies not reporting data for all participants at baseline had a significantly higher DBP in the combined group compared to studies reporting all data; difference between subgroups = 5.15 mmHg (95% CI 0.23 to 10.08), p = 0.04).

Table 2 Results for diastolic blood pressure at baseline from 190 exercise studies

In the analysis of age (Table 3), there was baseline imbalance in the resistance vs. control groups comparison: the resistance group was 0.3 years younger (95% CI 0.6 to 0.1) than the control group. Inconsistency was detected in the endurance exercise vs. control groups (I2 = 14.1%), resistance exercise vs. control groups (I2 = 14.6%), combined exercise vs. control groups (I2 = 16.8%), and endurance exercise vs. resistance exercise (I2 = 13.4%) comparisons. Data from all participants at baseline were a significant moderator in the combined vs. control groups comparison (studies not reporting data for all participants at baseline had a significantly higher age in combined exercise compared to studies reporting all data; difference between subgroups = 1.20 years (95% CI 2.28 to 0.11), p = 0.03).

Table 3 Results for age at baseline from 190 exercise studies

Participants with hypertension only

There were no baseline imbalances in any comparisons for SBP (Supplementary Table 1). There was substantial inconsistency in the endurance exercise vs. control groups comparison (I2 = 44.9%; Supplementary Fig. 5). Sample size was a significant moderator in the endurance exercise vs. control groups comparison (β = 0.02 (95% CI 0.01 to 0.03), p < 0.01; increasing sample size was associated with higher SBP in endurance exercise).

There were no baseline imbalances or any inconsistencies in any comparisons for DBP (Supplementary Table 2). Sample size was a significant moderator in the endurance exercise vs. control groups comparison (β = 0.01 (95% CI 0.00 to 0.01), p < 0.01; increasing sample size was associated with higher DBP in endurance exercise).

There were no baseline imbalances in any comparisons for age (Supplementary Table 3). There was substantial inconsistency in the combined exercise vs. control groups comparison (I2 = 34.8%; Supplementary Fig. 6). Data from all participants at baseline were also a significant moderator in the combined vs. control groups comparison (studies not reporting data for all participants at baseline had a significantly higher age in combined exercise compared to studies reporting all data; difference between subgroups = 3.09 years (95% CI 5.81 to 0.36), p = 0.03).

Medicines

Of 194 RCTs of antihypertensive medicines, 152 were included. We were unable to analyze 42 crossover RCTs because the data were not presented separately for groups at baseline (39 compared beta-blockers to placebos and 3 compared diuretics to placebos). We extracted data from figures for four studies. One study had a low risk of bias for allocation concealment, and the remaining 151 studies had an unclear risk of bias. Baseline data were reported for all randomized participants in 105 studies, with the remaining studies either not reporting data for all participants (n = 34) or having insufficient information to determine a judgment (n = 13).

None of the comparisons in SBP (Table 4), DBP (Table 5), or age (Table 6) displayed evidence of baseline imbalance. There was inconsistency for SBP in the ACE inhibitor vs. control groups comparison (I2 = 16.0%) and in the CCB vs. control groups comparison (I2 = 5.8%). There was inconsistency for DBP in the beta-blocker vs. control groups (I2 = 3.6%), ARB vs. control groups (I2 = 16.7%), and CCB vs. control groups (I2 = 20.4%) comparisons. There was inconsistency for age in the diuretic vs. control groups (I2 = 5.8%), beta-blocker vs. control groups (I2 = 19.4%), and ACE inhibitor vs. control groups (I2 = 20.4%) comparisons. One moderator was significant: increasing sample size was associated with a higher baseline SBP in the beta-blocker group in the beta-blocker vs. control groups comparison (β = 0.01 (95% CI 0.00 to 0.01), p < 0.01).

Table 4 Results for systolic blood pressure at baseline from 152 antihypertensive medicines studies
Table 5 Results for diastolic blood pressure at baseline from 152 antihypertensive medicines studies
Table 6 Results for age at baseline from 152 antihypertensive medicines studies

Discussion

Our meta-epidemiological study found one occurrence of baseline imbalance in the exercise comparisons and several occurrences of substantial inconsistency. It is the first to use a network meta-analysis as the data source, allowing us to explore a comprehensive dataset that reflects a large body of literature across different guideline-recommended types of exercise and medicines for the management of hypertension.

We observed one instance of pooled baseline imbalance of 0.3 years in the resistance vs. control groups comparison. A statistically significant difference at baseline should not be present, assuming the individual trials included in the analysis are methodologically sound and reported accurately. This may indicate a failure in the methodological procedures of some studies. None of our moderator analyses found statistically significant associations with this imbalance, but these are limited by poor reporting. Previous research noted pooled baseline imbalance in age in several meta-analyses [2], arguing that it is a marker for poor allocation practices that weaken the strength of a review’s conclusions. There may be methodological bias within trials of some types of exercise for the management of hypertension.

There were no baseline imbalances in SBP or DBP in the exercise comparisons, contrary to our previous research [11]. We did observe substantial inconsistency in several comparisons, which is consistent with previous research [2, 17, 20]. This may be due to a few outlying trials within an analysis that contain marked baseline imbalance. For example, the SBP of the resistance exercise group in the Conceicao et al. study was 26.6 mmHg higher than that of the control group at baseline [21]. This study was an outlier, which may contribute to the inconsistency because other studies in this comparison had SBPs that ranged from 10.6 mmHg higher in the resistance group to 11.0 mmHg lower in the resistance group. The minimal clinically important difference for SBP is approximately 5 mmHg [11], indicating that some of these baseline imbalances are several-fold greater than the minimal clinically important difference. Outlying values may indicate bias within a study (e.g., poor randomization or allocation procedures) [2]. The results will also differ markedly depending on whether the change score or follow-up score is used in a meta-analysis [22]. The Conceicao et al. study reported that the SBP of the resistance exercise group changed from 138.4 mmHg to 130.8 mmHg, while that of the control group increased from 111.8 mmHg to 113.3 mmHg; if follow-up scores were used, the mean difference (17.5 mmHg) favored the control group, but if change scores were used, the mean difference favored the resistance group (−9.1 mmHg). The presence of heterogeneity/inconsistency at baseline in a meta-analysis can weaken its findings, especially if no attempt is made to adjust for these impacts [2].
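The arithmetic for the Conceicao et al. example above can be reproduced with a short sketch. The four SBP values are those quoted in the text; the function and key names are hypothetical. It also makes explicit that the change-score difference equals the follow-up difference minus the baseline difference, which is why a large baseline imbalance can reverse the apparent direction of effect.

```python
def compare_score_types(int_base, int_follow, ctl_base, ctl_follow):
    """Between-group mean differences (intervention minus control)
    computed from baseline, follow-up, and change scores."""
    baseline_diff = int_base - ctl_base
    followup_diff = int_follow - ctl_follow
    change_diff = (int_follow - int_base) - (ctl_follow - ctl_base)
    return {
        "baseline_diff": round(baseline_diff, 1),
        "followup_diff": round(followup_diff, 1),
        "change_diff": round(change_diff, 1),
    }


if __name__ == "__main__":
    # SBP values (mmHg) quoted in the text for the resistance and control groups
    result = compare_score_types(138.4, 130.8, 111.8, 113.3)
    print(result)
    # change_diff equals followup_diff minus baseline_diff
    print(round(result["followup_diff"] - result["baseline_diff"], 1))
```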

We did not identify any baseline imbalance in the studies of medicines, and inconsistency did not reach our prespecified threshold for concern, although it may not be ignorable. However, it is important to note that fewer studies (but not fewer participants) were available for these comparisons because (a) many studies used a crossover design and could not be included, and (b) more data were missing in these comparisons due to the age of the studies.

We observed that sample size and data availability at baseline were associated with the magnitude of the baseline difference in several comparisons, although not always in favor of the intervention group, which would be the expected direction if investigator bias were present. The statistically significant associations between sample size and baseline imbalance were very small across all outcomes (absolute β ranging from 0.00 to 0.06), suggesting that these findings are likely to be spurious relationships with little meaning, driven by random imbalance in a larger trial in the analysis. However, the identification of marked imbalances in larger trials may also warrant further investigation of potential concerns about research integrity. We did not observe significant associations with allocation concealment, contrary to previous research [2]. However, these findings are limited by the small number of studies in some moderator analyses and poor reporting. Most studies of types of exercise and medicines did not clearly report their procedures for allocation concealment, which continues to be a concern for RCTs [23]. Given the much more recent publication of the exercise studies, this issue must be resolved to reduce the risk of bias. Compared to 47 medicine studies (31%), 96 exercise studies (51%) did not report baseline data for all randomized participants, instead only reporting data for participants who completed the study. This reporting threatens the internal validity of the results because missing data due to participant adherence in an RCT may not simply be ‘missing at random.’ RCTs should report baseline characteristics for all randomized participants in a publication irrespective of whether they completed the study.

Researchers who conduct meta-analyses in the field of exercise should initially consider conducting meta-analyses of baseline values of key prognostic variables to identify imbalances or heterogeneity [20]. We selected SBP, DBP, and age in this study because these variables are effect modifiers and are likely to encompass other cardiometabolic variables (e.g., body mass index). In other fields, different outcomes may be more relevant. Several methods have been proposed to explore heterogeneity/inconsistency in a meta-analysis of baseline values and identify ‘suspect’ RCTs; however, there is currently no consensus on how best to combat heterogeneity once it is identified [1, 2, 17, 20, 24, 25]. Some researchers promote individual patient data meta-analyses [1, 20, 25], while others propose simpler methods to identify and remove the individual studies that contribute the largest amount of baseline heterogeneity [20]. However, these methods cannot solve the underlying issue that some RCTs may not be conducted properly or may even be fraudulent [26]. Increasing methodological quality and reducing the risk of bias would improve the evidence base supporting the use of exercise and medicines in the management of hypertension.

Conclusion

We identified baseline imbalance and inconsistency in meta-analyses of exercise and antihypertensive medicines. These results may indicate evidence of bias in randomization/allocation procedures in these trials, particularly in studies examining exercise. Systematic reviewers should conduct meta-analyses on important baseline characteristics.