Introduction

The recently reported trial of “low” (6 ml/kg predicted body weight) tidal volume ventilation in acute lung injury (ALI) and acute respiratory distress syndrome (ARDS) by the ARDS Network [1] in 2000 appears to have established the efficacy of this ventilatory strategy although initial concerns were expressed in correspondence regarding the “unconventionally high plateau pressures in the group treated with traditional tidal volumes” [2]. The smaller difference in plateau pressures between treatment arms had already been advanced in the accompanying editorial [3] to the ARDS Network publication as a potential reason for the failure to demonstrate a treatment effect in three previous trials [4, 5, 6] of lung protective ventilation. Similarly, an editorial in early 2002 reviewing what was “best for ARDS management” concluded that, “These studies do not tell us whether ARDS patients should be ventilated with a tidal volume of 6 ml/kg body weight or simply only less than 12 ml/kg” [7].

Against this background, two more recent papers have added to the debate on protective ventilation: a meta-analysis [8] of the five randomised controlled trials of this intervention [1, 4, 5, 6, 9] and a health policy report detailing the institutional responses to the controversies over “How best to ventilate” [10]. Considerable retort has followed this meta-analysis: an editorial [11], correspondence from the original ARDS Network trial authors [12] and exchanges over the detail of both the meta-analysis and the trials.

The purpose of the present analysis is to contribute to the “continued scrutiny” [13] of the trials above, an endeavour parallel to that of the recent review by Petrucci and Iacovelli [14]. First, we departed from the approach of Eichacker et al. [8] who grouped the trials into beneficial/non-beneficial based upon heterogeneity and thus did not consider the implications of a pooled estimate of the efficacy of protective ventilation. Second, and contingent upon this pooled estimate, we investigated the underlying cause(s) of the heterogeneity using meta-regression [15]. Rather than the existence of heterogeneity precluding the finding of covariate modification of the pooled estimate [16, 17], the “primary value of meta-analysis is in the search for predictors of between-study heterogeneity” [18] and serves also to formalise the attempt by Eichacker et al. [8] to relate plateau pressure and mortality. Devoid of a pooled estimate, there is limited ability to examine those variables which may have determined heterogeneity [11]. Third, we highlight the question of the cause of the observed treatment effect in the two “positive” trials [1, 9]: an increase in control mortality as a consequence of high plateau pressures or decrease in treatment arm mortality due to low plateau pressures.

Some debate has occurred over the propriety of the increment of mechanical tidal volume in the treatment arm of the ARDS Network trial [19]; this question is not directly canvassed; rather the comment by Senn [20] is noted: “Clinical trials are not and never will be representative of general medical practice”.

Methods

The study population comprised the five trials [1, 4, 5, 6, 9] identified above [8] using protective ventilation in ALI/ARDS. The outcome end-point was 28-day mortality except for the Brower et al. [5] and Stewart et al. [6] trials, where hospital mortality was used. Data for 28-day outcomes in both the Multicenter Trial Group on Tidal Volume Reduction in ARDS (trial report, 60-day outcome [4]) and the ARDS Network trial (trial report, 180-day outcome [1]) were supplied by the respective trial authors on request. The relationship between Acute Physiology and Chronic Health Evaluation (APACHE) versions III and II scores was deemed to be: APACHE III score = 4.48 + 3.259 × APACHE II score (R²=0.81, p=0.0001). This was based upon a sample of 73,000 patients from the Australian and New Zealand Intensive Care Society (ANZICS) database, as recently reported [21]. Table 1 presents patients’ outcomes and demographic characteristics (age, gender, APACHE II score at study entry) for each arm of the trials. Overall the trials included 1,202 patients: 605 receiving protective ventilation and 597 control ventilation. Mean age was 50 years in both arms, 41% were women in the treatment arm and 38% in controls, and mean APACHE II score was 21.7 in the treatment arm and 21.8 in control.
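The linear relationship between APACHE versions quoted above can be sketched in code as follows. The function names are illustrative and not from the source. The source does not state in which direction the relationship was applied, so the inverse function is included only as an assumption about how an APACHE III score might be placed on the APACHE II scale used in Table 1.

```python
# Coefficients quoted in the Methods (ANZICS sample [21]).
INTERCEPT = 4.48
SLOPE = 3.259


def apache3_from_apache2(apache2: float) -> float:
    """APACHE III score predicted from an APACHE II score."""
    return INTERCEPT + SLOPE * apache2


def apache2_from_apache3(apache3: float) -> float:
    """Algebraic inverse of the line above (assumption: the source does not
    say that the conversion was applied in this direction)."""
    return (apache3 - INTERCEPT) / SLOPE


if __name__ == "__main__":
    # Mean APACHE II score of the treatment arms, as reported in the text.
    print(round(apache3_from_apache2(21.7), 1))  # about 75.2
```

Inverting a fitted regression line in this way is only an approximation, since a regression of APACHE II on APACHE III would in general give a different line.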

Table 1 Patient outcomes, demographics and treatment variables (mean values) (N/A not available)

For individual trials, values for mean mechanical tidal volume in millilitres per kilogram of predicted body weight, following the definitions of the ARDS Network [1], were derived from three sources: the two trials in which tidal volume was prescribed according to these definitions [1, 5]; estimates made by the ARDS Network [1] authors for two trials [4, 6]; and, for the remaining trial [9], calculation from values reported in the original meta-analysis [8] according to the estimated ratio, measured body weight = 1.2 × predicted weight [19]. Estimated average plateau pressures over day 1 and days 1–7 of ARDS were obtained from the same trial reports; in the Amato et al. trial [9] day 1 values were those over the first 36 h; in the Brower et al. trial [5] values for day 1 were derived from the graphic, and only the values for days 1–5, as reported in the paper, were used. The range of mechanical tidal volumes was 6.3–8.1 ml/kg-predicted in the treatment arm and 10.2–14.2 ml/kg-predicted in controls. The range of plateau pressures was 21–30 cmH2O in the treatment arm and 27–37 cmH2O in controls.
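The re-expression of a tidal volume from measured to predicted body weight follows directly from the quoted ratio, and a minimal sketch is given below. The function name and the input values are illustrative assumptions; they are not taken from any of the trial reports.

```python
# Ratio quoted in the Methods: measured body weight = 1.2 x predicted weight [19].
MEASURED_OVER_PREDICTED = 1.2


def ml_per_kg_predicted(ml_per_kg_measured: float,
                        ratio: float = MEASURED_OVER_PREDICTED) -> float:
    """Re-express a tidal volume per kg measured weight as per kg predicted weight.

    volume / predicted = volume / (measured / ratio) = ratio * volume / measured
    """
    return ratio * ml_per_kg_measured


if __name__ == "__main__":
    # Illustrative inputs only; these are not values taken from any trial.
    for v in (6.0, 10.0, 12.0):
        print(v, "->", round(ml_per_kg_predicted(v), 1))
```

Because predicted weight is the smaller denominator under this ratio, the same delivered volume corresponds to a larger figure in ml/kg-predicted than in ml/kg-measured.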

The analytic strategy was: (a) Initial determination of the pooled treatment effect (fixed effects) as odds ratio [OR, treatment arm (protective ventilation) vs. control (control ventilation)] using the “metan” routine [22] and Stata statistical software (version 8.2, 2003, Stata, College Station, Tex., USA). Dependent upon evaluation of heterogeneity (see b, below), random effects estimates were also determined. Cumulative meta-analytic estimates, whereby the pooled estimate is recomputed up to and including each individual trial in chronological order, were also graphically displayed [23]; trial year-date was taken as that of actual trial termination in the respective published reports. (b) Assessment of heterogeneity: the extent of heterogeneity was assessed by the Q statistic, the Breslow-Day test and the Zelen exact test of homogeneity of ORs (StatXact 4 for Windows release 4.0.1, Cytel Software, Cambridge, Mass., USA). The (p value) level at which heterogeneity should be diagnosed is unclear, given that the Q statistic has low power, and Fleiss has recommended a value of at least 0.1 [24]. The impact of heterogeneity upon (the pooled estimates of) the meta-analysis was assessed using the H and I2 statistics, where values of H >1.5 and I2 “substantially” >50% would give reason for “concern” and values of H <1.2 and I2 <30% would not be “disquieting” [16]. Heterogeneity was also assessed graphically using the Galbraith plot, where point-estimates outside the confidence intervals (2 units) have a major impact on heterogeneity [25], and by an influence analysis [26]. (c) Meta-regression [15] was performed using the mean recorded values (above) of mechanical tidal volume and plateau pressures (control and treatment groups and differences, control vs. treatment group) for day 1 and days 1–7 as independent variables (dependent variable, log OR). Regression with both tidal volume and plateau pressure, plus interaction (volume-pressure product, VPP), was also performed. Statistical significance was set at p≤0.05. Further detail of the analytic instruments above is provided in Electronic Supplementary Material E1.
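Steps (a) and (b) of the strategy above can be illustrated with a short sketch. It is not the “metan” routine: it assumes inverse-variance weights on the log OR scale and a DerSimonian-Laird random effects estimate, whereas the source does not state which weighting scheme was used. The trial counts are hypothetical and all function names are illustrative.

```python
import math


def log_or_and_var(deaths_t, n_t, deaths_c, n_c):
    """Log odds ratio (treatment vs. control) and its large-sample variance.
    Assumes no zero cells in the 2x2 table."""
    a, b = deaths_t, n_t - deaths_t      # treatment arm: deaths, survivors
    c, d = deaths_c, n_c - deaths_c      # control arm: deaths, survivors
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool(trials):
    """Fixed and random effects pooling with Q, H and I2."""
    effects = [log_or_and_var(*t) for t in trials]
    y = [e for e, _ in effects]
    w = [1.0 / v for _, v in effects]    # inverse-variance weights
    df = len(y) - 1

    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    h = math.sqrt(q / df)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird moment estimate of the between-trial variance.
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / scale)
    w_re = [1.0 / (1.0 / wi + tau2) for wi in w]
    random_ = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)

    def summary(log_or, weights):
        """Pooled OR with its 95% confidence limits."""
        se = math.sqrt(1.0 / sum(weights))
        return (math.exp(log_or),
                math.exp(log_or - 1.96 * se),
                math.exp(log_or + 1.96 * se))

    return {"fixed": summary(fixed, w), "random": summary(random_, w_re),
            "Q": q, "H": h, "I2": i2, "tau2": tau2}


# Hypothetical counts (deaths_t, n_t, deaths_c, n_c); NOT the trial data.
HYPOTHETICAL_TRIALS = [(30, 100, 40, 100), (25, 60, 22, 58), (10, 50, 20, 50)]

if __name__ == "__main__":
    for key, value in pool(HYPOTHETICAL_TRIALS).items():
        print(key, value)
```

Under this scheme the random effects interval can never be narrower than the fixed effects interval, because each random effects weight is at most the corresponding fixed effects weight.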

Results

The meta-analytic fixed effect pooled estimate (OR) of treatment effect (protective vs. control ventilation) was 0.71 (95% CI 0.56–0.91, p=0.006). However, heterogeneity was present at a p value of 0.06 (as assessed by the Q and Breslow-Day tests) and at 0.08 by the Zelen exact test. The H and I2 statistics were 1.5 and 56.8%, respectively. The corresponding random effects estimate was 0.80 (95% CI 0.49–1.31, p=0.37). A forest plot of the random effects estimates is presented in Fig. 1, with trial weights reflected as the point-estimate box size; the non-significant pooled estimate favoured protective ventilation. The Galbraith plot (Fig. 2) revealed substantial horizontal displacement of the ARDS Network trial due to its size and some scatter of the other trials within the confidence limits, but, importantly, no trial was outside this range. Similarly, the influence analysis failed to show a substantive effect of deletion of any one trial on the “deleted” estimate with respect to the overall point estimate (Electronic Supplementary Material, Fig. E2.1). The implication of these two graphical displays was that heterogeneity was, at worst, modest, in agreement with the value of the H and I2 statistics. This impression was supported by the cumulative meta-analysis, which demonstrated an initial shift of the point-estimates over time, but these estimates became relatively stable at approx. OR=0.8 with the last three trials [1, 4, 6] and 95% confidence intervals of treatment effect spanned OR=1 from the second trial (Brower et al. [5]) onwards (Electronic Supplementary Material, Fig. E2.2).
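The reported heterogeneity statistics are linked by simple identities, I2 = (H² − 1)/H² and Q = H² × (k − 1) for k trials, so their mutual consistency can be checked with the sketch below. The function names are illustrative. The check assumes that the reported I2 of 56.8% is the more precisely rounded figure. Taking it as exact gives H of about 1.52 and Q of about 9.26 on 4 degrees of freedom, with an upper-tail probability of about 0.055, in line with the rounded values reported.

```python
import math


def i2_from_h(h: float) -> float:
    """I2 = (H^2 - 1) / H^2, truncated at 0."""
    return max(0.0, (h * h - 1.0) / (h * h))


def h_from_i2(i2: float) -> float:
    """Inverse of the relation above, valid for 0 <= I2 < 1."""
    return 1.0 / math.sqrt(1.0 - i2)


def q_from_i2(i2: float, n_trials: int) -> float:
    """Q = H^2 * (k - 1) for k trials."""
    return (n_trials - 1) / (1.0 - i2)


def chi2_sf_4df(q: float) -> float:
    """Upper-tail probability of a chi-square variate with exactly 4 df
    (closed form; not valid for other degrees of freedom)."""
    return math.exp(-q / 2.0) * (1.0 + q / 2.0)


if __name__ == "__main__":
    h = h_from_i2(0.568)          # reported I2 = 56.8%
    q = q_from_i2(0.568, 5)       # five trials, so 4 degrees of freedom
    print(round(h, 2), round(q, 2), round(chi2_sf_4df(q), 3))
```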

Fig. 1

Mortality outcome: forest plot, random effect estimates. Horizontal axis Odds ratio; vertical axis individual trials; vertical solid line line of null effect; vertical dashed line pooled estimate; horizontal lines 95% CI of point estimates indicated as solid squares, the size of which reflects the weight (% weight column) accorded the study in the analysis

Fig. 2

Galbraith plot. Horizontal axis Reciprocal of the standard error (SE) of the estimate of effect (θ); vertical axis Z statistic (θ/SE θ); middle solid line (unweighted) regression line constrained through the origin, with a slope equal to the overall log odds ratio; outer lines 95% CI (±2 units of Z statistic); names individual trials (first author)
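The construction of the Galbraith plot described in the legend can be sketched as follows. The function names are illustrative, and the sketch assumes that the band is drawn 2 units of the Z statistic either side of the line through the origin.

```python
def galbraith_points(log_ors, ses):
    """Return (x, z) = (1/SE, log OR / SE) for each trial."""
    return [(1.0 / se, y / se) for y, se in zip(log_ors, ses)]


def pooled_slope(points):
    """Slope of the unweighted regression of z on x through the origin.
    Algebraically this equals the inverse-variance pooled log OR."""
    return sum(x * z for x, z in points) / sum(x * x for x, _ in points)


def outside_band(points, half_width=2.0):
    """Flag trials whose z lies more than `half_width` units from the line."""
    slope = pooled_slope(points)
    return [abs(z - slope * x) > half_width for x, z in points]


if __name__ == "__main__":
    # Hypothetical log ORs and standard errors; NOT the trial data.
    pts = galbraith_points([-0.5, 0.1, -0.3], [0.2, 0.4, 0.3])
    print(pooled_slope(pts), outside_band(pts))
```

A large trial has a small SE and therefore sits far to the right on the horizontal axis, which is the displacement noted for the ARDS Network trial in the text.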

The meta-regression of mechanical tidal volume vs. log OR yielded a significant relationship for the treatment ventilation arm (p=0.05) such that below a tidal volume of 7.7 ml/kg-predicted the log OR fell below 0 (≡ OR of 1) and a benefit, treatment vs. control was evident (Fig. 3, left panel). A borderline significant relationship for the control ventilation arm (p=0.08) was also evident such that above a tidal volume of 11.2 ml/kg-predicted the log OR fell below 0 (≡ OR of 1) and a detriment, control vs. treatment, was evident (Fig. 3, right panel). The log OR also showed a significant relationship (p=0.004) with tidal volume difference (control vs. protective ventilation) such that log OR was less than 0 (OR <1) for a tidal volume difference greater than 4.2 ml/kg-predicted (Electronic Supplementary Material, Fig. E2.3).
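Thresholds such as those quoted above correspond to the covariate value at which a fitted regression line crosses log OR = 0. The sketch below shows that calculation for a weighted least-squares line. It is a simplification: it uses fixed weights and omits any between-trial variance component that the meta-regression method cited in the Methods [15] may include. The data points and function names are hypothetical.

```python
def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def zero_crossing(intercept, slope):
    """Covariate value at which the fitted log OR equals 0 (OR = 1)."""
    return -intercept / slope


if __name__ == "__main__":
    # Hypothetical covariate values, log ORs and weights; NOT the trial data.
    x = [2.0, 4.0, 6.0]
    log_or = [-1.0, 0.0, 1.0]          # lies exactly on log OR = -2 + 0.5 * x
    weights = [5.0, 20.0, 10.0]        # e.g. 1 / variance of each log OR
    a, b = weighted_line(x, log_or, weights)
    print(a, b, zero_crossing(a, b))   # close to -2, 0.5 and 4
```

With only five trials, a crossing point estimated in this way carries considerable uncertainty, which the point value alone does not convey.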

Fig. 3

Log odds ratio vs. average mechanical tidal volume, protective and control ventilation. Two panels have same configuration with mechanical ventilation tidal volume on the horizontal axis and treatment effect (as log odds ratio) on vertical axis. Individual studies are named and highlighted with circle size proportional to inverse of the variance of the treatment effect. Left Treatment arm (protective ventilation); right control arm (standard ventilation). Regression relationship (log OR vs. tidal volume) is indicated by a long dash-dot line

Between plateau pressure and log OR on day 1 a significant relationship was demonstrated for the control (p=0.02) but not treatment (p=0.18) ventilation arm (Fig. 4) such that beyond a control plateau pressure of 29–30 cmH2O the log OR fell below 0 (≡ OR of 1) and a benefit, treatment vs. control, existed. Seen somewhat differently (Fig. 5, left panel), a plateau pressure difference (control vs. protective ventilation, day 1) greater than 5.5 cmH2O was associated with treatment arm benefit. Over days 1–7 a similar relationship between plateau pressure difference and log OR was seen, but the benefit for the treatment arm required a plateau pressure difference greater than 7 cmH2O (Fig. 5, right panel).

Fig. 4

Log odds ratio vs. mean day 1 plateau pressure for treatment and control arms. Two panels have same configuration with (mean) plateau pressure (day 1) on the horizontal axis and treatment effect (as log odds ratio) on vertical axis. Individual studies are named and highlighted with circle size proportional to inverse of the variance of the treatment effect. Left Treatment arm (protective ventilation); right control arm (standard ventilation). Regression relationship (log OR vs. plateau pressure) is indicated by a long dash-dot line

Fig. 5

Log odds ratio vs. difference in plateau pressures for treatment and control arms, day 1 and days 1–7. Two panels have same configuration with (mean) plateau pressure difference (control vs. treatment arm) on the horizontal axis and treatment effect (as log odds ratio) on vertical axis. Individual studies are named and highlighted with circle size proportional to inverse of the variance of the treatment effect. Regression relationship (log OR vs. plateau pressure) is indicated by a long dash-dot line. Left Mean plateau pressure difference for day 1; right mean plateau pressure difference for days 1–7

The univariate relationship between log OR and VPP (average plateau pressure over days 1–7) was significant for control ventilation (VPP 310–530 ml/kg-predicted per 1 cmH2O) such that for a VPP greater than 355 ml/kg-predicted per 1 cmH2O the log OR was less than 0 and a benefit, treatment vs. control, existed (p=0.004; Electronic Supplementary Material, Fig. E2.4, right panel). For protective ventilation (VPP 160–200 ml/kg-predicted per 1 cmH2O) no relationship with OR was demonstrated (p=0.26; Electronic Supplementary Material, Fig. E2.4, left panel).

No multivariable regression was significant (p always >0.12 for individual regression coefficients) and no simple quadratic (non-linear) effect of volume, plateau pressure or VPP was demonstrated.

Discussion

Combining or splitting

The original meta-analysis [8] based its strategy of separate consideration of beneficial (n=2)/non-beneficial trials (n=3) upon the non-homogeneity of odds ratio as diagnosed by the Breslow-Day test (p=0.06). Using data from this meta-analysis [8] the pooled fixed effects estimate of treatment effect was calculated as 0.76 (95% CI 0.60–0.95, p=0.02), with heterogeneity being present at p=0.064 (Q and Zelen exact tests). The corresponding random effects estimate was 0.83 (0.52–1.34, p=0.43). For the current meta-analysis, which incorporated 28-day survival estimates for two trials [1, 4], not given in the initial trial reports, heterogeneity was also diagnosed at similar (p) levels. There was also evidence of clinical heterogeneity between the trials in terms of ventilation (the levels of positive end-expiratory pressure employed) and patient type (enrolment of those at risk of ARDS and number of organ failures). However, the heterogeneity was assessed as being of modest impact upon the (pooled) results of the current meta-analysis (H and I2 statistics), this supposition also being reflected in the graphical analyses. Thus the heterogeneity did not preclude the consideration of a consistent random effects pooled estimate [27], which was non-significant (p>0.05).
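The p values quoted with the pooled estimates above can be approximately recovered from the odds ratios and their confidence limits, which offers a simple check on internal consistency. The sketch below assumes a normal approximation on the log OR scale; the function name is illustrative. Applied to the estimates recalculated from the data of [8], it gives approximately 0.02 for the fixed effects estimate and approximately 0.44 for the random effects estimate, the latter differing slightly from the reported 0.43, as would be expected when working from rounded confidence limits.

```python
import math


def p_from_or_ci(odds_ratio: float, lower: float, upper: float,
                 z_crit: float = 1.959964) -> float:
    """Two-sided p value implied by an OR and its 95% CI
    (normal approximation on the log scale)."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    print(round(p_from_or_ci(0.76, 0.60, 0.95), 3))  # fixed effects, data of [8]
    print(round(p_from_or_ci(0.83, 0.52, 1.34), 2))  # random effects, data of [8]
```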

Trial conduct

As noted [12], four of the trials were stopped early because of efficacy [1, 9] or futility [4, 5], and such premature termination is known to bias treatment effects in individual trials [28]. Exaggeration of treatment effects also occurs when interventions are conducted in a non-blinded manner [29], as was obviously the case in the trial ventilation protocols. More importantly, the use of “stopping rules” may induce artificial heterogeneity into overviews of clinical trials [30] and increase the type I error rate in tests of homogeneity [31]. Heterogeneity may be assessed in trials where early stopping did not occur in order to quantify the impact of early stopping upon heterogeneity [30], but such an estimate is obviously vitiated in the current meta-analysis by the small number of such trials (one: [6]). However, it may be surmised that the heterogeneity detected in both the current and original meta-analysis was incremented by early stopping.

Heterogeneity predictors

Within the range of mechanical tidal volumes used in protective ventilation in the five trials a beneficial trial effect, at p=0.05, was demonstrated (log OR<0) with tidal volumes less than 7.7 ml/kg-predicted in the treatment arm, whereas adverse effects were noted with control tidal volumes greater than 11.2 ml/kg-predicted, albeit at p=0.08 (Fig. 3). Adverse effects (Fig. 4) of plateau pressure appeared at values higher than 30 cmH2O (achieved in the control group only), a threshold similar to that identified by Petrucci and Iacovelli [14], although the latter used subset analysis at an empirical threshold rather than derivation from a general (meta-)regression approach. Thus our results counter the claim that “as long as tidal volumes produce airway pressures between 28 and 32 cmH2O, there is no benefit from using low tidal volumes... and it may be harmful” [8] and support the conclusions (based upon an individual patient analysis) of Amato et al. [32], who noted “no evidence of harm of the lower tidal volume strategies”. However, their analysis was restricted to the three “non-beneficial trials” [4, 5, 6], and their use of non-linear (Cox) regression models to analyse and (covariate) adjust the results is known to increase the variance of treatment estimates (and displace them from the null [33]), as illustrated by the 95% CI of the “best adjusted” model compared with the “univariate” model (Fig. 1 in [32]). The claim in reply by Eichacker et al. [34] that, “Even the ‘best adjusted’ 95% confidence intervals suggest that there is a one in three chance that low tidal volumes produce an increase in mortality rates” must be understood in this context. Moreover, the interpretation given to these (95%) confidence intervals is somewhat problematic: the apposite interpretation of a 95% CI of a parameter θ is that in (an infinite number of) repetitions of a study, an exact proportion (95%) of all such intervals would enclose θ. Once the data have been collected, and a single 95% CI has been calculated, the probability that θ lies within this CI is now 0 or 1, that is, a 95% CI is not equivalent to a 95% probability interval, which has a Bayesian explication.
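The repeated-sampling interpretation of a confidence interval can be illustrated by simulation. The sketch below is a toy example, not an analysis of the trials: it assumes a normal mean with known standard deviation, and the function name, sample size and seed are arbitrary choices.

```python
import math
import random


def coverage(theta=0.0, sigma=1.0, n=25, repetitions=10_000, seed=1):
    """Fraction of 95% CIs for a normal mean (known sigma) that enclose theta."""
    rng = random.Random(seed)
    half_width = 1.959964 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repetitions):
        mean = sum(rng.gauss(theta, sigma) for _ in range(n)) / n
        # For this one realised interval the statement "theta is inside"
        # is simply true or false.
        hits += (mean - half_width <= theta <= mean + half_width)
    return hits / repetitions


if __name__ == "__main__":
    print(coverage())
```

Across the repetitions the proportion of intervals enclosing θ should be close to 0.95, while each individual interval either contains θ or does not.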

A log OR–VPP relationship was established for VPP greater than 355 ml/kg-predicted per 1 cmH2O, which levels were seen only with control ventilation; no relationship was evident for the VPP range of protective ventilation. However, caution needs to be exercised over the interpretation of VPP [35], as it represents an interaction term and the “main effects”, volume and plateau pressure, were not significant in a multivariable meta-regression (see above).

The treatment effect favouring protective ventilation was also dependent upon the existence of a threshold difference (control vs. treatment ventilation) of both tidal volume and plateau pressure. Thus “modest” increases in mean plateau pressure from 30 to 33 cmH2O after randomisation in the control group of the ARDS Network trial [36] may not have been benign. The results of a further ARDS Network subanalysis suggesting that “intermediate tidal volumes and inspiratory pressures are not as safe as the lower tidal volumes and pressures used in [the ARDS Network trial] lower tidal volume group” [36] have not been confirmed by the current meta-analysis and appear to need formal trial demonstration. Although disagreeing in principle with the editorial response [11] to the Eichacker et al. [8] meta-analysis over the question of the integrity of a pooled effect estimate, the above results are supportive of the conclusion in the editorial that “high plateau pressures in the control arms likely contributed to the observed differences in mortality”.

Critique of methodology

The small number of trials considered in both the meta-analysis and the meta-regression and the consequent inability to effectively test for multivariate regression and non-linear relationships [37] are problematic. The varying methods of prescribing ventilator tidal volume (four different prescription units [12]) and measuring plateau pressure [11] may have led to a considerable degree of uncertainty in the summary estimates of both tidal volume and plateau pressure and in the statistical relationships described between these two variables and the treatment effect, as log OR (in particular the significance levels achieved relative to p=0.05). Furthermore, the relationships discerned may be associations across trials and have neither a causal interpretation nor reflect within-trial relationships [15]. This being said, the overall quantitative threshold findings of tidal volume and plateau pressures are consistent with clinical and experimental evidence [7].

Conclusions

The pooled estimate of treatment effect in ALI/ARDS favours protective ventilation, but it failed to achieve statistical significance on random effects estimation. Early stopping in four of the five trials considered may have been a factor contributing to heterogeneity. In ALI and ARDS mechanical ventilation with low tidal volumes is not detrimental and may be advantageous below a threshold of 7.7 ml/kg-predicted. Further trials, with standardised prescription of ventilatory parameters, appear to be necessary to define optimal mechanical ventilation in ALI/ARDS.