Take-home message

Mechanical power normalized to body weight (norMP) showed significantly greater diagnostic performance in predicting mortality than absolute value of mechanical power, tidal volume, plateau pressure, and driving pressure.

The predictive performance of norMP cannot be further improved by using sophisticated machine learning techniques.

The impact of norMP on mortality outcome was dependent on the severity of ARDS. While the norMP was not significantly associated with mortality outcome in patients with mild ARDS, it was associated with increased risk of mortality in moderate and severe ARDS.

norMP is a potential biomarker for monitoring ventilator-induced lung injury.

Introduction

Mechanical ventilation is strongly recommended for ARDS patients to avert life-threatening hypoxia and hypercapnia in respiratory failure; however, it is also associated with ventilator-induced lung injury (VILI) [1, 2]. Experimental evidence has shown that VILI is influenced by every aspect of the ventilator settings. For example, the severity of VILI was demonstrated to depend on respiratory rate at a given level of tidal volume [3]. Thus, a number of key parameters need to be monitored to ensure that mechanical ventilation does not lead to VILI, with the usual targets being low tidal volume, high positive end-expiratory pressure (PEEP), and limited plateau pressure and driving pressure [4,5,6,7]. Mechanical power (MP), calculated from the combination of tidal volume, PEEP, plateau pressure, peak inspiratory pressure (PIP), and respiratory rate, was proposed to better capture the total energy delivered to the lung parenchyma [8, 9]. Because MP integrates many aspects of mechanical ventilation, it is theoretically superior to each of the individual ventilator variables. There is empirical evidence that MP can predict the risk of mortality in mechanically ventilated patients [9]. However, it is unclear whether MP stratifies risk better than the individual ventilator variables. The first aim of the study was to empirically compare the discrimination of MP in predicting ARDS mortality with that of the individual ventilator variables. We hypothesized that MP-based prediction would be significantly better than prediction based on the other ventilator variables.

Furthermore, the effect of MP might be influenced by the functional lung size. In other words, the discrimination of MP normalized to lung size would theoretically be better than that of the absolute MP value [10]. This is in line with the fact that tidal volume normalized to predicted body weight (PBW) showed greater accuracy than its absolute value [11]. Our second aim was to investigate whether the discrimination of MP normalized to PBW was better than that of the absolute value of MP. Additionally, the functional lung size is not always proportional to the PBW in ARDS patients. For example, a severely injured lung might have a smaller functional lung size regardless of the body height of the patient. Thus, we needed to take the severity of ARDS into consideration when examining the association of MP with mortality. Our third hypothesis was that MP is more harmful in severe ARDS than in mild ARDS, because severe ARDS has a smaller functional lung size.

MP is a ventilator parameter with a concrete physical meaning. However, the mathematical equation for computing MP may not fully capture the predictive information contained in the ventilator variables. Many modern machine learning algorithms can identify high-order interactions among variables while minimizing the risk of overfitting. Thus, we intended to explore whether the predictive power could be further improved with the same ventilator variables by using the gradient boosting machine (GBM). GBM has been shown to be able to approximate nearly any functional form by increasing the number of boosting iterations [12, 13]. If the GBM could not further improve the predictive power, MP would be considered a near-optimal ventilator variable for monitoring potential VILI.

Methods

Data source

The study included ARDS patient data from eight randomized controlled trials (RCTs) conducted by the ARDSNet [5, 7, 14,15,16,17,18,19,20,21]. The LaSRS trial was excluded from analysis because it enrolled patients with stable or worsening ARDS for 7–28 days and was not comparable with the other trials of early ARDS intervention. All patients included in these trials were analyzed. All individual RCTs were approved by the ethics committees of the participating centers and informed consent was obtained. The data are available from the Biologic Specimen and Data Repository Information Coordinating Center (https://biolincc.nhlbi.nih.gov/). The secondary analyses of the data were approved by the ethics committee of the Sir Run Run Shaw Hospital.

Variables

All crude data used in our study were extracted from the original trials. Patient characteristics such as Acute Physiology and Chronic Health Evaluation (APACHE) III, age, gender, type of ICU, and admission source were extracted from the case report form. Baseline (day 0) ventilator parameters were defined as the most recent values prior to randomization. Corrected inspired tidal volume (\( V_{\text{T}} \)) as recorded in the original trials was used for the current analysis. Positive end-expiratory pressure (PEEP) was the external or applied PEEP, not the total PEEP, auto-PEEP, or intrinsic PEEP. The plateau pressure (\( P_{\text{plateau}} \)) was measured with a 0.5-s inspiratory pause. Peak inspiratory pressure (PIP) was obtained while the patient was relaxed, not coughing or moving in bed. Mechanical power was computed by the following equation [8, 9]:

$$ {\text{MP}}\,({\text{J/min}}) = 0.098 \times V_{\text{T}} \times {\text{RR}} \times ({\text{PIP}} - \Delta P \times 0.5), $$

where the driving pressure \( \Delta P = P_{\text{plateau}} - {\text{PEEP}} \).

$$ {\text{norMP}}\,( \times 10^{ - 3} \,{\text{J/min/kg}}) = {\text{MP}}/{\text{PBW}}, $$

where PBW was the predicted body weight measured in kilograms.
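As a worked illustration of the two equations above, the following Python sketch computes MP and norMP for one hypothetical patient. It assumes that tidal volume is entered in litres, pressures in cmH2O, and respiratory rate in breaths/min, which are the units under which the 0.098 constant yields J/min. The function names and the example values are illustrative and are not taken from the trial data.

```python
def mechanical_power(vt_l, rr, pip, p_plateau, peep):
    """Return MP in J/min.

    Assumed units: vt_l in litres, rr in breaths/min, pressures in cmH2O.
    """
    driving_pressure = p_plateau - peep  # Delta P = P_plateau - PEEP
    return 0.098 * vt_l * rr * (pip - 0.5 * driving_pressure)


def normalized_mp(mp, pbw_kg):
    """Return norMP expressed in units of 10^-3 J/min/kg."""
    return mp / pbw_kg * 1000.0


# Hypothetical patient (values chosen for illustration only)
mp = mechanical_power(vt_l=0.45, rr=20, pip=30, p_plateau=25, peep=10)
nor_mp = normalized_mp(mp, pbw_kg=70)
print(round(mp, 2), round(nor_mp, 1))
```

For this hypothetical patient the sketch gives an MP of roughly 19.8 J/min and a norMP of roughly 283.5 × 10^-3 J/min/kg.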

Lung compliance was computed by the following equation:

$$ {\text{Compliance}} = V_{\text{T}} /(P_{\text{plateau}} - {\text{PEEP}}). $$

Then MP could be normalized by the compliance:

$$ {\text{MP normalized to compliance}} = \frac{\text{MP}}{\text{Compliance}}. $$
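The compliance-based normalization can be sketched in the same way. The snippet below assumes tidal volume in mL, so that compliance is expressed in mL/cmH2O; the text does not state which volume unit was used for this step, so the unit, the function names, and the example values are all assumptions made for illustration.

```python
def compliance(vt_ml, p_plateau, peep):
    """Return respiratory system compliance as V_T / (P_plateau - PEEP).

    Assumed units: vt_ml in mL, pressures in cmH2O (result in mL/cmH2O).
    """
    driving_pressure = p_plateau - peep
    if driving_pressure <= 0:
        raise ValueError("plateau pressure must exceed PEEP")
    return vt_ml / driving_pressure


def mp_per_compliance(mp, compliance_value):
    """Return MP divided by compliance."""
    return mp / compliance_value


# Hypothetical patient: V_T 450 mL, plateau 25 cmH2O, PEEP 10 cmH2O, MP 19.845 J/min
c = compliance(vt_ml=450, p_plateau=25, peep=10)
print(c, round(mp_per_compliance(19.845, c), 4))
```

A stiffer lung (lower compliance) yields a larger normalized value for the same MP, which is the intended effect of the normalization.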

Missing data [electronic supplemental material (ESM) Fig. 1] were handled with a multiple imputation method [22].

Clinical outcome

The primary outcome of the study was the mortality rate at 90 days. A patient was defined as alive if he/she was at home with unassisted breathing at any time up through day 90. Home was defined as the place where the patient lived prior to study hospital admission. A patient was defined as deceased if he/she died before being discharged from the hospital to home with unassisted breathing, or died before achieving 48 h of unassisted breathing at home.

Comparing ventilator parameters

Each of the ventilator variables, including \( V_{\text{T}} \), \( V_{\text{T}} \) normalized to predicted body weight, respiratory rate, PIP, \( \Delta P \), MP, \( P_{\text{plateau}} \), PEEP, norMP, and MP normalized to compliance, was entered into a separate logistic regression model, resulting in a total of ten models being trained. The whole sample was split 3:1 into training and testing subsamples. The discrimination of each model was calculated in the testing subsample using the area under the receiver operating characteristic curve (AUROC). Differences in AUROC were tested using DeLong’s method [23].
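The evaluation scheme described above can be sketched in plain Python. This is not the authors' R code: it shows only a 3:1 split and a rank-based AUROC, and it omits both the logistic fit and DeLong's test. The simplification is that, for a model with a single predictor, the fitted probability is a monotone function of that predictor, so ranking patients by the raw variable gives the same AUROC as the fitted model when the slope is positive. The function names, the seed, and the toy data are hypothetical.

```python
import random


def split_3_to_1(n, seed=0):
    """Shuffle indices 0..n-1 and return (train_idx, test_idx) in a 3:1 ratio."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = (3 * n) // 4
    return idx[:cut], idx[cut:]


def auroc(scores, labels):
    """Probability that a random non-survivor (label 1) has a higher score
    than a random survivor (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): norMP values and 90-day mortality labels
nor_mp = [210, 480, 330, 520, 260, 610, 390, 300]
died = [0, 1, 0, 1, 0, 1, 0, 1]

train_idx, test_idx = split_3_to_1(len(nor_mp))
print(len(train_idx), len(test_idx))
print(auroc(nor_mp, died))
```

In a real analysis the AUROC would be computed on the testing indices only; the toy example evaluates the whole vector because a two-patient test set may contain a single outcome class.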

Gradient boosting machine

GBM produces a prediction model in the form of an ensemble of weak prediction models (decision trees in our study). It builds the model in a stage-wise fashion and generalizes other boosting methods by allowing optimization of an arbitrary differentiable loss function. The feature space of the GBM comprised the ventilator variables used for computing MP, including \( V_{\text{T}} \), \( \Delta P \), \( P_{\text{plateau}} \), PEEP, PIP, and respiratory rate. The advantage of GBM is that it can approximate nearly any functional form between mortality outcome and ventilator variables. Again, the GBM was trained on the training subsample and tested on the testing subsample. Hyperparameter tuning was implemented with tenfold cross-validation. A tuning parameter grid was constructed by varying the number of trees (number of boosting iterations) from 10 to 300 and the maximum depth of each tree over the values 1, 3, 5, and 7. The learning rate (step-size reduction) was set to 0.1. The minimum number of observations in the terminal nodes of the trees was 20. Gradient boosting was performed using the caret package (v6.0-81) in R.
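The tuning grid described above can be enumerated as follows. This is a language-neutral sketch in Python rather than the caret call used in the study. The step of 10 between tree counts is an assumption, since the text gives only the range 10 to 300, and the dictionary keys and function names are hypothetical labels rather than caret argument names.

```python
from itertools import product

N_TREES = range(10, 301, 10)  # ASSUMPTION: step of 10; only the range 10-300 is stated
MAX_DEPTH = (1, 3, 5, 7)      # maximum tree depths explored
LEARNING_RATE = 0.1           # fixed shrinkage
MIN_OBS_IN_NODE = 20          # fixed minimum observations per terminal node
N_FOLDS = 10                  # tenfold cross-validation


def tuning_grid():
    """Return every (number of trees, depth) combination with the fixed settings."""
    return [
        {
            "n_trees": n,
            "max_depth": d,
            "learning_rate": LEARNING_RATE,
            "min_obs_in_node": MIN_OBS_IN_NODE,
        }
        for n, d in product(N_TREES, MAX_DEPTH)
    ]


def fold_assignments(n_rows, n_folds=N_FOLDS):
    """Assign row i to fold i mod n_folds (rows are assumed to be shuffled already)."""
    return [i % n_folds for i in range(n_rows)]


grid = tuning_grid()
print(len(grid), grid[0])
```

Under the assumed step, the grid holds 120 candidate settings, each of which would be scored by cross-validation on the training subsample before the best one is evaluated once on the testing subsample.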

Effect of MP on mortality by ARDS severity

The interaction between MP and ARDS severity was explored by a multivariable logistic regression model, adjusting for potential confounders such as age, gender, APACHE III, admission source, and ICU type. ARDS severity was described using the Berlin definition, where mild, moderate, and severe ARDS were separated at P/F ratio cutoffs of 200 and 100 mmHg [24]. However, we could not verify that every criterion of the Berlin definition was met. We further investigated the association of norMP with the risk of worsening in patients with mild ARDS at day 0. Worsening ARDS was defined as meeting the criteria for moderate or severe ARDS after day 2 [25].
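The severity grouping can be illustrated with a short Python sketch. It applies only the hypoxemia thresholds of the Berlin definition (P/F ratio in mmHg), consistent with the caveat above that the other Berlin criteria could not be verified. The function name and the handling of values above 300 mmHg are assumptions made for illustration.

```python
def berlin_severity(pf_ratio):
    """Classify ARDS severity from the PaO2/FiO2 ratio (mmHg).

    Hypoxemia criterion only: severe <= 100, moderate <= 200, mild <= 300.
    Returns None when the ratio is above 300 (hypoxemia criterion not met).
    """
    if pf_ratio <= 0:
        raise ValueError("P/F ratio must be positive")
    if pf_ratio <= 100:
        return "severe"
    if pf_ratio <= 200:
        return "moderate"
    if pf_ratio <= 300:
        return "mild"
    return None


# Hypothetical P/F ratios
for pf in (85, 150, 250, 350):
    print(pf, berlin_severity(pf))
```

The resulting category would then enter the regression model as a factor that interacts with the ventilator variable.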

All statistical analyses mentioned above were performed using RStudio (Version 1.1.463).

Results

Characteristics of included trials

A total of 5159 patients with acute onset ARDS were included for analysis. Characteristics of patients and ventilator variables across the ARDSNet trials are shown in ESM Table 1. The FACTT study (n = 1000) had the largest sample size and the OMEGA trial (n = 272) included the smallest number of patients. While patients in the ALVEOLI trial had the highest APACHE III score (96.7 ± 30.3), those in the ARDSnet1 trial had the lowest APACHE III score (85.1 ± 24.8). On day 0, tidal volume was the smallest in the OMEGA trial (430.5 ± 90.1 mL), and was the highest in the ARDSnet1 trial (669.6 ± 120.2 mL). The MP was also the highest in the ARDSnet1 trial [29.6 (21.3, 41.1) J/min], and was the lowest in the SAILS trial [19.2 (13.6, 26.3) J/min]. The SAILS (347.7 ± 179.5 × \( 10^{ - 3} \) J/min/kg) and OMEGA (352.3 ± 190.3 × \( 10^{ - 3} \) J/min/kg) trials reported the lowest norMP, and the ARDSnet1 trial the highest (512.9 ± 264.8 × \( 10^{ - 3} \) J/min/kg).

Predictive performance of ventilator variables

Mortality was regressed on each of the ventilator variables in the training set, and the AUROC of each model was computed in the testing set (Table 1). The norMP showed the highest AUROC among all ventilator variables. The discrimination of norMP was significantly better than that of the absolute MP (p = 0.011 for DeLong’s test). The gradient boosting machine was not able to improve the discrimination as compared to the MP (p = 0.973 for DeLong’s test) or norMP (p = 0.913 for DeLong’s test). MP normalized to compliance was among the variables with the highest predictive discrimination (AUROC 0.753; 95% CI 0.722–0.783), and it was not significantly different from the norMP (p = 0.659).

Table 1 Discrimination of ventilator parameters measured on day 0 in predicting mortality outcome

Sensitivity analysis by excluding patients with pressure control ventilation showed that norMP or MP normalized to compliance had higher discrimination than other ventilator variables such as tidal volume and driving pressure. The GBM was not able to significantly increase the discrimination (ESM Table 2).

Multivariable regression model including an interaction term between norMP and ARDS severity

A multivariable regression model including interaction between norMP and ARDS severity was trained on the whole data set (Table 2). The results showed that there was significant interaction between norMP and ARDS. While the norMP was not significantly associated with mortality outcome (OR 0.99; 95% CI 0.91–1.07; p = 0.862) in patients with mild ARDS, it was associated with increased risk of mortality in moderate (OR 1.11; 95% CI 1.02–1.23; p = 0.021) and severe (OR 1.13; 95% CI 1.03–1.24; p = 0.008) ARDS (Fig. 1). The results indicated that the negative impact of higher norMP was dependent on the severity of ARDS. By restricting our analysis to mild ARDS at day 0, norMP was significantly associated with increased risk of subsequent worsening (OR for each 100-unit increase in × \( 10^{ - 3} \) J/min/kg, 1.09; 95% CI 1.02–1.17; p = 0.019; Table 3).
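To make the stratum-specific odds ratios easier to interpret, the generic form of a logistic model with an interaction term can be written out. The coefficient symbols and the choice of mild ARDS as the reference category are notational assumptions for illustration; they are not taken from Table 2.

$$ {\text{logit}}\,P({\text{death}}) = \beta_{0} + \beta_{1}\,{\text{norMP}} + \sum\limits_{s} \gamma_{s} I_{s} + \sum\limits_{s} \delta_{s}\,({\text{norMP}} \times I_{s}) + {\text{covariates}},\quad s \in \{ {\text{moderate}},{\text{severe}}\} , $$

where \( I_{s} \) is an indicator for severity category \( s \). Under this form, the odds ratio per unit of norMP is \( e^{\beta_{1}} \) in mild ARDS and \( e^{\beta_{1} + \delta_{s}} \) in category \( s \), which is why a separate odds ratio is reported for each severity level.

Because the model is linear on the log-odds scale, an odds ratio reported per 100-unit increase can be rescaled to any other increment \( k \):

$$ {\text{OR}}_{k} = {\text{OR}}_{100}^{\,k/100} . $$

For example, taking the odds ratio for worsening reported above (1.09 per 100-unit increase), an increase of 250 units would correspond to \( 1.09^{2.5} \approx 1.24 \), assuming the log-linear relationship holds over that range.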

Table 2 Multivariable regression model with interaction between ARDS severity and mechanical power
Fig. 1 Interaction between norMP and ARDS severity. While the norMP was not significantly associated with mortality outcome (OR 0.99; 95% CI 0.91–1.07; p = 0.862) in patients with mild ARDS, it was associated with increased risk of mortality in moderate (OR 1.11; 95% CI 1.02–1.23; p = 0.021) and severe (OR 1.13; 95% CI 1.03–1.24; p = 0.008) ARDS. norMP mechanical power normalized to predicted body weight, ARDS acute respiratory distress syndrome, OR odds ratio

Table 3 Multivariable logistic regression model investigating independent risk factors for worsening ARDS

Gradient boosting machine

A tree-based gradient boosting machine was trained on the same ventilator variables used for computing norMP. The training process showed that the prediction accuracy was not significantly improved with increasing tree depth and number of boosting iterations (Fig. 2). The AUROC of the best GBM model was not significantly different from that of the norMP (0.748; 95% CI 0.717–0.779 versus 0.750; 95% CI 0.720–0.781; p = 0.973 for DeLong’s test).

Fig. 2 The training process of the gradient boosting machine. The four lines represent trees with different complexities as represented by the maximum tree depth. It appears that a maximum depth of 9 (the maximum number of edges from the node to the tree’s root node) can give the highest accuracy. However, the accuracy reaches a plateau at 50 boosting iterations

Discussion

The results of our study provide evidence to support our hypotheses. First, among all ventilator variables, norMP displayed a significantly higher AUROC value than any other individual variable, including MP. Second, the effect of norMP was dependent on the severity of ARDS. Although norMP had no significant effect on the mortality outcome in patients with mild ARDS, it was associated with a significantly increased risk of worsening of ARDS. This result supports the hypothesis that the effect of MP on VILI is dependent on the functional lung size. However, we must acknowledge that the ARDS severity categories of the Berlin definition are not without their critics, and functional lung size and P/F ratio may not correlate well in some cases [26]. Third, our study employed GBM to explore whether additional information could be extracted from simple ventilator variables to predict mortality. The result showed that GBM was not better than norMP in predicting mortality outcome, suggesting that norMP had captured most of the predictive information in the ventilator variables.

Individual ventilator variables have been widely investigated in previous studies. Early studies showed that low tidal volume ventilation is not only beneficial for ARDS patients but also for those with healthy lungs [5, 27,28,29]. In fact, low tidal volume is a component of protective ventilation and other components include PEEP, plateau pressure, respiratory rate, and peak inspiratory pressure [6, 7, 30,31,32]. Driving pressure (\( \Delta P \)), which is computed by subtracting PEEP from plateau pressure, has received more attention in recent years because it considers lung compliance. There is a large body of evidence showing that lower driving pressure is associated with improved outcome in both injured and non-injured lungs [6, 10, 33, 34]. In our study, we found that the discrimination of driving pressure in predicting mortality was slightly lower than that of norMP. This result suggests that there are residuals not captured by the \( \Delta P \). For example, the respiratory rate is not considered in \( \Delta P, \) but it is an important determinant of VILI [3, 35]. The PIP is related to the inspiratory flow, which may contribute to the locally intensified concentration of stress, a problem influenced by viscoelastic tissue properties. There is evidence that for a given plateau pressure, the rate at which the volume is given plays an important role in the genesis of VILI [36, 37]. Again, the driving pressure did not consider this component.

In contrast to conventional ventilator variables such as tidal volume, plateau pressure, driving pressure, and PEEP, MP integrates these aspects of mechanical ventilation, under the assumption that VILI can be best predicted by the MP imposed on the lung parenchyma. A previous study investigated the association of MP with mortality and found that MP was a strong predictor of in-hospital mortality (OR for each 5 J/min increase in MP, 1.06; 95% CI 1.01–1.11) [9]. More importantly, the result was robust even in patients receiving low tidal volume, indicating that tidal volume cannot fully explain the mortality outcome, and that the residual can be captured by MP. However, that study did not directly compare the MP with other ventilator variables. Our study moved one step forward by providing evidence suggesting that the MP was better than other individual ventilator variables, and that the discrimination can be further enhanced by normalizing MP to the PBW or compliance.

Several limitations must be acknowledged. The first limitation came from the clinical measurement of VILI. In routine clinical practice, we cannot directly quantify the occurrence of VILI in patients with ARDS. We used mortality rate in this study because VILI is closely linked to mortality. The difficulty in quantifying the functional lung size was the second limitation of this study. At present, we can only indirectly describe the functional lung size by the PBW and ARDS severity. In the quantification of the severity of ARDS, the degree of hypoxemia was found to be predictive of mortality, while other candidate variables such as radiographic severity, respiratory system compliance (≤ 40 mL/cmH2O), PEEP (≥ 10 cmH2O), and corrected expired volume per minute (≥ 10 L/min) were not associated with mortality [24]. Thus, our study employed the degree of hypoxemia to describe the homogeneity of the injured lungs. The third limitation was that we employed ventilator variables on day 0 in ARDSNet trials for predicting mortality. In effect, the subsequent ventilator settings can have a significant impact on mortality outcome. However, ventilator variables on day 0 can better reflect real clinical practice and provide a wide distribution of each variable in the feature space. Further studies considering temporal changes of ventilator settings may be useful to confirm the value of norMP. Finally, the study was conducted in patients with injured lungs, and it is not known whether the result can be generalized to patients with healthy lungs who require mechanical ventilation. Therapeutic interventions in the original trials may be confounding factors in our analysis. However, all trials except for the ARDSnet1 reported a neutral effect of the intervention versus the control [38]. Since the ARDSnet1 trial compared low versus conventional tidal volume, and the comparators had been included in our analysis, the confounding induced by therapeutic interventions in the ARDSnet1 trial can be minimized.

In conclusion, our results indicate that norMP is a good ventilator variable for predicting mortality in ARDS patients. The predictive discrimination of norMP could not be further improved with a sophisticated machine learning method, suggesting that norMP may be considered a near-optimal ventilator variable for monitoring potential VILI. Further studies are needed to investigate whether adjusting ventilator variables according to norMP will significantly improve clinical outcomes.