Introduction

The use of noninvasive ventilation (NIV) in critically ill patients has dramatically increased [1] as it significantly reduces the work of breathing in patients with acute respiratory failure, thereby reducing the need for intubation [2, 3]. Although NIV is frequently used in patients with hypoxemic respiratory failure, its failure rate remains high (25–59%) [4–9], indicating that not all patients benefit from this treatment.

Previous studies have reported that patients who experience NIV failure have a higher heart rate, lower pH, lower Glasgow Coma Scale (GCS) score, lower oxygenation, and higher respiratory rate than those who experience successful NIV [5, 10–16]. These variables can be used to predict NIV failure. However, the predictive power for NIV failure is low when based only on a single variable. Hypothesizing that a combination of these variables has the potential to increase the predictive power, we combined several variables that are easily obtained by simple bedside measurements in patients with hypoxemic respiratory failure to develop a scale for the prediction of NIV failure. We then explored how to use this scale to guide the clinical use of NIV.

Methods

This was a prospective observational study performed in the respiratory intensive care unit (ICU) of a teaching hospital (The First Affiliated Hospital of Chongqing Medical University, Chongqing, China). The ethics committee and institutional review board approved this study (NO. 22013). All patients who were admitted to the ICU for NIV due to hypoxemic respiratory failure were enrolled in the study, but patients were subsequently excluded for any of the following reasons: presence of do-not-intubate orders, presence of chronic obstructive pulmonary disease, requirement for emergency intubation, or NIV intolerance. NIV intolerance was defined as patient refusal of NIV because of discomfort [17]. Informed consent was obtained from patients or their family members.

The decision to initiate NIV (BiPAP Vision or V60; Philips Respironics, Carlsbad, CA) was made by the attending physicians based on the following criteria: clinical presentation of respiratory distress at rest (such as active contraction of the accessory inspiratory muscles or paradoxical abdominal motion), partial pressure of arterial oxygen (PaO2) of <60 mmHg or a PaO2/fraction of inspired oxygen (FiO2) ratio of <300 with supplemental oxygen.

The NIV was managed by attending physicians, respiratory therapists, and nurses according to previously published methods [18]. A face mask (ZS-MZA Face Mask; Shanghai Zhongshan Medical Technology Co., Shanghai, China) was the first choice for NIV treatment. Selection of the mask was based on the patient’s facial type. The straps of the mask were kept as tight as possible while remaining comfortable to the patient. Patients were placed in a semirecumbent position to avoid aspiration, assuming there was no contraindication to this position. The positive end-expiratory pressure was maintained at 4–8 cmH2O. Inspiratory pressure was initially set at 10 cmH2O (above zero) and then increased in increments of 2 cmH2O to achieve the best control of dyspnea and tolerance of the patient. If a patient did not tolerate 10 cmH2O of inspiratory pressure, the latter was allowed to decrease to 8 cmH2O or even further to 6 cmH2O. The fractional concentration of oxygen was set to achieve peripheral oxygen saturation of >92%. At the beginning of treatment, continuous use of NIV was encouraged. Once the patient recovered from respiratory failure, NIV was used intermittently until the patient could be completely weaned from it.

NIV failure was defined as requirement of intubation after NIV intervention based on the following criteria: respiratory or cardiac arrest, failure to maintain a PaO2/FiO2 of >100, development of conditions necessitating intubation to protect the airway (coma or seizure disorders) or to manage copious tracheal secretions, inability to correct dyspnea, lack of improvement of signs of respiratory muscle fatigue, and hemodynamic instability without response to fluids and vasoactive agents [12, 13].

The main objective was to develop and validate a scale to predict NIV failure in patients with hypoxemic respiratory failure. The secondary objective was to report outcomes in high-risk NIV failure patients who underwent intubation at different time points.

Statistical analysis

Data were analyzed using statistical software (SPSS 17.0; IBM Corp., Armonk, NY). Data are reported as the mean and standard deviation (SD) or as the median and interquartile range, as appropriate. Normally distributed continuous variables were analyzed using the unpaired Student’s t test. Non-normally distributed continuous variables were analyzed using the Mann–Whitney U test. Categorical variables were analyzed using the Chi-squared test or Fisher’s exact test. The ability to predict NIV failure was determined using the area under the receiver operating characteristic curve (AUC). A p value of <0.05 was considered to be statistically significant.
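As an illustration of the AUC used above, the following minimal sketch computes it as the probability that a randomly chosen patient with NIV failure has a higher score than a randomly chosen patient with NIV success, with ties counted as one half (the Mann–Whitney interpretation of the AUC). The function name `auc_from_scores` and the example scores are hypothetical and are not study data; the study itself used SPSS.

```python
def auc_from_scores(failure_scores, success_scores):
    """AUC as P(score_failure > score_success) + 0.5 * P(tie).

    Hypothetical helper for illustration only; not the software used in the study.
    """
    wins = 0.0
    for f in failure_scores:
        for s in success_scores:
            if f > s:
                wins += 1.0   # failure patient ranked above success patient
            elif f == s:
                wins += 0.5   # a tie counts as half a concordant pair
    return wins / (len(failure_scores) * len(success_scores))


# Made-up scores, NOT data from the study.
example_auc = auc_from_scores([6, 8, 5, 9], [2, 5, 3, 4])
print(example_auc)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.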

We developed the risk model as follows. First, we used univariate analysis to identify variables collected at 1 h of NIV that were associated with NIV failure in the test cohort. Second, variables with a p value of <0.2 in the univariate analysis were entered in a stepwise multivariate logistic regression analysis to identify independent risk factors associated with NIV failure. The stepwise probability thresholds were 0.05 for entry and 0.1 for removal. We then obtained a regression model. We evaluated the final model for goodness of fit using the Hosmer–Lemeshow test (p > 0.05). Third, we used the method suggested by Sullivan et al. to create the risk scale [19]. We classified the variables in the final model into clinically meaningful categories and recorded the midpoint of each category. For each variable, we set the category with the lowest risk for NIV failure as the within-group reference and assigned it zero points; we then calculated the weight of each category (the difference between its midpoint and that of the reference category, multiplied by the β regression coefficient per unit increase). Finally, we assigned 1 point to the category with the lowest weight and set this weight as the between-groups reference. The weight of each other category was divided by the between-groups reference, and this value was rounded to the nearest integer to give the assigned points. The risk scale for NIV failure was the sum of the points. We used 1000 bootstrap samples to estimate the odds ratio (OR) and 95% confidence interval (CI) of NIV failure per 1-point increment for internal validation. The cutoff value was determined based on a positive likelihood ratio for NIV failure of >5.
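The point-assignment step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the variable names (`var_a`, `var_b`), β coefficients, and category midpoints are hypothetical placeholders, and the actual HACOR categories and weights are those given in ESM Table 2.

```python
import math


def assign_points(variables):
    """Convert regression coefficients into integer points.

    `variables` maps a variable name to (beta, reference_midpoint, categories),
    where `categories` maps a category label to its midpoint. All inputs in
    the example below are hypothetical.
    """
    # Weight of each category: beta * (category midpoint - reference midpoint).
    weights = {
        name: {label: beta * (mid - ref_mid) for label, mid in cats.items()}
        for name, (beta, ref_mid, cats) in variables.items()
    }
    # Between-groups reference: the smallest non-zero weight is worth 1 point.
    base = min(w for cats in weights.values() for w in cats.values() if w > 0)
    # Points: weight / reference, rounded half up to the nearest integer.
    return {
        name: {label: math.floor(w / base + 0.5) for label, w in cats.items()}
        for name, cats in weights.items()
    }


def total_score(points, patient):
    """Sum the points of the categories a patient falls into."""
    return sum(points[name][label] for name, label in patient.items())


# Hypothetical coefficients and midpoints, NOT the published HACOR values.
example_variables = {
    "var_a": (0.03, 100.0, {"low": 100.0, "high": 140.0}),
    "var_b": (-0.5, 15.0, {"normal": 15.0, "mild": 13.5, "severe": 11.0}),
}
example_points = assign_points(example_variables)
example_total = total_score(example_points, {"var_a": "high", "var_b": "severe"})
print(example_points, example_total)
```

Because the reference category of each variable is the one with the lowest risk, every weight is non-negative and the reference category always receives zero points.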

The sample size was calculated by Buderer’s formula [20]. As there was no standard method to diagnose NIV failure, we could not obtain the known sensitivity and specificity. Based on clinical experience, we estimated that the risk scale for NIV failure would reach a sensitivity of 70% and a specificity of 90%. The average prevalence of NIV failure was 43.5% in previous studies [4–8]. We chose α = 0.05 and a maximum marginal error of estimate of 5%. Thus, a minimal sample size of 742 cases was required.
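The sample size above can be reproduced with a short calculation. The sketch below assumes the commonly cited form of Buderer’s formula, in which the number of cases needed for sensitivity is z² × Se × (1 − Se) / (d² × prevalence) and for specificity is z² × Sp × (1 − Sp) / (d² × (1 − prevalence)), with z = 1.96 for α = 0.05; the function name is a hypothetical choice.

```python
import math


def buderer_sample_size(sensitivity, specificity, prevalence,
                        margin=0.05, z=1.96):
    """Total sample sizes required for sensitivity and for specificity.

    Assumes the usual form of Buderer's formula; z = 1.96 corresponds to a
    two-sided alpha of 0.05, and `margin` is the maximum marginal error.
    """
    n_sens = z ** 2 * sensitivity * (1 - sensitivity) / (margin ** 2 * prevalence)
    n_spec = z ** 2 * specificity * (1 - specificity) / (margin ** 2 * (1 - prevalence))
    return math.ceil(n_sens), math.ceil(n_spec)


# Values stated in the text: Se 70%, Sp 90%, prevalence 43.5%, margin 5%.
n_for_sensitivity, n_for_specificity = buderer_sample_size(0.70, 0.90, 0.435)
print(n_for_sensitivity, n_for_specificity)
```

Under these assumptions the sensitivity requirement is the larger of the two and matches the stated minimum of 742 cases.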

Results

We enrolled 449 patients in the test cohort from June 2011 to June 2014, and another 358 patients in the validation cohort from July 2014 to June 2016 (Table 1). Patients comprising the test cohort had higher APACHE II (Acute Physiology and Chronic Health Evaluation II) scores at the start of NIV treatment than did the patients in the validation cohort (mean ± SD: 18 ± 5 vs. 16 ± 5; p < 0.01). This difference resulted in a higher NIV failure rate in the test cohort (47.8 vs. 39.4%; p = 0.02). Taking both cohorts into consideration, at 1, 12, 24 and 48 h after initiation of NIV, 807, 667, 555 and 438 patients, respectively, were still receiving NIV.

Table 1 Demographics of patients with noninvasive ventilation failure and success

At 1 h of NIV, multivariable logistic regression analysis showed that heart rate, acidosis (assessed by pH), consciousness (assessed by GCS), oxygenation, and respiratory rate were independent risk factors for NIV failure in the test cohort [Electronic Supplementary Material (ESM) Table 1]. We used these five variables to develop a risk scale, which we named HACOR, to predict NIV failure. The categories, weights, and assignment of points for each variable are summarized in ESM Table 2. The HACOR scale ranges from 0 to 25 points. At 1 h of NIV, bootstrap analysis showed that the OR of NIV failure was 1.73 (95% CI 1.58–1.95) per 1-point increment in the test cohort.

A summary of the NIV failure rate in relation to different ranges of HACOR scores is shown in Fig. 1. In both the test and validation cohorts, patients with NIV failure had a higher HACOR score at NIV initiation and at 1, 12, 24 and 48 h of NIV than those with NIV success (Table 2; Fig. 2). Before intubation, the highest value of the HACOR score was reached in patients with NIV failure. Further, from the initiation of NIV treatment to 48 h of NIV, the HACOR score barely improved in the patients with NIV failure, but it dramatically improved in those patients with NIV success.

Fig. 1 Noninvasive ventilation (NIV) failure rate in patients with different HACOR (heart rate, acidosis, consciousness, oxygenation, and respiratory rate) scores at 1, 12, 24, and 48 h of NIV

Table 2 The HACOR score at different time points

Fig. 2 The HACOR score in patients with NIV failure and success from initiation to 48 h of NIV

The predictive power of the HACOR score for NIV failure is summarized in Tables 3 and 4 and ESM Figs. 1 and 2. The AUC was 0.88 and 0.91 in the test and validation cohorts, respectively, at 1 h of NIV, which shows the good diagnostic power of the HACOR score for NIV failure. Using a HACOR score of 5 as the cutoff value, the diagnostic accuracy for NIV failure assessed at 1 h of NIV was 81.8 and 86.0% in the test and validation cohorts, respectively. In subgroups classified by diagnosis, age, or disease severity, the diagnostic accuracy also exceeded 80% at 1 h of NIV; furthermore, the diagnostic accuracy exceeded 80% when the HACOR score was assessed at 1, 12, 24 or 48 h of NIV (Table 4).

Table 3 Predictive power of noninvasive ventilation failure diagnosed by the HACOR score assessed at 1 h of NIV
Table 4 Predictive power of noninvasive ventilation failure diagnosed by the HACOR score assessed at 1 h, 12 h, 24 h and 48 h of NIV

There were 505 and 302 patients with a HACOR score of ≤5 and >5 at 1 h of NIV, respectively. Among those with a HACOR score of ≤5, the NIV failure rate was 18.4% and hospital mortality was 21.6%. In comparison, among patients with a HACOR score of >5, the NIV failure rate was 87.1% and hospital mortality was 65.2%. Among the patients with a HACOR score of >5 who experienced NIV failure, 88 patients were intubated within 12 h of NIV initiation (early intubation) and 175 patients were intubated after 12 h of NIV initiation (late intubation) (Table 5). Patients who were intubated early had a higher HACOR score at NIV initiation and at 1 h of NIV than those who were intubated late, but the former had lower hospital mortality [58/88 (66%) vs. 138/175 (79%); p = 0.03]. Compared with late intubation, the crude OR of early intubation for death in hospital was 0.52 (95% CI 0.29–0.92). We adjusted the OR for age, APACHE II score, diagnosis and HACOR score and still found that early intubation was a protective factor for death in hospital (adjusted OR 0.52, 95% CI 0.27–0.99).
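The crude OR reported above can be checked directly from the mortality counts (58 of 88 patients died after early intubation and 138 of 175 after late intubation). The sketch below assumes a Wald-type (Woolf) confidence interval on the log odds ratio; the text does not state which interval method the authors used, and the function name is hypothetical.

```python
import math


def odds_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Crude odds ratio of group A vs. group B with a Woolf (log-scale) CI.

    The choice of the Woolf interval is an assumption made for illustration.
    """
    a, b = events_a, n_a - events_a   # group A: events, non-events
    c, d = events_b, n_b - events_b   # group B: events, non-events
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hospital deaths: early intubation 58/88, late intubation 138/175.
crude_or, ci_lower, ci_upper = odds_ratio_ci(58, 88, 138, 175)
print(round(crude_or, 2), round(ci_lower, 2), round(ci_upper, 2))
```

The adjusted OR cannot be reproduced this way because it requires the patient-level covariates used in the logistic regression.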

Table 5 Early versus late intubation in patients with a HACOR score of >5 at 1 h of noninvasive ventilation

Discussion

We developed a novel scale, called HACOR, for prediction of NIV failure in patients with hypoxemic respiratory failure. This scale takes into account heart rate, acidosis, consciousness, oxygenation, and respiratory rate, variables which are easily obtained by simple bedside measurements. Thus, the HACOR scale is a rapid and convenient tool to assess and predict NIV failure. We also showed that a HACOR score of 5 as the cutoff value has good diagnostic accuracy for NIV failure, even when the scale is assessed in different subgroups classified by diagnosis, age, or disease severity, or at different time points.

In previous studies the average rate of NIV failure was reported to be 43.5% (range 25–59%) in patients with hypoxemic respiratory failure [4–8]. In our study, the failure rate was 47.9 and 39.4% in the test and validation cohorts, respectively. These failure rates in our study thus fall within the range of those reported previously, indicating that the results of our study can be extrapolated to other studies.

Patients with more severe illness are more likely to experience NIV failure [5, 7–14]. In our study, the APACHE II score in the test cohort was much higher than that in the validation cohort, resulting in a higher NIV failure rate in the test cohort. Although the NIV failure rate was different in the two cohorts, the HACOR score showed good distinguishing power for NIV failure in both cohorts. Further, in the subgroups classified by APACHE II score, the HACOR score also showed good distinguishing power for NIV failure. These results indicate that the HACOR score can be used in patients with different disease severity.

The power of a single variable to predict NIV failure is low [5, 7, 11, 12]. To the best of our knowledge, our study is the first to report and assess a scale based on multiple variables to predict NIV failure in hypoxemic patients. We used five variables to develop this scale to predict NIV failure in 449 patients and validated it in another 358 patients. We found that the diagnostic accuracy of NIV failure was good in both the test and validation cohorts. We therefore conclude that this scale is a good tool to help clinical practitioners manage NIV in patients with hypoxemic respiratory failure.

Patients with ARDS or cancer are reported to have a high NIV failure rate [13, 21], while those with heart failure are reported to have a very low NIV failure rate [22]. Our study confirmed these results, although in contrast to previous studies, we used the HACOR score to predict NIV failure. Although the NIV failure rate was quite different between patients who had different diagnoses, the HACOR scale achieved a good power to distinguish NIV failure in each subgroup. The NIV failure rate also differs greatly according to patient age [23]. The HACOR scale also showed good distinguishing power for NIV failure in young, old, or very old patients.

Identifying patients who respond well or badly to NIV is also important. Previous studies have reported that clinical conditions improved in responders but only slightly improved or did not improve in nonresponders [5, 12, 13, 24, 25]. Similar outcomes were also found in our study. The HACOR score improved markedly in patients with NIV success, but it barely improved in patients with NIV failure. Therefore, we suggest that the HACOR score can also be used to assess the efficacy of NIV.

Intubation for invasive mechanical ventilation is associated with many complications, such as diaphragmatic weakness and ventilator-associated pneumonia [26, 27]. Thus, intubation should be avoided if at all possible; however, delayed intubation increases mortality in patients for whom intubation is indicated [28, 29]. Therefore, it is important to be able to identify those patients who require intubation and those who do not. The results of our study demonstrate that a HACOR score of 5 as the cutoff value had good distinguishing power for NIV failure. At 1 h of NIV, 87.1% of patients with a HACOR score of >5 required intubation and 81.6% of patients with a HACOR score of ≤5 did not require intubation. These values indicate that the risk of NIV failure was high in patients with a HACOR score of >5. Further, the high-risk patients who received early intubation had lower hospital mortality than those who received late intubation. Thus, the HACOR score can also be used to assess the need for intubation. Earlier intubation may benefit patients at high risk for NIV failure.
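To make these percentages concrete, the sketch below reconstructs an approximate 2 × 2 table for the pooled cohorts from figures stated in the Results (505 patients with a score of ≤5 and an 18.4% failure rate; 302 patients with a score of >5 and an 87.1% failure rate) and derives the usual diagnostic metrics. The counts are back-calculated from rounded percentages and are therefore approximations, not values taken from Table 3; the function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic metrics from a 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "positive_likelihood_ratio": sensitivity / (1 - specificity),
    }


# Counts back-calculated from rounded percentages (approximate).
n_low, n_high = 505, 302            # HACOR <=5 and >5 at 1 h of NIV
fn = round(n_low * 0.184)           # NIV failure despite a score <=5
tp = round(n_high * 0.871)          # NIV failure with a score >5
tn, fp = n_low - fn, n_high - tp    # NIV success in each group

metrics = diagnostic_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these approximate counts the positive likelihood ratio exceeds 5, which is consistent with the criterion used to select the cutoff value.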

There are a number of limitations to our study. First, as this study was performed in a respiratory ICU, some types of patients, such as those with postoperative respiratory failure and trauma, were not admitted to our unit and therefore were not enrolled in the study. Therefore, caution is advised when the HACOR scale is used to assess the efficacy of NIV in other patient groups. Second, the sample size was small in some subgroups, such as pulmonary embolism and heart failure and, consequently, the efficacy of the HACOR scale in these patients may be skewed. In future studies the sample size should be larger to improve the diagnostic power. Third, the result that early intubation may reduce mortality is a secondary analysis. As our study was an observational study, this result should be investigated further in randomized controlled trials.

In conclusion, we found that the HACOR scale was able to effectively predict NIV failure in patients with hypoxemic respiratory failure. A higher score indicates a higher chance of NIV failure. Because the scale consists of variables that are easily obtained at bedside, it can be used conveniently to assess the efficacy of NIV. Patients with a HACOR score of >5 had a very high risk of NIV failure. In these high-risk patients, early intubation may reduce hospital mortality.