Introduction

Noninvasive ventilation (NIV) reduces inspiratory muscle effort and improves oxygenation in hypoxemic patients with acute respiratory failure [1]. As it offers several major advantages over invasive ventilation (e.g., preserving the ability to swallow, cough, and communicate verbally), NIV is widely used to avoid intubation [2]. However, the rate of NIV failure is 40–54% in hypoxemic patients [3,4,5,6]. Moreover, NIV failure is associated with increased mortality [7, 8]. Among patients who experience NIV failure, late failure further increases mortality [5, 9]. Therefore, early identification of patients at high risk for NIV failure and early application of invasive ventilation may reduce mortality.

Our team previously developed a scale that produces the HACOR score, which takes into account heart rate, acidosis, consciousness, oxygenation, and respiratory rate (Additional file 1: Table 1), to predict NIV failure in patients with hypoxemic respiratory failure [5]. This scale was developed based on data from a respiratory intensive care unit (ICU). Although it has high predictive power for NIV failure, its broader use may be limited because most cases of respiratory failure in that setting had a respiratory etiology. Furthermore, baseline data such as the presence of acute respiratory distress syndrome (ARDS), septic shock, immunosuppression, and organ failure are also associated with NIV failure [6, 7, 10, 11]. Because these baseline data may improve the predictive power of the score, we aimed to incorporate them into the HACOR score to improve its predictive power for NIV failure in patients with hypoxemic respiratory failure.

Methods

This multicenter prospective observational study was performed in 17 hospitals in China from September 2017 to September 2021 and one hospital in Turkey from November 2018 to August 2020. The study protocol was approved by the ethics committee of the First Affiliated Hospital of Chongqing Medical University (No. 2016150) and the ethics committee of Istanbul University Cerrahpasa (No. 88295). Informed consent was obtained from patients or their family members.

Patients admitted to the ICU for NIV due to hypoxemic respiratory failure were enrolled. Patients were excluded if they were younger than 16 years, had hypercapnic respiratory failure, required emergency intubation, received NIV after extubation (including accidental extubation), or received NIV because of an acute exacerbation of chronic obstructive pulmonary disease. Patients who received NIV because of high-flow nasal cannula failure or who had undergone NIV for more than 2 h before being admitted to the participating center were also excluded. Emergency intubation was defined as intubation required immediately because the patient was in respiratory or cardiac arrest, was experiencing respiratory pauses with loss of consciousness, or was gasping for air.

All patients who used NIV were managed by attending physicians, respiratory therapists, and nurses in charge based on current guidelines, consensus, and previously published methods [5, 12,13,14,15]. The indications for NIV were as follows: (1) respiratory rate > 25 breaths/min, (2) clinical presentation of respiratory distress at rest (such as active contraction of the accessory inspiratory muscles or paradoxical abdominal motion), or (3) PaO2 < 60 mmHg at room air or PaO2/FiO2 < 300 mmHg with supplemental oxygen. If supplemental oxygen was used, FiO2 was estimated as follows: FiO2 (%) = 21 + 4 × flow (L/min) [16, 17]. However, the use of NIV was at the physician’s discretion. Continuous positive airway pressure (CPAP) or bilevel positive pressure ventilation was used to relieve patients’ dyspnea. Parameters were increased gradually based on patients’ tolerance. CPAP, expiratory positive airway pressure, or positive end expiratory pressure was usually maintained between 4 and 10 cmH2O. Inspiratory pressure was maintained between 10 and 20 cmH2O. The fractional concentration of oxygen was set to achieve peripheral oxygen saturation greater than 92%. In addition, appropriate strategies were used to improve NIV tolerance, such as controlling leakage, keeping the anchoring system as comfortable as possible, providing adequate humidification, alternatively using different interfaces, and administering sedation [18].
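
As an illustration of the FiO2 estimate above, the following minimal Python sketch computes the estimated FiO2 from the oxygen flow and the resulting PaO2/FiO2 ratio. The function names and the cap at 100% are assumptions made for this sketch and are not part of the study protocol.

```python
def estimate_fio2_percent(flow_l_per_min: float) -> float:
    """Estimate FiO2 (%) from supplemental oxygen flow: 21 + 4 x flow (L/min)."""
    if flow_l_per_min < 0:
        raise ValueError("oxygen flow must be non-negative")
    # Capping at 100% is an assumption of this sketch; the text does not state it.
    return min(21.0 + 4.0 * flow_l_per_min, 100.0)


def pao2_fio2_ratio(pao2_mmhg: float, fio2_percent: float) -> float:
    """Return PaO2/FiO2 in mmHg, converting FiO2 from percent to a fraction."""
    if fio2_percent <= 0:
        raise ValueError("FiO2 must be positive")
    return pao2_mmhg / (fio2_percent / 100.0)


if __name__ == "__main__":
    fio2 = estimate_fio2_percent(5)          # 5 L/min of supplemental oxygen
    print(fio2, pao2_fio2_ratio(82, fio2))   # hypothetical PaO2 of 82 mmHg
```

For example, a flow of 5 L/min gives an estimated FiO2 of 41%, and a hypothetical PaO2 of 82 mmHg then corresponds to a PaO2/FiO2 of about 200 mmHg.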

We encouraged patients to use NIV as long as possible initially. If their respiratory distress was relieved and oxygenation improved, NIV was used intermittently until patients could be completely liberated. If respiratory failure progressively deteriorated, intubation for invasive mechanical ventilation was performed. The major criteria for intubation were as follows: respiratory or cardiac arrest, PaO2/FiO2 < 100 mmHg after NIV intervention, the development of conditions necessitating intubation to protect the airway (coma or seizure disorders) or to manage copious tracheal secretions, and hemodynamic instability without response to fluids or vasoactive agents [5, 19]. Minor criteria were as follows: PaO2/FiO2 < 150 mmHg after NIV intervention, respiratory rate > 35 breaths/min, lack of improvement in respiratory muscle fatigue, and acidosis with pH < 7.35. Intubation was recommended if one major criterion or more than two minor criteria were reached. However, the decision to intubate was at the discretion of the attending physician. The need for intubation was defined as NIV failure [6].
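
The recommendation rule above (one major criterion, or more than two minor criteria) can be sketched as follows. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the major criteria are passed in as a simple count, and the sketch does not model the attending physician's discretion.

```python
def count_minor_criteria(pf_ratio: float, resp_rate: float, ph: float,
                         fatigue_not_improving: bool) -> int:
    """Count the minor intubation criteria that are met."""
    return sum([
        pf_ratio < 150,          # PaO2/FiO2 < 150 mmHg after NIV intervention
        resp_rate > 35,          # respiratory rate > 35 breaths/min
        fatigue_not_improving,   # lack of improvement in respiratory muscle fatigue
        ph < 7.35,               # acidosis with pH < 7.35
    ])


def intubation_recommended(major_criteria_met: int, minor_criteria_met: int) -> bool:
    """Recommend intubation for one major criterion or more than two minor criteria."""
    if major_criteria_met < 0 or minor_criteria_met < 0:
        raise ValueError("criteria counts must be non-negative")
    return major_criteria_met >= 1 or minor_criteria_met > 2


if __name__ == "__main__":
    minors = count_minor_criteria(pf_ratio=140, resp_rate=36, ph=7.30,
                                  fatigue_not_improving=False)
    print(minors, intubation_recommended(0, minors))
```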

We collected baseline data, vital signs, and arterial blood gas (ABG) from initiation to 24 h of NIV. Baseline data included ICU type, age, sex, reason for NIV, underlying disease, severity of disease (assessed by sequential organ failure assessment [SOFA] score), presence of COVID-19, presence of septic shock, and presence of ARDS. The presence of COVID-19 means that hypoxemic acute respiratory failure resulted from SARS-CoV-2 infection. Vital signs included consciousness (assessed by the Glasgow Coma Scale), heart rate, respiratory rate, systolic blood pressure, and diastolic blood pressure. The SOFA score was calculated before NIV. Urine output was obtained from medical records. If urine output was not available from medical records, it was estimated by the patient. Pneumonia was diagnosed based on current guidelines (i.e., a radiographic infiltrate that is new or progressive along with clinical findings suggesting infection, including the new onset of fever, purulent sputum, leukocytosis, shortness of breath, and a decline in oxygenation) [20, 21].

The aim of the current study was to update the HACOR score to predict NIV failure in hypoxemic patients. We used data collected in nine hospitals in Chongqing, China (N = 1451), to train the scale (training cohort). Data from another eight hospitals elsewhere in China and one hospital in Turkey (N = 728) were used to validate the scale (external validation cohort). The current reporting is based on transparent reporting of a multivariable prediction model for individual prognosis or diagnosis [22].

Statistical analysis

We used SPSS (version 25.0) and R (version 4.0.5) to analyze the data. Given an estimated NIV failure rate of 44% and estimated sensitivity and specificity of more than 70% (assuming an expected standard error of 5%), at least 734 patients were required to update the HACOR score at α = 0.05 [5]. Multiple imputation was performed to address missing data. The area under the receiver operating characteristic curve (AUC) was used to assess the predictive power for NIV failure. A p value less than 0.05 was considered statistically significant.
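
To make the AUC concrete, the sketch below computes it in its rank-based form: the probability that a randomly chosen patient with NIV failure has a higher score than a randomly chosen patient with successful NIV, with ties counted as one half. This equals the area under the empirical ROC curve. The analyses in this study were run in SPSS and R, so the Python function and the example scores here are hypothetical and purely illustrative.

```python
from typing import Sequence


def auc(scores_failure: Sequence[float], scores_success: Sequence[float]) -> float:
    """Rank-based AUC: P(failure score > success score), ties counted as 0.5."""
    if not scores_failure or not scores_success:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for f in scores_failure:
        for s in scores_success:
            if f > s:
                wins += 1.0
            elif f == s:
                wins += 0.5
    return wins / (len(scores_failure) * len(scores_success))


if __name__ == "__main__":
    # Made-up scores, not study data.
    print(auc([8, 10, 12], [3, 5, 8]))
```

An AUC of 0.5 corresponds to no discrimination, and an AUC of 1.0 means that every patient with NIV failure scored higher than every patient with successful NIV.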

The data from the training cohort were used to update the HACOR score. First, we selected variables via elastic net regularization, using logistic models and tenfold cross-validation, selecting the regularization parameter λ for which the binomial deviance was within one standard error of the minimum [23, 24]. Collinearity between continuous variables was identified if the absolute value of the correlation coefficient was > 0.7 [25]. The selected variables were used to develop a basic score for predicting NIV failure. Then, we combined this basic score and the original HACOR score to create the updated HACOR score. The goodness of fit of the final model was tested using the Hosmer–Lemeshow test. Details of the development of the updated HACOR score in the training cohort can be seen in Additional file 1: Method 1. The predictive powers for NIV failure of the original and updated HACOR scores were compared with the Hanley and McNeil method [26]. For clinical reference, three cutoff values were selected for probabilities of NIV failure equal to 25%, 50%, and 75% [27]. Probabilities of NIV failure of less than 25%, 25–50%, 50–75%, and more than 75% were defined as low, moderate, high, and very high risk for NIV failure, respectively. According to the original HACOR study, patients with HACOR scores ≤ 5 and > 5 were defined as being at low and high risk for NIV failure, respectively [5].

Results

Demographic characteristics

The flow of patient screening is summarized in Additional file 1: Fig. 1. In the training cohort, 24 patients had missing data (0.2% for ABG before NIV, 1.4% for ABG after 1–2 h of NIV, and 0.07% for SOFA score). In the validation cohort, 18 patients had missing data (0.1% for ABG before NIV, 0.1% for heart rate before NIV, and 2.2% for ABG after 1–2 h of NIV). All missing data were imputed using multiple imputation.

In the training cohort, 529 patients (36.5%) experienced NIV failure (Table 1). In the validation cohort, 328 patients (45.1%) experienced NIV failure. In both cohorts, about half of patients were from medical ICUs, one-third were from mixed ICUs, and the rest were from surgical ICUs.

Table 1 Demographic characteristics

In the training cohort, patients who experienced NIV failure were more likely to have septic shock, pneumonia, pulmonary ARDS, hypertension, chronic kidney disease, and immunosuppression compared to those who experienced successful NIV. However, they were less likely to have pancreatitis and cardiogenic pulmonary edema (CPE). These results were confirmed in the validation cohort.

Development of the updated HACOR score in the training cohort

Details of the development of the updated HACOR score are summarized in Additional file 1: Method 1. A diagnosis of pneumonia was a risk factor for NIV failure, and a diagnosis of CPE was a protective factor identified by elastic net logistic regression (Additional file 1: Figs. 2 and 3). The presence of pulmonary ARDS, immunosuppression, or septic shock and the SOFA score before NIV were risk factors for NIV failure. Thus, we updated the HACOR score to take these six pre-NIV variables into account (Additional file 1: Tables 2–4).

Therefore, the updated HACOR score is as follows: original HACOR score + 0.5 × SOFA + 2.5 if pneumonia is diagnosed – 4 if CPE is diagnosed + 3 if pulmonary ARDS is present + 1.5 if immunosuppression is present + 2.5 if septic shock is present. The p value for goodness of fit was 0.21 when the Hosmer–Lemeshow test was used. This indicates that the final model was properly fitted.
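
The formula above can be written as a short Python function. This is a minimal sketch: the function and parameter names are chosen here for illustration, and the original HACOR score is assumed to have been calculated already from the original scale (Additional file 1: Table 1).

```python
def updated_hacor(original_hacor: float, sofa: float,
                  pneumonia: bool = False, cpe: bool = False,
                  pulmonary_ards: bool = False, immunosuppression: bool = False,
                  septic_shock: bool = False) -> float:
    """Updated HACOR = original HACOR + 0.5 x SOFA + baseline adjustments."""
    score = original_hacor + 0.5 * sofa
    if pneumonia:
        score += 2.5   # diagnosis of pneumonia
    if cpe:
        score -= 4.0   # cardiogenic pulmonary edema (protective factor)
    if pulmonary_ards:
        score += 3.0   # pulmonary ARDS
    if immunosuppression:
        score += 1.5   # immunosuppression
    if septic_shock:
        score += 2.5   # septic shock
    return score


if __name__ == "__main__":
    # Hypothetical patient: original HACOR 4, SOFA 6, pneumonia and septic shock.
    print(updated_hacor(4, 6, pneumonia=True, septic_shock=True))
```

For this hypothetical patient, the updated score is 4 + 0.5 × 6 + 2.5 + 2.5 = 12.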

The predictive powers for NIV failure of the original and updated HACOR scores

In both cohorts, the AUCs for predicting NIV failure were higher for the updated HACOR score than for the original HACOR score from initiation to 24 h of NIV (all p values < 0.01; Table 2). For the updated HACOR score assessed after 1–2 h of NIV, the AUCs were 0.85 (95% confidence interval 0.84–0.87) and 0.78 (0.75–0.81) in the training and validation cohorts, respectively. The AUCs in the different subgroups are summarized in Table 3.

Table 2 AUCs for the HACOR and updated HACOR scores for predicting NIV failure
Table 3 AUCs for the updated HACOR score for predicting NIV failure in different subgroups

From initiation to 24 h of NIV, the updated HACOR score was much higher in patients who experienced NIV failure than those who experienced successful NIV (Fig. 1). The rate of NIV failure increased with an increase in the updated HACOR score, whether it was assessed before NIV or after 1–2, 12, or 24 h of NIV (Additional file 1: Figs. 4 and 5). In patients at low risk as assessed by the original HACOR score, the rate of NIV failure was greater than 50% if the updated HACOR score was more than 12 (Fig. 2). In contrast, in patients at high risk as assessed by the original HACOR score, the rate of NIV failure was low in most cases if the updated HACOR score was less than 8.

Fig. 1

Updated HACOR scores of patients with successful NIV and NIV failure from initiation to 24 h of NIV. Data are means and standard deviations. *p < 0.01 for the comparison of patients with successful NIV versus NIV failure. H0 = before NIV, H1-2 = after 1–2 h of NIV, H12 = after 12 h of NIV, H24 = after 24 h of NIV, NIV = noninvasive ventilation, HACOR = heart rate, acidosis, consciousness, oxygenation, and respiratory rate

Fig. 2

Rate of NIV failure within 24 h of NIV. A1, B1, and C1 indicate the rate of NIV failure in different subgroups classified by updated HACOR scores among patients with an original HACOR score ≤ 5. A2, B2, and C2 indicate the rate of NIV failure in different subgroups classified by updated HACOR scores among patients with an original HACOR score > 5. NIV = noninvasive ventilation, HACOR = heart rate, acidosis, consciousness, oxygenation, and respiratory rate

When 7, 10.5, and 14 points of updated HACOR score were selected as cutoff values, the probability of NIV failure was 25%, 50%, and 75%, respectively. The predictive power is reported in Table 4. Using the three cutoff values, we classified patients as being at low (≤ 7), moderate (7.5–10.5), high (11–14), and very high (> 14) risk for NIV failure. The cumulative incidence of NIV failure is summarized in Fig. 3. For all patients, the rate of NIV failure was 12.4%, 38.2%, 67.1%, and 83.7% among patients with a low, moderate, high, and very high probability of NIV failure, respectively.
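
The four risk categories can be expressed as a simple threshold lookup. The sketch below uses the cutoffs reported above; the function name is hypothetical. Because every term of the updated score is a multiple of 0.5, the bands ≤ 7, 7.5–10.5, 11–14, and > 14 leave no gaps.

```python
def risk_category(updated_score: float) -> str:
    """Map an updated HACOR score to the risk category for NIV failure."""
    if updated_score <= 7:
        return "low"
    if updated_score <= 10.5:
        return "moderate"
    if updated_score <= 14:
        return "high"
    return "very high"


if __name__ == "__main__":
    for score in (5, 7.5, 12, 16):   # hypothetical scores
        print(score, risk_category(score))
```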

Table 4 Predictive power for NIV failure of the updated HACOR score
Fig. 3

Cumulative incidence of NIV failure in patients at low, moderate, high, and very high risk for NIV failure when the updated HACOR score is assessed after 1–2 h of NIV. Patients with updated HACOR scores of ≤ 7, 7.5–10.5, 11–14, and > 14, respectively, were classified as being at low, moderate, high, and very high risk for NIV failure. NIV = noninvasive ventilation, HACOR = heart rate, acidosis, consciousness, oxygenation, and respiratory rate

Discussion

The current study evaluates and confirms the power of an updated HACOR score that incorporates data on six baseline variables to predict NIV failure. The predictive power for NIV failure tested by the updated HACOR score was significantly improved compared to that of the original HACOR score. Three cutoff values indicating low, moderate, high, and very high probability of NIV failure were developed to aid clinical staff in decision-making.

The original HACOR score assessed heart rate, acidosis, consciousness, oxygenation, and respiratory rate [5]. The predictive power for NIV failure was high in the original study. However, the predictive power in the current study was not as good. In patients at low risk as assessed by the original HACOR score, the rate of NIV failure was high if the updated HACOR score was high. In contrast, in patients at high risk as assessed by the original HACOR score, the rate of NIV failure was low in most cases if the updated HACOR score was low. The reasons for this are as follows: The original HACOR score was developed based on vital signs and ABG results only. However, patients with different baseline data have different risks for NIV failure even when they have similar vital signs and ABG results. The presence of pulmonary ARDS, septic shock, immunosuppression, and organ failure at baseline are associated with NIV failure [6, 7, 10, 11, 28]. Moreover, patients with CPE have a very low rate of NIV failure, which acts as a protective factor against NIV failure [29, 30]. In this study, we incorporated these pre-NIV variables into the original HACOR score to update the score. That is why the predictive power was significantly improved.

In addition, the original HACOR score was developed and validated using data from a respiratory ICU. All patients had a respiratory etiology and were managed by respiratory physicians. However, the updated HACOR score was developed in nine hospitals and validated in another nine hospitals. The patients had different etiologies, came from different ICUs, and were managed by different physicians. Therefore, the patients and physicians in the updated HACOR study were more representative of the real world. This is another reason for the better predictive power of the updated HACOR score than the original one.

The use of NIV in patients with de novo acute respiratory failure, pneumonia, or ARDS is controversial because of the high risk for NIV failure [3, 4, 9, 30]. In some cases, the rate of NIV failure can reach 70% [31]. Guidelines contain no recommendations for using NIV with these patients [12]. In our study, the updated HACOR score had high predictive power for NIV failure in these patients whether it was assessed after 1–2, 12, or 24 h of NIV. A higher updated HACOR score indicates a higher risk for NIV failure. Therefore, the updated HACOR score provides an important reference point for clinical staff managing NIV. In patients at high risk for NIV failure identified by the updated HACOR score, early intubation can be considered.

A good risk scoring system can help clinical staff manage their patients. In our study, the updated HACOR score assessed after 1–2 h of NIV had high predictive power for NIV failure. It can identify patients who are more likely to experience intubation in the future. As late NIV failure is associated with increased mortality [5, 9], close monitoring, more staffing, and better device supply may benefit high-risk patients.

In routine clinical work, physicians may refer in part to the PaO2/FiO2 ratio, respiratory rate, and pH before deciding to intubate. However, combining these variables with other risk factors to reach an appropriate decision is difficult. One physician may give more weight to the PaO2/FiO2 ratio, whereas another may give more weight to the respiratory rate. In other words, the weights differ among physicians. The main contribution of the updated HACOR score is that it assigns a weight to each variable: it takes the major risk factors into account and quantifies the weight of each. Although the updated HACOR score may be explained in part by a self-fulfilling prophecy, it provides useful information to help physicians make reasonable decisions.

This study has several limitations. First, we enrolled only 47 patients with COVID-19 in this study. This small subsample size may diminish the predictive power for NIV failure. Clinicians should be cautious when assessing the updated HACOR score in COVID-19 patients. Second, although we suggested intubation criteria, the decision to intubate was at the discretion of the attending physician. However, this reflects true conditions in the real world and thus may partly improve the generalizability of the updated HACOR score. Third, the benefit of an updated HACOR score is unclear, as the current study was observational in design. The effects of an updated HACOR score should be rigorously demonstrated in randomized controlled trials. Fourth, we did not predefine the interface or a sedation plan. These issues were determined by the attending physicians in charge. Whether the interface or sedation plan is associated with NIV failure is unclear. Fifth, expired tidal volume (standardized to predicted body weight) is associated with NIV failure [4]. In our study, most ventilators used a single-limb circuit, which cannot measure expired tidal volume. In addition, predicted body weight was unavailable because we did not record patients’ height. We therefore did not include tidal volume in the model.

Conclusions

The updated HACOR score, which combines data on six baseline variables and the five original scale items, has significantly improved predictive power for NIV failure compared to the original HACOR score. A higher score indicates a higher risk for NIV failure. Patients with updated HACOR scores of ≤ 7, 7.5–10.5, 11–14, and > 14, respectively, were classified as having a low, moderate, high, and very high probability of NIV failure. This updated score provides a reference for clinical staff in decision-making.