Background

Non-invasive positive pressure ventilation (NIV) delivers two levels of pressure during the respiratory cycle: a lower pressure during the expiratory phase and a higher pressure during the inspiratory phase. The pressure differential assists with the washout of accumulated carbon dioxide (CO2) and supports respiratory muscles to reduce work of breathing [1]. As such, NIV has been found to reduce mortality and need for intubation in patients with acute hypercapnic respiratory failure secondary to acute exacerbation of chronic obstructive pulmonary disease (AECOPD) [2, 3]. NIV is also suggested for use in acute respiratory failure in immunocompromised and postoperative patients, and for prevention of post-extubation respiratory failure in high-risk patients [2].

Despite wide potential for application, NIV use can be limited by patient intolerance of the interface or positive pressure. NIV requires a tight-fitting mask or helmet and the delivery of high pressures to an awake patient. It is associated with skin breakdown after prolonged use, causes gastric insufflation with increased risk of aspiration, can be associated with patient-ventilator asynchrony, and limits both secretion management and nutritional intake [4, 5]. Patients who cannot tolerate NIV will often require invasive mechanical ventilation [6,7,8].

High-flow nasal cannula (HFNC) is an oxygen delivery device which utilizes high inspiratory flows of up to 60 L/min through a nasal cannula to deliver up to 100% fraction of inspired oxygen (FiO2). HFNC has been studied in the hypoxemic population and is recommended in the setting of hypoxemic respiratory failure, post-extubation in selected patients, and in the postoperative setting for high-risk patients after cardiac or thoracic surgery [5, 9]. While the majority of evidence for HFNC is in the setting of acute hypoxemic respiratory failure, it is of increasing interest as an alternative to NIV in hypercapnic respiratory failure. Physiological studies suggest that the high gas flows of HFNC may improve ventilation by increasing mean airway pressure and washout of dead space, all while being more comfortable and better tolerated by the patient [10,11,12]. Initial observational studies have demonstrated improvement in hypercapnia with the use of HFNC [13, 14].

Hence, our objective was to conduct a systematic review and meta-analysis to determine the efficacy and safety of HFNC compared to NIV for adults with acute hypercapnic respiratory failure. While previous systematic reviews have compared HFNC to NIV for the treatment of hypercapnia, they have important limitations, such as including heterogeneous patient populations [15, 16]. Additionally, these systematic reviews do not include several recently published randomized clinical trials (RCTs) [17, 18]. We hypothesized that there would be no increased risk of mortality when HFNC is used compared to NIV, but potentially an increased risk of intubation.

Methods

Study selection

We included parallel-group and crossover RCTs that enrolled adults ≥ 18 years old presenting with acute hypercapnic respiratory failure, defined as a pH < 7.35 or partial pressure of carbon dioxide (PaCO2) > 45 mmHg, regardless of the etiology. Eligible studies compared HFNC (any setting or duration) to NIV (defined as those with bi-level positive airway pressure, regardless of setting, interface or duration). Studies reporting on at least one of the following outcomes were included: the primary outcome of mortality at longest follow-up, or secondary outcomes of endotracheal intubation and invasive mechanical ventilation, hospital length of stay (LOS), Intensive Care Unit (ICU) LOS, change in PaCO2, change in partial pressure of oxygen (PaO2), respiratory rate (measured at the end of treatment), comfort (measured on a 10-point analog scale at the longest duration of treatment), or dyspnea (defined by the Borg scale taken at longest follow-up). In addition to study inclusion criteria, collected characteristics were patient age, patient sex, Acute Physiology and Chronic Health Evaluation II (APACHE II) score, and characteristics of the intervention and control group. We excluded pseudo- or quasi-randomized trials, and studies including patients with a tracheostomy or patients who were immediately post-extubation. Ethics approval was not obtained as no patient-level data was used in this systematic review.

Electronic search strategy

We searched EMBASE, MEDLINE, and the Cochrane Library from inception to October 2021 (Additional file 1: Tables S1 and S2), without limits on publication status or language. Existing systematic reviews and meta-analyses were cross-referenced for potentially eligible studies. Retrieved references were uploaded to Covidence for data management and screening (Covidence systematic review software, Veritas Health Innovation, Melbourne, Australia).

Data collection and analysis

Two independent pairs of reviewers (SO, EH; and NO, KL) screened titles and abstracts in duplicate, and any potentially relevant study was advanced to full-text review. Full-text review was also performed in duplicate, with disagreements resolved through discussion. Reviewers (NO and KL) extracted relevant data from eligible trials independently and in duplicate using a pre-designed and piloted data extraction form.

Risk of bias

Two reviewers (NO and KL) independently assessed the studies for risk of bias (RoB) using the original Cochrane risk-of-bias tool for randomized trials [19]. RoB was assessed in each study by outcome with reference to: random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessors, incomplete outcome data, selective reporting, and other biases. RoB was judged to be low if all domains had low risk of bias. High risk of bias in any domain resulted in a high-risk categorization for that outcome. Disagreements were resolved by discussion between the two reviewers, or by arbitration with senior authors (KL and SO) if needed.
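The categorization rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' tooling: the function name and the rating labels are hypothetical, and the "unclear" fallback for studies that are neither all-low nor any-high is an assumption, since the text defines only the low and high categories.

```python
def overall_risk_of_bias(domain_ratings):
    """Collapse per-domain ratings into one rating for an outcome.

    domain_ratings maps a domain name to 'low', 'unclear', or 'high'.
    The 'unclear' result is an assumption; the review text only defines
    the 'low' (all domains low) and 'high' (any domain high) cases.
    """
    ratings = set(domain_ratings.values())
    if "high" in ratings:
        return "high"      # any high-risk domain makes the outcome high risk
    if ratings == {"low"}:
        return "low"       # every domain must be low risk
    return "unclear"       # assumed fallback, not stated in the source


example = {
    "random sequence generation": "low",
    "allocation concealment": "low",
    "blinding of participants and personnel": "high",
}
print(overall_risk_of_bias(example))
```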

Analysis

Measurement of treatment effect

We uploaded extracted data into RevMan (Review Manager, version 5.3. Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2014) for meta-analysis. We used the DerSimonian and Laird random-effects model to pool the weighted effect of estimates across all studies [20]. The Mantel–Haenszel method was used to estimate study weights for dichotomous outcomes and inverse variance for continuous outcomes. Pooled relative risks (RRs), mean differences (MDs) or standardized mean differences (SMDs) were calculated for dichotomous and continuous outcomes (respectively), with corresponding 95% confidence intervals (CIs). When required, medians and interquartile ranges were converted to means and standard deviations for the purpose of the meta-analysis [21]. Funnel plots were inspected to assess for any publication bias if ten or more studies existed for that outcome [22].
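For readers unfamiliar with the pooling step, the following Python sketch shows a DerSimonian and Laird random-effects pooling of risk ratios. It is not the RevMan implementation used in this review: it uses inverse-variance weights throughout rather than Mantel–Haenszel weights, it assumes every arm has at least one event (no continuity correction), and the function name and the input numbers in the usage lines are hypothetical.

```python
import math


def pool_log_rr_dl(events_t, n_t, events_c, n_c):
    """Pool risk ratios with a DerSimonian-Laird random-effects model.

    Inputs are per-study lists of events and totals in each arm.
    Returns (pooled RR, lower 95% limit, upper 95% limit, tau^2).
    """
    # Per-study log risk ratio and its approximate variance.
    y, v = [], []
    for a, n1, c, n2 in zip(events_t, n_t, events_c, n_c):
        y.append(math.log((a / n1) / (c / n2)))
        v.append(1 / a - 1 / n1 + 1 / c - 1 / n2)

    # Fixed-effect (inverse-variance) weights and Cochran's Q.
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Method-of-moments between-study variance, truncated at zero.
    df = len(y) - 1
    c_term = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c_term)

    # Random-effects weights, pooled estimate, and 95% CI on the log scale.
    w_re = [1 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return (math.exp(y_re), math.exp(y_re - 1.96 * se),
            math.exp(y_re + 1.96 * se), tau2)


# Hypothetical counts, for illustration only (not data from the included trials).
rr, low, high, tau2 = pool_log_rr_dl([10, 10], [100, 100], [20, 20], [100, 100])
print(f"RR {rr:.2f} (95% CI {low:.2f} to {high:.2f}), tau^2 = {tau2:.3f}")
```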

Unit of analysis

For all main outcomes, only one pair-wise comparison was conducted so the same groups of participants were only included once in the meta-analysis. For crossover trials, data was extracted only from the first phase to avoid the potential of carry-over effects.

Heterogeneity and subgroup analysis

Statistical heterogeneity was assessed using Chi2 and I2 statistics. A Chi2 P value of < 0.1 or an I2 > 50% was pre-determined to meet the criteria of significant heterogeneity [23]. Significant heterogeneity between studies was explored through predefined subgroup analyses to investigate whether certain baseline factors influenced treatment effects. We had two planned subgroup analyses: etiology of hypercapnic respiratory failure (AECOPD vs non-AECOPD diagnoses, hypothesizing a larger treatment effect in AECOPD subgroup), and severity of acidosis (7.30–7.34 vs < 7.30, hypothesizing larger treatment effect in the 7.30–7.34 subgroup).
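As an illustration of the heterogeneity criteria, the sketch below computes Cochran's Q and I2 from study effect estimates and their variances, then applies the pre-specified thresholds. The function names and example values are hypothetical. The Chi2 P value is taken as an optional input rather than computed, because obtaining it requires a chi-square distribution function that is outside the Python standard library.

```python
def i_squared(effects, variances):
    """Return (Q, degrees of freedom, I2 in percent)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is the share of total variation attributed to heterogeneity,
    # truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


def is_heterogeneous(i2, chi2_p=None):
    """Apply the pre-specified rule: Chi2 P < 0.1 or I2 > 50%."""
    return i2 > 50 or (chi2_p is not None and chi2_p < 0.1)


# Hypothetical effect estimates and variances, for illustration only.
q, df, i2 = i_squared([0.0, 1.0], [0.1, 0.1])
print(q, df, i2, is_heterogeneous(i2))
```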

Sensitivity analysis

We conducted a pre-specified sensitivity analysis restricted to studies without concerns for risk of bias. We hypothesized that the treatment effect would be smaller after excluding studies with some or high concerns of bias. Additionally, we conducted a post hoc analysis excluding one study (Wang et al.) which was only available as an abstract [15, 24].

Assessing the certainty of evidence

Certainty of evidence for all major outcomes was assessed using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach [25]. GRADE considers individual study risk of bias, inconsistency, indirectness, imprecision, and publication bias. This was performed by two reviewers (NO and KL) independently and in duplicate for each outcome. Certainty of evidence was ranked as very low, low, moderate, or high.

GRADEpro software [GRADEpro GDT: GRADEpro Guideline Development Tool (Software), McMaster University, 2020] was used to prepare the Summary of findings (SoF) table (Table 1) [26]. Justifications for all decisions are presented in the footnotes. We used minimal important differences to assist in judgements of imprecision. The minimal important differences can be found in the SoF table footnotes and all values were based on clinical judgements post hoc.

Table 1 Summary of Findings

Trial sequential analysis

We used trial sequential analysis (TSA) to determine if the required sample size to reach the threshold for statistical significance was met for the important outcomes of mortality, intubation and ICU LOS. We performed these analyses using TSA software v. 0.9.5.10 Beta (Copenhagen Trial Unit, Center for Clinical Intervention Research, Rigshospitalet, Copenhagen, Denmark; available at http://ctu.dk/tsa/). We constructed cumulative z-scores and the required information sizes (RIS) to definitively accept or refute the effect size of interest. We conducted primary TSA using an alpha of 0.05, power of 0.90 (beta 0.10), estimated diversity, unweighted control event proportions for binary outcomes and variances as estimated in the included trials for continuous outcomes. We defined relative risk reduction (RRR) of 15% as a clinically important difference for the outcomes of mortality and intubation and a mean difference (MD) of 24 h for the outcome of ICU LOS. Of note, the TSA was performed post hoc at the request of the journal.
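To give a sense of how an RIS for a binary outcome depends on these settings, the sketch below uses the conventional two-group sample size formula inflated by a diversity factor. It is an approximation under stated assumptions, not the algorithm of the TSA software: the function name is hypothetical, the control event proportion of 20% in the usage lines is invented for illustration, and diversity is passed as a proportion between 0 and 1.

```python
from statistics import NormalDist


def required_information_size(control_risk, rrr, alpha=0.05, power=0.90,
                              diversity=0.0):
    """Approximate diversity-adjusted RIS (total participants, both arms)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided alpha
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - beta
    treated_risk = control_risk * (1 - rrr)
    delta = control_risk - treated_risk             # absolute difference to detect
    p_bar = (control_risk + treated_risk) / 2       # average event proportion
    n_fixed = 4 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2
    # Inflate for between-trial diversity (0 means no adjustment).
    return n_fixed / (1 - diversity)


# Hypothetical control event proportion of 20%, with the review's
# alpha (0.05), power (0.90), and RRR (15%).
print(round(required_information_size(0.20, 0.15)))
```

Under these illustrative inputs the required number of participants runs into the thousands, which shows why a pooled sample of a few hundred patients is unlikely to reach the RIS.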

Results

Screening

Following the electronic search, 7735 studies were imported for screening and 4915 were screened by title and abstract after removal of duplicates (Fig. 1). Full-text review was completed for 273 studies and eight were included in the analysis [17, 18, 24, 27,28,29,30,31]. All studies except for one were published as full manuscripts [24]. Excluded studies and reasons for exclusion are available in the supplement (Additional file 1: Table S3).

Fig. 1 PRISMA flow diagram

Characteristics of included studies

The eight studies included a total of 528 patients (Table 2) [17, 18, 24, 27,28,29,30,31]. The mean age of participants was 65.9 ± 11.8 years, and 43% were female. The mean APACHE II score was 21.0 ± 7.6. The mean pH of patients on presentation was 7.32 ± 0.04 and the mean PaCO2 was 64.33 ± 7.25 mmHg. All studies were limited to patients with acute hypercapnic respiratory failure. Six studies were parallel group RCTs [17, 18, 24, 27, 28, 32], and two were crossover trials [29, 31].

Table 2 Characteristics of included studies

Five studies assessed the outcomes of HFNC vs. NIV in patients with AECOPD [18, 24, 27,28,29]. One study enrolled patients with cystic fibrosis [31] and two studies enrolled patients with any cause of hypercapnic respiratory failure [17, 32]. Two studies included patients in the emergency department (ED) [17, 30] and one was limited to ICU patients [28]. Four studies had broad inclusion criteria of inpatients or admissions to the ED, ICU, or respiratory unit [18, 27, 29, 31]. Location of admission was not available for one study [24].

Inclusion criteria for pH and PaCO2 varied. Three studies set a limit of a pH ranging from 7.25 to 7.35 [18, 27, 29], whereas another required patients to have a pH > 7.20 [17]. One study’s inclusion criteria for hypercapnic respiratory acidosis was based on pH alone (< 7.35) and another was based on PaCO2 alone [31, 32]. Two studies did not set specific pH or CO2 cutoffs in their inclusion criteria [24, 28].

Risk of bias

Risk of bias varied significantly based on the type of outcome measure (Additional file 1: Table S4). Risk was overall low for objective measures (mortality, intubation, hospital LOS, ICU LOS, respiratory rate, PaO2, and PaCO2) with the exception of one study which had a high loss to follow-up rate resulting in high risk of bias [27]. Two studies were deemed to be at potentially high risk of bias due to their funding [30, 31]. One study had high risk of bias due to selective reporting, with the addition of outcomes measured following trial registration [29]. Risk of bias was rated as high in all studies for the subjective outcomes of dyspnea and comfort due to lack of blinding.

Outcomes

Mortality

Four studies (n = 250) reported on mortality at the longest follow-up [17, 18, 24, 27]. The use of HFNC compared to NIV did not demonstrate a difference (RR 0.86, 95% CI 0.48–1.56, I2 = 0%, low certainty) (Fig. 2). The absolute risk difference was −2% (95% CI −9% to 10%) (Table 1).
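The relationship between the relative risk and the absolute risk difference reported above can be illustrated with a short calculation. Absolute effects in summary-of-findings tables are commonly obtained by applying the relative risk and its confidence limits to a baseline (control-arm) risk. The baseline risk of 17.5% used below is hypothetical: it was chosen only because it yields values that round to the reported figures, and the actual control-arm risk should be taken from Table 1.

```python
def absolute_risk_difference(baseline_risk, rr, rr_low, rr_high):
    """Convert an RR and its CI into absolute differences (percentage points)."""
    def to_ard(r):
        # Risk with intervention minus baseline risk, expressed per 100 patients.
        return 100 * baseline_risk * (r - 1)
    return to_ard(rr), to_ard(rr_low), to_ard(rr_high)


# RR and CI as reported for mortality; the baseline risk is an assumption.
ard, ard_low, ard_high = absolute_risk_difference(0.175, 0.86, 0.48, 1.56)
print(round(ard), round(ard_low), round(ard_high))
```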

Fig. 2 Mortality. HFNC High flow nasal cannula; NIV Non-invasive ventilation; RCTs Randomized controlled trials

Endotracheal intubation

Four studies (n = 275) reported on endotracheal intubation outcomes [18, 24, 27, 30]. There was no statistically significant difference in intubation between HFNC and NIV, although the confidence interval was wide and the estimate imprecise (RR 0.80, 95% CI 0.46–1.39, I2 = 0%, low certainty) (Fig. 3). This translates into an absolute risk difference of −3% (95% CI −9% to 7%) (Table 1).

Fig. 3 Intubation. HFNC High flow nasal cannula; NIV Non-invasive ventilation; RCTs Randomized controlled trials

ICU length of stay

The pooled estimate from two studies (n = 67) demonstrated no statistically significant difference in ICU LOS when HFNC was used compared to NIV (MD 0.08 days, 95% CI − 1.16–1.32, I2 = 56%, low certainty) (Fig. 4) [24, 30].

Fig. 4 Secondary Outcomes. HFNC High flow nasal cannula; NIV Non-invasive ventilation; RCTs Randomized controlled trials

Hospital length of stay

Four studies (n = 352) measured hospital LOS [17, 18, 28, 30]. HFNC did not change hospital LOS compared to NIV (MD − 0.82 days, 95% CI − 1.83–0.20, I2 = 0%, high certainty) (Fig. 4).

Comfort

Two studies (n = 101) measured comfort at the longest duration of treatment [18, 31]. The comfort of patients on HFNC did not differ from those receiving NIV (SMD − 0.32 points, 95% CI − 1.78–1.13, I2 = 91%, very low certainty) (Fig. 4) [18, 31].

Dyspnea

Four studies (n = 191) reported on dyspnea using a Borg scale or equivalent [33, 34]. The pooled estimate showed no clinically important difference in dyspnea scores after treatment when HFNC was used compared to NIV (MD − 0.04 points, 95% CI − 0.54–0.45, I2= 18%, very low certainty) (Fig. 4) [18, 29,30,31].

Respiratory rate

Five studies (n = 234) reported on respiratory rate [17, 18, 29,30,31]. There was no statistical difference in the respiratory rate between the two interventions (MD − 0.85 breaths/min, 95% CI − 1.88–0.18, I2 = 0%, low certainty).

PaO2 and PaCO2

Five studies (n = 427) measured change in PaO2, and no difference in PaO2 level was observed (MD − 0.78 mmHg, 95% CI − 4.18–2.62, I2 = 0%, high certainty) (Additional file 1: Fig. S1) [17, 18, 27, 28, 30].

Pooling the results across seven studies (n = 487) showed no difference in change in PaCO2 between those treated with HFNC versus NIV (MD − 1.87 mmHg, 95% CI − 5.34–1.60 mmHg, I2 = 47%, moderate certainty) (Additional file 1: Fig. S2) [17, 18, 27,28,29,30,31].

Subgroup and sensitivity analyses

Subgroup analysis by AECOPD category for the comfort outcome demonstrated a subgroup effect favoring HFNC in AECOPD (P-interaction = 0.001, I2 = 90.6%; Additional file 1: Fig. S3); however, this analysis only included two studies. There was no subgroup effect for the remaining outcomes (Additional file 1: Figs. S4–S6). We were unable to conduct subgroup analyses by severity of acidosis.

Sensitivity analyses excluding high risk of bias trials or excluding the only study published as an abstract [24] did not alter the results of analyzed outcomes (Additional file 1: Figs. S7–S16).

The TSA for all outcomes was inconclusive, as they did not meet the RIS and the boundaries for benefit, harm, or futility were not crossed (Additional file 1: Figs. S17–S19).

Discussion

In this systematic review and meta-analysis of eight RCTs (n = 528 patients), there was no difference in the need for endotracheal intubation (low certainty), mortality at longest follow-up (low certainty), ICU LOS (low certainty), hospital LOS (high certainty), or change in PaCO2 (moderate certainty) or PaO2 (high certainty) when HFNC was compared to NIV in patients with hypercapnic respiratory failure.

While NIV use may reduce risks of death and endotracheal intubation in patients with hypercapnic respiratory failure compared to conventional oxygen therapy, it is not tolerated by all patients, leaving physicians with few options other than proceeding with endotracheal intubation. HFNC is increasingly used in acute hypoxic respiratory failure, but theoretically may also assist in ventilation, potentially with increased comfort and tolerance compared to NIV. Recent ERS guidelines made a conditional recommendation for a trial of NIV prior to use of HFNC in patients with COPD and acute hypercapnic respiratory failure, noting that there is high certainty that NIV reduces intubation, and that more evidence was needed before HFNC could be considered equivalent or superior to NIV. It was noted that there was limited evidence outside of COPD, and that more information was needed to identify patient populations where HFNC could be trialed prior to NIV.

Overall, our results are similar to those of previous systematic reviews, even accounting for the differences in trial selection [15, 16]. Specifically, previous systematic reviews included post-extubation studies. This population is excluded in the current analysis as they may have reasons other than hypercapnic respiratory failure for requiring reintubation, including post-extubation stridor, ineffective cough, and secretion management [35].

The study has a number of strengths, including use of a peer-reviewed electronic search strategy, with iterative searches up to October 2021. Screening, risk of bias, and certainty of evidence assessment were done in duplicate. We considered a priori subgroups of patient populations, hypothesizing that the effect of HFNC may be different in patients with AECOPD.

The interpretation of these results is limited by the relatively small number of studies and patients, which resulted in imprecision of the results. Because this is an emerging area of study, many trials evaluated physiologic variables rather than the patient-important outcomes of mortality and intubation. Additionally, patient goals of care (whether or not they would be candidates for intubation) were not reported and would be valuable for assessment of the mortality and intubation outcomes. Although a lack of significance may be seen as a limitation, this simply means that we have identified a knowledge gap and that there needs to be a call to action by critical care researchers to expand on this important topic. This is further supported by the TSA. Some subgroup analyses may be underpowered due to the small number of included studies. Moreover, we hypothesized that patients with more severe respiratory acidosis treated with HFNC may require intubation more frequently than those treated with NIV. Unfortunately, we were unable to complete an analysis based on degree of acidosis due to a complete lack of subgroup data. Study populations were also heterogeneous, without consistent stratification between AECOPD and non-AECOPD causes of hypercapnic respiratory failure, thereby limiting conclusions on this specific question. Lastly, we were unable to examine funnel plots to detect publication bias given the small number of available studies. We attempted to minimize publication bias through extensive searches of databases, employing no language restrictions, and discussing the findings with experts in the field. Although this systematic review protocol was not registered or published, this study was a sub-study of an ongoing clinical practice guideline that follows pre-specified methodology. As indicated above, the only post hoc analysis was a sensitivity analysis where we excluded abstracts. All other decisions were made a priori.

Conclusions

In summary, emerging evidence is inconclusive in identifying whether HFNC may be an alternative to NIV for patients with hypercapnic respiratory failure. Further trials, such as an upcoming randomized non-inferiority trial [36], may improve the precision of the estimates.