Introduction

The COVID-19 pandemic is an unprecedented global health crisis, with more than 651 million infections and more than 6.6 million deaths [1]. A serious complication and common cause of death in patients with COVID-19 infection is acute hypoxemic respiratory failure (AHRF), which occurs in 15–30% of COVID-19 patients [2]. AHRF can be managed with high flow nasal cannula (HFNC) oxygen therapy [3], a non-invasive method of oxygen supplementation that uses a large-bore nasal cannula to deliver up to 100% FiO2 at flow rates of up to 60 L/min [4, 5]. This intervention generates positive airway pressure [6] and reduces anatomical dead space and the work of breathing [7]. The use of warmed (31–37 °C) [8], humidified oxygen protects the mucosal lining, allows oxygenation with lower transpulmonary driving pressure, and facilitates secretion clearance [9]. HFNC can be used to spare patients from invasive mechanical ventilation [10].

However, HFNC use can delay endotracheal intubation, and such delays are associated with longer invasive mechanical ventilation and a poorer prognosis [11]. In particular, COVID-19 patients have a high failure rate of non-invasive treatment (i.e., worsening in severity so as to require intubation and invasive ventilation) [12]. Prognostic tools that predict HFNC failure are therefore clinically important: they allow timely discrimination between patients who are poor candidates for continued HFNC and those who can be safely spared invasive mechanical ventilation. Current risk stratification tools for HFNC failure include respiratory parameters: the oxygen saturation to fraction of inspired oxygen ratio (SpO2/FiO2), respiratory rate (RR), partial pressure of carbon dioxide (PaCO2), and partial pressure of oxygen (PaO2) [13, 14]. However, these parameters in isolation cannot reliably identify the need for intubation [15].

The respiratory oxygenation (ROX) index is a prognostic index that has gained popularity during the COVID-19 pandemic, and was specifically developed to prognosticate HFNC failure in patients with pneumonia and AHRF [16,17,18]. It combines SpO2/FiO2 and respiratory rate (RR) using the formula \( \text{ROX} = \frac{\text{SpO}_2/\text{FiO}_2}{\text{RR}} \) [15], and can easily be calculated at the bedside [19]. In the original description of the ROX index, Roca et al. [18] measured ROX at 2, 6, and 12 h after HFNC initiation. The proposed cut-offs were ROX > 4.88 to predict HFNC success at 2, 6, and 12 h, and ROX < 3.85 after 12 h to predict HFNC failure. A ROX of 3.85–4.88 was described as an indeterminate range, and the authors suggested that ROX be reassessed at a later time point in such patients.
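As a worked illustration of the formula and of the original thresholds summarised above, the Python sketch below computes the ROX index and applies those cut-offs. It assumes the conventional units (SpO2 in percent, FiO2 as a fraction between 0.21 and 1.0, RR in breaths/min); the function names are hypothetical, and the sketch is not a validated clinical tool.

```python
def rox_index(spo2_percent: float, fio2_fraction: float, resp_rate: float) -> float:
    """ROX = (SpO2 / FiO2) / RR.

    Assumed units: SpO2 in percent, FiO2 as a fraction, RR in breaths/min.
    """
    if not 0.21 <= fio2_fraction <= 1.0:
        raise ValueError("FiO2 must be a fraction between 0.21 and 1.0")
    if resp_rate <= 0:
        raise ValueError("respiratory rate must be positive")
    return (spo2_percent / fio2_fraction) / resp_rate


def roca_category(rox: float, hours_on_hfnc: float) -> str:
    """Classify a ROX value with the original thresholds as summarised in the text.

    > 4.88 predicts success; < 3.85 at 12 h or later predicts failure;
    3.85-4.88 is indeterminate. A value < 3.85 before 12 h is not covered
    by these two thresholds as summarised here, so it is left unclassified.
    """
    if rox > 4.88:
        return "success"
    if rox < 3.85:
        return "failure" if hours_on_hfnc >= 12 else "unclassified"
    return "indeterminate"


if __name__ == "__main__":
    # Hypothetical patient: SpO2 92%, FiO2 0.5, RR 30 breaths/min, 6 h on HFNC.
    rox = rox_index(92, 0.5, 30)
    print(round(rox, 2), roca_category(rox, 6))
```

For the hypothetical patient, SpO2/FiO2 is 184, and dividing by an RR of 30 gives a ROX of about 6.13, which lies above 4.88.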

Multiple studies have validated the ROX index in recent years [20, 21]. However, because there are no universally accepted protocols for ROX use, studies have adopted different cut-offs and monitoring intervals. There is thus uncertainty [17, 21] over the optimal cut-off in general, and over the specific cut-offs to use when ROX is measured at different times after HFNC initiation. A previous meta-analysis by Prakash et al. [19] included all studies of ROX as a predictor of HFNC failure up to early 2021, but it simply dichotomised the ROX index into high versus low categories, with no specific analysis of time from HFNC initiation. To our knowledge, no meta-analysis has yet determined optimal ROX index cut-offs [8, 19, 22] at HFNC initiation and at subsequent time points.

Hence, in this systematic review and meta-analysis, we aimed to describe the performance of the ROX index in predicting HFNC failure amongst COVID-19 patients at different time points from HFNC initiation. We further aimed to derive optimal cut-off values at different time points to guide interpretation of the ROX index in various clinical settings.

Methods

Search strategy

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [23] informed the design and execution of this study. The protocol was registered on PROSPERO (CRD42023388254). Medline/PubMed, Embase and the Cochrane Central Register of Controlled Trials (CENTRAL) were searched from inception to 20 December 2022 for eligible studies, using keywords related to COVID-19 and the ROX index. The search strategy was developed in collaboration with a medical librarian (Medical Library, National University of Singapore) and is provided in Additional file 1. No language filters were applied. Two authors independently carried out preliminary eligibility screening in a blinded fashion, screening titles and abstracts before retrieving and reviewing full texts. Studies were included if they (1) included COVID-19 patients on HFNC and (2) used the ROX index. Reviews, commentaries, animal studies and case reports were excluded. Differences were resolved by discussion and consensus with a senior author.

Data extraction and selection criteria

From each study, two authors independently used a standardised data extraction sheet to extract the study period, country and population demographics. The ROX cut-off values for each study and the study's definition of HFNC failure or success were also extracted. Because some studies used HFNC success as the outcome and others used HFNC failure, we standardised HFNC failure as the outcome of interest. HFNC failure included intubation, escalation to mechanical ventilation (non-invasive or invasive), and death.

Confusion matrices (a form of 2 × 2 contingency table) were constructed for each study, detailing the number of true positives (patients who scored below the ROX cut-off and experienced HFNC failure), true negatives (patients who scored above the ROX cut-off and experienced HFNC success), false positives (patients who scored below the ROX cut-off and experienced HFNC success), and false negatives (patients who scored above the ROX cut-off and experienced HFNC failure).
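The tabulation convention above, and the relabelling needed when a study reported HFNC success rather than failure as its outcome, can be sketched as follows. This is a hypothetical Python illustration rather than the extraction procedure itself; in particular, it assumes that a ROX exactly equal to the cut-off counts as test-negative, a tie-breaking rule the text does not specify.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Confusion:
    tp: int  # ROX below cut-off, HFNC failure
    fp: int  # ROX below cut-off, HFNC success
    fn: int  # ROX at/above cut-off, HFNC failure
    tn: int  # ROX at/above cut-off, HFNC success

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def tabulate(patients: Iterable[Tuple[float, bool]], cutoff: float) -> Confusion:
    """Build the 2 x 2 table from (rox, hfnc_failed) pairs.

    Assumption: ROX == cutoff is treated as test-negative (predicted success).
    """
    tp = fp = fn = tn = 0
    for rox, failed in patients:
        positive = rox < cutoff  # low ROX = predicted HFNC failure
        if positive and failed:
            tp += 1
        elif positive:
            fp += 1
        elif failed:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def from_success_outcome(tp_s: int, fp_s: int, fn_s: int, tn_s: int) -> Confusion:
    """Relabel a table whose positive class was HFNC success (ROX above cut-off)
    into the HFNC-failure convention: TP and TN swap, and so do FP and FN."""
    return Confusion(tp=tn_s, fp=fn_s, fn=fp_s, tn=tp_s)
```

One consequence of the relabelling is that the sensitivity for HFNC failure equals the specificity of the original success-oriented table, and vice versa.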

Statistical analysis

All analyses were performed in RStudio (version 2021.9.1.372) using the meta (version 5.2-0) and diagmeta (version 0.5-0) packages. diagmeta implements the approach outlined by Steinhauser et al. [24], which has previously been applied successfully to other acute clinical research questions [25]; with it, we fitted various linear mixed models to estimate the distribution function of the ROX index within the included studies. Among the linear mixed models that converged, we selected the model that minimised the restricted maximum likelihood criterion. The area under the summary receiver operating characteristic curve (sAUC) was derived for individual time points and time windows, and optimal cut-off values were calculated. The optimal cut-off was defined as the value that maximised the weighted sum of sensitivity and specificity.
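To make the cut-off criterion and the sAUC concrete, the Python sketch below assumes the pooled model has already been reduced to two logistic distributions of the ROX index, one for patients with HFNC failure and one for patients with HFNC success, with a ROX below the cut-off treated as a positive test. It does not reproduce diagmeta's linear mixed model fitting; the distribution parameters and function names are hypothetical.

```python
import math

# Hypothetical sketch: each group is summarised as a logistic distribution
# (loc, scale) of the ROX index. A ROX *below* the cut-off is a positive test.


def logistic_cdf(x, loc, scale):
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))


def logistic_pdf(x, loc, scale):
    e = math.exp(-(x - loc) / scale)
    return e / (scale * (1.0 + e) ** 2)


def sens_spec(c, fail, succ):
    """fail and succ are (loc, scale) tuples for the two outcome groups."""
    sens = logistic_cdf(c, *fail)        # failures with ROX < c
    spec = 1.0 - logistic_cdf(c, *succ)  # successes with ROX >= c
    return sens, spec


def optimal_cutoff(fail, succ, weight=0.5, lo=0.0, hi=15.0, step=0.001):
    """Grid search for the cut-off maximising weight*Se + (1 - weight)*Sp."""
    best_c, best_val = lo, -math.inf
    for i in range(int(round((hi - lo) / step)) + 1):
        c = lo + i * step
        se, sp = sens_spec(c, fail, succ)
        val = weight * se + (1.0 - weight) * sp
        if val > best_val:
            best_c, best_val = c, val
    return best_c


def summary_auc(fail, succ, lo=-20.0, hi=30.0, n=20000):
    """AUC = P(ROX_fail < ROX_succ) = integral of F_fail(x) * f_succ(x) dx,
    approximated with the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * logistic_cdf(x, *fail) * logistic_pdf(x, *succ)
    return total * h


if __name__ == "__main__":
    fail, succ = (4.0, 1.0), (6.0, 1.0)  # hypothetical parameters
    print(optimal_cutoff(fail, succ), summary_auc(fail, succ))
```

With equal weights the criterion reduces to the Youden index; shifting the weight towards sensitivity moves the optimal cut-off upwards, because a higher cut-off labels more patients as at risk of failure.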

Subgroup analysis

The diagnostic accuracy of the ROX index was analysed in prespecified subgroups defined by the mean age of included patients, the year of patient admission, and corticosteroid use. A sensitivity analysis was conducted by excluding conference abstracts and letters, i.e. including only journal articles. A further sensitivity analysis addressed the type of outcome by excluding studies that included progression to non-invasive ventilation (NIV) or death as an outcome. sAUC values were compared pairwise between subgroups using the methods outlined by Hanley et al. [26, 27].
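A Hanley–McNeil-style comparison of two AUCs can be sketched as a z-test on their difference. This Python sketch assumes that the two subgroups contain different studies and are therefore independent (correlation r = 0), and that each standard error is approximated from the width of its 95% CI. Because several sAUC intervals in this review are clearly asymmetric, that approximation is crude, and the sketch is not expected to reproduce the reported p-values; the function names are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # standard normal quantile for a 95% two-sided CI


def se_from_ci(lower: float, upper: float) -> float:
    """Approximate SE from a 95% CI, assuming symmetry on the AUC scale."""
    return (upper - lower) / (2 * Z_975)


def compare_auc(auc1: float, se1: float, auc2: float, se2: float, r: float = 0.0):
    """z-test for the difference between two AUCs.

    r is the correlation between the two estimates (0 for independent samples).
    Returns (z, two-tailed p-value).
    """
    var = se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2
    z = (auc1 - auc2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p
```

For example, two independent AUCs of 0.80 and 0.70, each with a standard error of 0.05, give z ≈ 1.41 and a two-tailed p of about 0.157.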

Risk of bias assessment

Two independent and blinded authors assessed studies for methodological quality, using the Quality Assessment of Prognostic Accuracy Studies (QUAPAS) tool for prognostic studies [28]. The QUAPAS tool assesses the quality of studies across five key domains: participant selection, index test, outcome, flow and timing, and analysis. Disagreements were resolved through discussion with a third author.

Patient and public involvement

No patients or members of the public were directly involved in this research study.

Results

Article search and included studies

After removal of duplicates, the search strategy identified 242 relevant studies (Fig. 1), of which 50 were included in the full-text review. Our final analysis included 24 studies [16, 21, 29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50], comprising 4790 patients (Table 1). Of these 24 studies, nine were multicentre and 15 were single-centre studies.

Fig. 1

Preferred reporting items for systematic reviews and meta-analyses (PRISMA) flow chart

Table 1 Characteristics of included studies

Studies came from five continents, including one multicontinental study involving two institutions in Europe and South America. Of the single-region studies, one was from Africa, five from Asia, eleven from Europe, three from North America, and two from South America. One study did not report its location.

Six were prospective studies, 15 were retrospective studies, and two were retrospective analyses of prospectively collected data. One study did not report on the study type. Seven studies were published in 2020, eight in 2021, and nine in 2022.

The mean age of included patients ranged from 51.3 to 70 years. The ROX cut-offs investigated ranged from 1.96 to 8.36. The definitions of HFNC success or failure used by each study are shown in Table 1.

Most studies were judged to be at moderate or high risk of bias (Additional file 1: Fig. S1).

Overall performance of the ROX index

Results from the 24 studies were pooled. ROX had an overall sAUC of 0.771 (95% CI: 0.666–0.847) (Fig. 2), and the overall optimal cut-off value of ROX was 5.23. At this cut-off, sensitivity was 0.732 (95% CI: 0.578–0.846) and specificity was 0.690 (95% CI: 0.539–0.809). The cut-off to achieve 80% sensitivity was 5.70, while the cut-off to achieve 80% specificity was 4.45. The cut-off to achieve 90% sensitivity was 6.69, while the cut-off to achieve 90% specificity was 3.37.
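The direction of these trade-offs follows from treating a low ROX as a positive test: raising the cut-off classifies more patients as positive, which increases sensitivity and lowers specificity. The Python sketch below makes this explicit by inverting logistic sensitivity and specificity curves. Its parameters are not the fitted diagmeta estimates; they are approximate values back-calculated from the four reported cut-off pairs above under an assumed logistic model on the original ROX scale, for illustration only.

```python
import math

# Illustrative parameters ONLY: back-calculated from the reported overall
# cut-offs (5.70 -> 80% Se, 6.69 -> 90% Se, 4.45 -> 80% Sp, 3.37 -> 90% Sp),
# assuming logistic ROX distributions. Not the fitted diagmeta model.
FAIL_LOC, FAIL_SCALE = 4.008, 1.221  # ROX among HFNC failures (assumed)
SUCC_LOC, SUCC_SCALE = 6.296, 1.332  # ROX among HFNC successes (assumed)


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sensitivity(cutoff: float) -> float:
    # Share of HFNC failures with ROX below the cut-off.
    return 1.0 / (1.0 + math.exp(-(cutoff - FAIL_LOC) / FAIL_SCALE))


def specificity(cutoff: float) -> float:
    # Share of HFNC successes with ROX at or above the cut-off.
    return 1.0 / (1.0 + math.exp((cutoff - SUCC_LOC) / SUCC_SCALE))


def cutoff_for_sensitivity(target: float) -> float:
    # Solve sensitivity(c) = target for c.
    return FAIL_LOC + FAIL_SCALE * logit(target)


def cutoff_for_specificity(target: float) -> float:
    # Solve specificity(c) = target for c.
    return SUCC_LOC - SUCC_SCALE * logit(target)


if __name__ == "__main__":
    for t in (0.8, 0.9):
        print(t, round(cutoff_for_sensitivity(t), 2), round(cutoff_for_specificity(t), 2))
```

Under these assumed parameters, the curves also give a sensitivity of about 0.73 and a specificity of about 0.69 at a cut-off of 5.23, consistent with the pooled estimates reported above.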

Fig. 2

Overall summary receiver operating characteristic curve

A sensitivity analysis was conducted by including only original journal articles (i.e. excluding letters and abstracts). In these 19 studies with 4535 patients, the sAUC was 0.770 (95% CI: 0.656–0.849) and the optimal cut-off was 5.27. At this cut-off, sensitivity was 0.741 (95% CI: 0.563–0.864) and specificity was 0.681 (95% CI: 0.515–0.811). No significant difference in sAUC from the overall analysis was found (two-tailed p-value: 0.921). Another sensitivity analysis included original journal articles and letters (i.e. excluded abstracts). In these 21 studies with 4625 patients, the sAUC was 0.764 (95% CI: 0.648–0.845) and the optimal cut-off was 5.10. At this cut-off, sensitivity was 0.722 (95% CI: 0.552–0.846) and specificity was 0.688 (95% CI: 0.526–0.814). Again, no significant difference in sAUC was found (two-tailed p-value: 0.488).

As a further sensitivity analysis, we repeated the overall analysis after excluding Hamou et al. [38] (which used the lowest cut-off) and Kim et al. [33] (which used the highest cut-off), first separately and then together. Excluding Hamou et al., the sAUC was 0.766 (95% CI: 0.661–0.841); excluding Kim et al., it was 0.779 (95% CI: 0.688–0.848); excluding both, it was 0.774 (95% CI: 0.684–0.842). None of these three sAUCs differed significantly from the overall sAUC (two-tailed p-values 0.620, 0.423 and 0.767, respectively).

Diagnostic accuracy of the ROX index measured within specific time windows

In the 5 studies that measured ROX ≥ 2 h but < 6 h after HFNC initiation, the sAUC was 0.754 (95% CI: 0.604–0.863) (Fig. 3) and the optimal cut-off was 5.71. At this cut-off, sensitivity and specificity were 0.635 (95% CI: 0.411–0.813) and 0.769 (95% CI: 0.465–0.927), respectively (Table 2).

Fig. 3

Area under the summary receiver operating characteristic curve at various time windows, with 95% confidence intervals

Table 2 Diagnostic accuracy at various time windows

In the 8 studies that measured ROX ≥ 6 h but < 12 h after HFNC initiation, the sAUC was 0.795 (95% CI: 0.287–0.908) (Fig. 3) and the optimal cut-off was 6.50. At this cut-off, sensitivity and specificity were 0.835 (95% CI: 0.389–0.976) and 0.659 (95% CI: 0.192–0.940), respectively (Table 2).

In the 7 studies that measured ROX ≥ 12 h but < 24 h after HFNC initiation, the sAUC was 0.821 (95% CI: 0.713–0.894) (Fig. 3) and the optimal cut-off was 5.78. At this cut-off, sensitivity and specificity were 0.749 (95% CI: 0.575–0.868) and 0.759 (95% CI: 0.511–0.905), respectively (Table 2).

ROX was not analysed within 2 h of HFNC initiation: only 5 studies reported 5 cut-offs in this window, which was insufficient to fit the linear mixed models for the ROX index distribution.

Diagnostic accuracy of the ROX index in clinically important subgroups

Elderly patients

Twenty-one studies reported the mean age of their patients. The sAUC did not differ significantly between studies with a mean age below 65 years and those with a mean age of 65 years or above (two-tailed p-value: 0.067).

Use of corticosteroids

In five studies where more than 75% of patients received corticosteroid treatment, sAUC of ROX was 0.711 (95% CI: 0.293–0.883). In three studies where the proportion of patients who received corticosteroid treatment was less than or equal to 75%, sAUC of ROX was 0.721 (95% CI: 0.005–0.997). sAUC values were not significantly different between the two groups (two-tailed p-value: 0.696).

Admission year of patients

Fifteen studies included patients admitted in 2020 only, and four included patients admitted in 2021 only. For this sub-analysis, we excluded four studies that included patients admitted across 2020–2021 and one study that did not state the year of admission. The sAUC in the group admitted in 2021 did not differ significantly from that in the group admitted in 2020 (two-tailed p-value: 0.404).

The sAUC values and p-values for between subgroup differences are detailed in Table 3.

Table 3 Area under the summary receiver operating characteristic curve (sAUC) in clinically important subgroups

Other subgroups

Three studies evaluated the ROX index in unvaccinated COVID-19 patients. Their cut-offs ranged widely, from 1.96 to 5.63, which limits the reliability of any sAUC estimate; in particular, Hamou et al. used an unusually low cut-off of 1.96. Most studies did not report the vaccination status of their patients, and none directly compared the ROX index between vaccinated and unvaccinated patients. We therefore did not analyse the ROX index by vaccination status.

Diagnostic accuracy of the ROX index for different outcomes

The definition of HFNC failure was heterogeneous across studies: some included mortality as part of a composite outcome, and some included progression to NIV as an endpoint. We present the diagnostic accuracy of the ROX index stratified by the definition of HFNC failure in Table 4.

Table 4 Diagnostic accuracy stratified by definition of high flow nasal cannula failure

Discussion

HFNC is an important ventilation-sparing therapy in COVID-19 patients. However, delayed intubation in patients on HFNC has been associated with increased mortality [11]. The ROX index allows objective assessment of the risk of HFNC failure, enabling early stratification of patients who can safely continue HFNC from those who are likely to need invasive ventilation. Previous meta-analyses [8, 19] dichotomised patients into overall high or low ROX index groups to handle the different cut-off values used across studies. In contrast, our up-to-date systematic review and meta-analysis of the ROX index in COVID-19 patients incorporates all cut-off values and time points reported in the constituent studies. We modelled the summary receiver operating characteristic (sROC) curve of the ROX index overall and at different time points, which allowed us to estimate the optimal cut-off at each time point, as well as the clinically relevant cut-offs for 80% sensitivity and 80% specificity.

Overall, we found that the ROX index had an sAUC of 0.771. The general optimal cut-off, which maximised the weighted sum of sensitivity and specificity for HFNC failure, was 5.23 (sensitivity: 0.732, specificity: 0.690). This is higher than the cut-off proposed by Roca et al. [18], who originally developed the ROX index and suggested a universal cut-off of ROX > 4.88 at 2, 6, and 12 h after HFNC initiation to predict HFNC success. Our sAUC-derived cut-off agrees more closely with subsequent studies such as Prakash et al. and Zhou et al. [19, 20], which suggested an optimal ROX index cut-off of around 5. If a single cut-off is sought without regard to time from initiation, we suggest 5.23, as it maximises the weighted sum of sensitivity and specificity. Alternatively, without regard to time from initiation, a cut-off of 5.70 gives 80% sensitivity and 4.45 gives 80% specificity for HFNC failure, while 6.69 gives 90% sensitivity and 3.37 gives 90% specificity. These cut-offs could be used if a provider prefers higher sensitivity or higher specificity.

For most patients on HFNC therapy, the time from initiation is known, and ROX can be calculated at different times after HFNC initiation. We therefore also meta-analysed the performance of the ROX index in different time windows. Providers should be aware that ROX index performance differs between time windows, and that it performs better from 6 h onwards: the sAUC rose from 0.754 at 2–6 h, to 0.795 at 6–12 h, and to 0.821 at 12–24 h. This echoes the findings of Roca et al. [18], in which ROX measured at earlier time points had poorer sensitivity and specificity (< 70%). One possible explanation [20] for the better performance of the ROX index in later time windows is that patients may be relatively undifferentiated initially, whereas the additional time afforded by HFNC therapy allows the underlying pace of COVID-19 disease to become evident and allows COVID-19 therapeutics (such as steroids or antiviral drugs) to exert their clinical effect. Because relatively few studies investigated the ROX index in the 0–2 h window, we could not perform an sAUC analysis for it, and we cannot comment on the diagnostic performance of the ROX index in this window or recommend a cut-off.

We suggest that at times closer to HFNC initiation, a cut-off that is highly specific for HFNC failure be used to identify patients at high risk of needing intubation. For these patients, early intubation should be considered. For 2–6 h post-HFNC initiation, we propose the cut-off of < 5.33 (80% specific); for 6–12 h post-HFNC initiation, we propose the cut-off of < 3.69 (80% specific).

On the other hand, at later time points, a cut-off that is highly sensitive for HFNC failure could be used to identify patients with higher likelihood of HFNC success. Such patients might benefit from continued treatment with HFNC. They might even be eligible for non-invasive monitoring with the ROX index given their high likelihood of HFNC success, as opposed to invasive monitoring with arterial blood gas sampling. For 12–24 h post-HFNC initiation, we propose the cut-off of > 6.07 (80% sensitive).
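The time-dependent strategy proposed in the two paragraphs above can be written as a small lookup, shown here as a hypothetical Python sketch. It assumes the windows are half-open intervals (≥ 2 h and < 6 h, and so on), as defined in the Results, and it encodes only the three cut-offs proposed above; it returns no recommendation outside 2–24 h and is not a validated decision rule.

```python
from typing import List, Tuple

# (start_h, end_h, cutoff, rule, label): cut-offs proposed in the Discussion.
# rule "below": flag if ROX < cutoff (80% specific, to rule in HFNC failure).
# rule "above": flag if ROX > cutoff (80% sensitive, supports continuing HFNC).
PROPOSED_RULES: List[Tuple[float, float, float, str, str]] = [
    (2.0, 6.0, 5.33, "below", "high risk of HFNC failure"),
    (6.0, 12.0, 3.69, "below", "high risk of HFNC failure"),
    (12.0, 24.0, 6.07, "above", "likely HFNC success"),
]


def proposed_rox_flag(rox: float, hours_since_start: float) -> str:
    """Apply the window-specific cut-off, if one was proposed for this time."""
    for start, end, cutoff, rule, label in PROPOSED_RULES:
        if start <= hours_since_start < end:
            hit = rox < cutoff if rule == "below" else rox > cutoff
            return label if hit else "not flagged"
    return "no proposed cut-off"
```

Keeping the thresholds in a table rather than in nested conditionals makes it straightforward to swap in different cut-offs, for example the window-specific optimal values in Table 2.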

We also conducted subgroup analyses of the ROX index in studies stratified by patient age, corticosteroid use, and study period. There were no significant differences in the diagnostic performance of the ROX index between these subgroups, suggesting that the ROX index may be broadly applicable regardless of older age, corticosteroid use, and the year of the COVID-19 pandemic, although these study-level comparisons had wide confidence intervals.

Strengths and limitations

This is the first meta-analysis to estimate optimal cut-off values for the ROX index and evaluate its diagnostic accuracy at different time points and time windows, and to provide high sensitivity and high specificity cut-off values at each time point. This can allow clinicians to make informed choices about the ROX cut-off to use while the patient is undergoing HFNC therapy.

This analysis has some limitations. As with any systematic review, we depend on the data available from the constituent studies. First, data were publicly available only in aggregate form rather than as individual patient data (IPD), which would be ideal for estimating diagnostic test performance. We therefore had to estimate the sAUC from aggregate data rather than calculate the AUC directly. For the same reason, we could not analyse the patient-level distribution of the ROX index, determine whether changes in the ROX index within the same patient had diagnostic value, or examine the performance of the ROX index in detail in other subgroups (for example, vaccinated versus unvaccinated individuals, immunocompromised individuals, or individuals who received antiviral or immunomodulatory drugs). The lack of IPD also meant that subgroup analyses had to be based on study-level characteristics such as mean age and the proportion of patients receiving corticosteroids. Together with the scarcity of large studies reporting cut-offs that achieved 90% sensitivity or 90% specificity, this limits the reliable estimation of 90% sensitivity and 90% specificity cut-offs in subgroups such as time windows and outcome definitions. The estimated sAUC confidence intervals are also relatively wide, although this may improve as more data on HFNC use become available.

We are also aware that the absence of standardised reporting protocols for HFNC use may introduce biases that are difficult to account for using the existing literature. For example, few studies reported when, relative to the onset of COVID-19 illness, patients presented; this is potentially important because patients presenting very late or very early may have a different disease course [51,52,53]. Other sources of heterogeneity include the definition of HFNC failure, as some studies reported a composite outcome that included mortality in addition to the need for intubation or mechanical ventilation, and the lack of reporting of intubation criteria: variability in when intubation was initiated would affect the performance of the ROX index. More generally, the time horizon of outcomes was not standardised. As more studies on HFNC and ROX are done, it may be worthwhile to agree on consensus definitions of HFNC success and failure, and on minimum reporting criteria for HFNC studies.

We are aware that modified versions of the ROX index exist, such as a modified ROX index [17] incorporating heart rate that was validated in a cohort of 145 patients. There is an overall paucity of data on modified ROX models, and the performance of these alternative indices may be better addressed in future meta-analyses as more data become available.

Conclusion

The ROX index has good diagnostic accuracy for HFNC failure in COVID-19 patients and performs best from 6 h after HFNC initiation onwards. We suggest an optimal general cut-off of 5.23, but propose that healthcare providers also interpret the ROX index in the context of time from HFNC initiation. A higher-specificity cut-off may be preferred closer to initiation to rule in HFNC failure, whereas a higher-sensitivity cut-off could be used later to rule out HFNC failure and justify continuation of HFNC treatment.