Take-home message

In a cohort of COVID-19 patients treated in the United Kingdom, progressive respiratory failure was increasingly associated with mortality. Evidence-based triggers for ARDS interventions, in particular prone position, were not implemented, had delayed application, or showed poor responsiveness in a sizeable proportion of patients with progressive hypoxaemia. How this implementation gap and lack of response to conventional ARDS interventions may have contributed to excess mortality across the pandemic deserves further interrogation.

Introduction

Coronavirus disease 2019 (COVID-19), caused by the Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), was declared a global pandemic on March 11, 2020 by the World Health Organisation. COVID-19 related severe acute hypoxaemic respiratory failure invariably leads to intensive care unit (ICU) admission. These patients fulfil Acute Respiratory Distress Syndrome (ARDS) Berlin definition criteria [1,2,3,4]. However, uncertainty about the extent of pathological and physiological differences between COVID-19 related ARDS and other causes of ARDS, together with the pulmonary angiopathy of COVID-19, adds to the uncertainty regarding the full disease progression and management of these acutely unwell patients, who have a high mortality rate [5,6,7,8,9]. This ambiguity has led to an ongoing debate on the application of existing evidence-based ARDS management to COVID-19 patients [10].

Pre-COVID evidence-based guidelines for ARDS management include lung-protective ventilation, prone positioning and conservative fluid strategies, with the option of an open lung strategy and neuromuscular blockade (NMBA), alongside timely access to extracorporeal membrane oxygenation (ECMO) support for patients with severe hypoxaemia refractory to these interventions [9, 11, 12]. However, reports suggest that real-world compliance with evidence-based ARDS management strategies is difficult at a system level [13]. Furthermore, these interventions are implemented at various stages of ARDS progression and are time-sensitive over the natural history of illness [14, 15]. Monitoring of dynamic responsiveness to interventions is fundamental to clinical practice in critical care and is increasingly facilitated by artificial intelligence analytics [16]. Whilst there have been reports from around the world of the epidemiological and admission characteristics of hospitalised patients with COVID-19 admitted to intensive care, none have focused on the complete clinical trajectory in combination with the clinical application of, and response to, ARDS management strategies [1,2,3,4].

Accordingly, we undertook a cohort study across several intensive care units in the United Kingdom, to report the natural history and management of mechanically ventilated COVID-19 patients. Our specific aims were to define, from routine clinical measurements, crucial factors associated with disease progression and mortality; and to ascertain use, compliance, duration and effect of established evidence-based ARDS management strategies.

Methods

Study design

We performed a multicentre, observational cohort study in patients with SARS-CoV-2 infection who required mechanical ventilation for severe COVID-19 infection in the United Kingdom.

Eligibility criteria

We included adult patients (aged ≥ 18 years) with laboratory-confirmed SARS-CoV-2 infection who required invasive mechanical ventilation (IMV) in the United Kingdom between March 1st and August 31st, 2020. Only patients transferred to study sites within 48 h of intubation were included; owing to the nature of ECMO provision in the UK, patients receiving ECMO were excluded (see Supp. Methods: ECMO).

Ethical approval

The UK Health Research Authority exempted this study from review by an NHS Research Ethics Committee due to its urgent need. Each site registered the study protocol as a service evaluation. The “Strengthening the Reporting of Observational Studies in Epidemiology” statement guidelines were applied (see Supp. Appendix p.4–5) [17].

Data collection and procedures

We set up standardised data processing pipelines to manage the considerable daily data flow. Only routine, pseudonymised data were collected with no change to clinical care. In brief, the case report form captured admission demographics, twice daily (8 am and 8 pm) respiratory physiology and blood gas results, daily ARDS interventions, daily COVID-19 interventions, daily blood results and outcome status. Table S1 lists the participating sites. Data were extracted from either electronic healthcare records (EHRs) or paper-based records into the COVID-ICU secure REDCap database (REDCap v10.0.10; Vanderbilt University, US).

Missing data

We set the threshold for data completeness (i.e. missingness) heuristically, to balance the number of patients against the number of variables. We defined missingness by examining the available variables in the first 48 h of admission, the last 36 h before prone, or the first 36 h after prone. If all of the 3 or 4 12-h measurement points in the relevant window were missing, we counted the patient as ‘missing’ for that variable. The missingness is thus the percentage of patients with no measurement of a modality in this 36/48-h window. The percentages of missing data per modality are shown in Table S2, and details of missing data are shown in Tables S3 and S4.
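The missingness rule described above can be sketched as follows. This is a minimal illustration only: the data layout (one dictionary per patient, mapping a modality name to its 12-hourly values in the window) and the function names are assumptions made for the example, not the study's actual pipeline.

```python
from typing import Dict, List, Optional

# Hypothetical layout: one dict per patient, mapping a modality name to its
# 12-hourly values inside the chosen window (None = not recorded).
PatientWindow = Dict[str, List[Optional[float]]]


def is_missing(window_values: List[Optional[float]]) -> bool:
    """A patient counts as 'missing' for a modality only if every
    measurement point in the window is absent."""
    return all(v is None for v in window_values)


def missingness_percent(patients: List[PatientWindow], modality: str) -> float:
    """Percentage of patients with no measurement of `modality` in the window."""
    if not patients:
        raise ValueError("no patients supplied")
    n_missing = sum(is_missing(p.get(modality, [])) for p in patients)
    return 100.0 * n_missing / len(patients)


patients = [
    {"lactate": [1.2, None, None, 1.0]},    # partially recorded: not missing
    {"lactate": [None, None, None, None]},  # nothing recorded: missing
    {"lactate": [None, 2.1, None, None]},   # one value is enough: not missing
    {},                                     # modality never collected: missing
]
print(missingness_percent(patients, "lactate"))
```

Under this rule a single recorded value anywhere in the window is enough for the patient to count as observed for that modality.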

Data presentation and group definitions

Descriptive variables are expressed as percentage, or median and interquartile range (IQR), as appropriate. Continuous variables were analysed with Mann–Whitney U or Kruskal–Wallis tests, as appropriate. Categorical variables were compared using Fisher's exact test or the Chi-square test for an equal proportion, as appropriate. All statistical tests were 2-sided and P ≤ 0.05 was considered statistically significant. Mortality was defined as ICU mortality. The incidence and duration of interventions, as well as ventilation settings, were analysed and reported relative to current strategies, e.g. low tidal volume ventilation and ARDSNet Positive End Expiratory Pressure (PEEP) tables. We defined an intervention period as a run of consecutive days on which the intervention was applied, with a day of no intervention defining the end of the current period.

For group-wise analysis, the outcome of the therapies was measured as categorical variables of “Mild, Moderate, or Severe”, “Survival or Death”, “resolver or non-resolver”, and “prone responder or non-responder”. The severity of hypoxaemia was categorised as per Berlin Definition criteria [18]. To evaluate features associated with the progression of hypoxaemia, we analysed the evolution of hypoxaemia over the first 7 days of invasive mechanical ventilation and categorised patients into two groups, “resolvers” and “non-resolvers”. Patients whose hypoxaemia categorisation improved, or who were discharged from ICU, were considered “resolvers”, while those whose hypoxaemia categorisation worsened, or who died, were considered “non-resolvers”.

We further considered the longer-term effect on PaO2/FiO2 after prone positioning and defined prone responsiveness as maintenance of a mean PaO2/FiO2 > 20 kPa over 7 days after the first prone episode. To assess opportunities to apply a prone intervention, we defined a prone opportunity as per the inclusion criteria for the PROSEVA study [19]: a PaO2/FiO2 < 20 kPa, with an FiO2 ≥ 0.6 and a PEEP ≥ 5 cmH2O. Prone opportunities were assessed at 8 am and 8 pm using the ventilator settings and arterial blood gas (ABG) evaluation.
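The group definitions above can be expressed compactly in code. The following is a minimal sketch, assuming PaO2/FiO2 is supplied in kPa; the function names are hypothetical, and the handling of values exactly on a category boundary is an assumption rather than something stated in the text.

```python
from statistics import mean
from typing import Sequence

MILD, MODERATE, SEVERE = "mild", "moderate", "severe"


def hypoxaemia_severity(pf_kpa: float) -> str:
    """Berlin-style category from PaO2/FiO2 in kPa
    (26.7 kPa is about 200 mmHg, 13.3 kPa is about 100 mmHg)."""
    if pf_kpa > 26.7:
        return MILD
    if pf_kpa < 13.3:
        return SEVERE
    return MODERATE  # boundary values fall here by assumption


def is_prone_opportunity(pf_kpa: float, fio2: float, peep_cmh2o: float) -> bool:
    """PROSEVA-style criteria as quoted in the text:
    PaO2/FiO2 < 20 kPa, FiO2 >= 0.6 and PEEP >= 5 cmH2O."""
    return pf_kpa < 20.0 and fio2 >= 0.6 and peep_cmh2o >= 5.0


def is_prone_responder(pf_after_first_prone_kpa: Sequence[float]) -> bool:
    """Responder: mean PaO2/FiO2 > 20 kPa over the 7 days
    following the first prone episode."""
    return mean(pf_after_first_prone_kpa) > 20.0


# Twice-daily readings for one hypothetical patient: (PaO2/FiO2 kPa, FiO2, PEEP)
readings = [(18.5, 0.7, 10), (22.0, 0.6, 10), (15.0, 0.5, 8), (12.0, 0.8, 5)]
n_opportunities = sum(is_prone_opportunity(*r) for r in readings)
print(n_opportunities, hypoxaemia_severity(12.0))
```

Counting opportunities at each 8 am and 8 pm reading in which the patient was not proned gives the per-patient "missed opportunity" tally reported in the Results.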

Multivariate logistic regression using the backward method for variable selection (with univariate screening, P < 0.1) was applied to each outcome variable to test associations with independent variables. The full list of variables tested for inclusion (and missingness) in these models is shown in Supplementary Table S2. For details see Supp. Methods “Logistic regression in statistical analysis (details)”.

Statistical analysis of longitudinal measures

The association between the change over time of each independent variable and the outcome measures was tested in repeated measures (rm) ANOVA. For the survival and first week resolver outcome, rmANOVA was applied on the physiology variables over the first week of mechanical ventilation, while for the prone responder outcome, it was applied on the physiology variables over a week from the day before the first PP episode. We accounted for multiple comparisons in the interaction statistic by controlling the false discovery rate (FDR). For details see Supp. Methods “Statistical analysis of longitudinal measures (details)”.
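As an illustration of false discovery rate control, the sketch below implements the Benjamini–Hochberg step-up procedure. The text states only that the FDR was controlled; the choice of Benjamini–Hochberg, the level q = 0.05 and the example P values are assumptions made for this example.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Return, for each P value, whether its hypothesis is rejected while
    controlling the false discovery rate at level q (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank

    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical interaction P values, one per physiological variable.
p_interaction = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p_interaction))
```

Because the procedure is step-up, a P value can be rejected even if it misses its own rank threshold, provided a larger P value further down the ordering meets its threshold.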

Machine learning models for daily and week-long mortality prediction

We evaluated both logistic regression and more potent machine learning models at predicting mortality. We used these models to predict mortality (a) from a single day's data, for each of the first seven days of a patient's admission, and (b) from the data of the whole week together. We designed our model to be compatible with both daily prediction data (all used clinical parameters for each day in the first week) and weekly prediction data (all used clinical parameters over all 7 days). We used a deep learning framework as a foundation to implement both a logistic regression (LR) predictor and a deep neural network predictor (a three-layer multilayer perceptron model, 3MLP). We provide the complete details in the Supp. Materials “Machine Learning models for of daily and weeklong- mortality prediction (details)”.
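The difference between the daily and week-long model inputs can be sketched as follows. This is an illustrative assumption about how such inputs might be assembled: the parameter names, the dictionary layout and the helper functions are hypothetical, and missing-data handling is deliberately left out.

```python
from typing import Dict, List

# Illustrative subset of clinical parameters (hypothetical names).
PARAMS = ["pf_ratio_kpa", "peak_pressure", "lactate"]
N_DAYS = 7

DayRecord = Dict[str, float]


def daily_features(week: List[DayRecord], day: int) -> List[float]:
    """Input for a single-day model: all parameters from one day (0-based)."""
    return [week[day][p] for p in PARAMS]


def weekly_features(week: List[DayRecord]) -> List[float]:
    """Input for the week-long model: the seven daily vectors concatenated
    in day order, so each (day, parameter) pair becomes one feature."""
    if len(week) != N_DAYS:
        raise ValueError("expected exactly seven daily records")
    return [x for day in range(N_DAYS) for x in daily_features(week, day)]


# One hypothetical patient whose oxygenation worsens over the week.
week = [
    {"pf_ratio_kpa": 20.0 - d, "peak_pressure": 25.0 + d, "lactate": 1.0}
    for d in range(N_DAYS)
]
print(daily_features(week, 0))
print(len(weekly_features(week)))
```

Keeping each (day, parameter) pair as its own feature is what later allows an importance value to be attributed to a parameter on a specific day.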

Grouping of dynamic clinical parameter importance through Explainable AI (XAI)

We analysed our deep learning model and the logistic regression model with an XAI approach. We used SHAP (SHapley Additive exPlanations) to explain how the prediction models weighted the importance of individual input features, here the clinical parameters, for their output [20]. The SHAP value is effectively the difference between how well a system performs when knowing all parameters and how well it performs when knowing all but one parameter, namely the one for which the SHAP contribution is computed. It thus measures the explanatory cost of leaving the parameter out, so a high SHAP value means that the parameter is very important for the prediction. SHAP values should not be confused with conventional regression weights, as a high SHAP value of a variable may imply that a larger value, a lower value, or vicinity to a specific value increases mortality. We therefore used SHAP values because they allowed us to compare deep learning and logistic regression models directly, using a common ‘currency’ of explainability. For complete details please see Supp. Methods “Grouping of dynamic clinical parameter importance through Explainable AI (details)”.
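To make the idea concrete, the sketch below computes exact Shapley values for a toy model by brute force. The paragraph above gives a leave-one-out intuition; a Shapley value averages that marginal contribution over every subset of the other features, which matters when features interact. The toy model, the all-zero baseline used to represent an "unknown" feature, and the function names are assumptions for illustration; the SHAP library relies on approximations, since enumerating all subsets is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley value of each feature for one prediction.
    Features outside a subset are replaced by their baseline value."""
    n = len(x)

    def value(subset: frozenset) -> float:
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = frozenset(combo)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z: Sequence[float]) -> float:
    """Toy model with an interaction between the first two features."""
    return z[0] * z[1] + z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

In this example the values are approximately 0.5, 0.5 and 1.0: the interaction term is shared equally between the first two features, whereas a plain leave-one-out comparison would credit each of them with the full interaction. The values also sum to the difference between the prediction for the patient and the prediction at the baseline.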

All statistical analyses were carried out using MATLAB (MathWorks Inc., Natick, MA). Detailed data science methods are described in the supplementary appendix.

Results

Clinical progression of critically ill COVID-19 patients

A total of 633 mechanically ventilated patients admitted to 13 UK National Health Service (NHS) Trusts with 18 ICU sites between 01 March 2020 and 31 August 2020 had complete daily data up to ICU death or discharge (Fig. 1, Table S1). Baseline demographics (Fig. S1 and Table S5) were similar to the Intensive Care National Audit and Research Centre cohort [21] (Table S6). On initiation of mechanical ventilation, the proportions of patients with mild, moderate and severe hypoxaemia were 23.2%, 50.6% and 26.2%, respectively, with mortality increasing with severity (Fig. 2, Table S7). On admission, increased severity was associated with higher settings for mechanical ventilation, higher severity of organ failure (including dynamic respiratory system compliance, oxygenation index (OI), and ventilatory ratio (VR)) (Table 1), and greater application of interventions (Table 2).
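For readers less familiar with the derived respiratory indices, the following sketch shows the commonly used definitions of the oxygenation index and the ventilatory ratio. The text does not state which formulas were applied, so the definitions, the kPa-to-mmHg conversion step and the example values here are assumptions.

```python
KPA_TO_MMHG = 7.50062  # 1 kPa expressed in mmHg


def oxygenation_index(fio2: float, mean_airway_pressure_cmh2o: float,
                      pao2_kpa: float) -> float:
    """OI = FiO2 (fraction) x mean airway pressure (cmH2O) x 100 / PaO2 (mmHg)."""
    pao2_mmhg = pao2_kpa * KPA_TO_MMHG
    return fio2 * mean_airway_pressure_cmh2o * 100.0 / pao2_mmhg


def ventilatory_ratio(minute_ventilation_ml_min: float, paco2_kpa: float,
                      predicted_body_weight_kg: float) -> float:
    """VR = (measured minute ventilation x measured PaCO2) divided by
    (predicted minute ventilation of 100 mL/kg/min x ideal PaCO2 of 37.5 mmHg)."""
    paco2_mmhg = paco2_kpa * KPA_TO_MMHG
    predicted_ve = predicted_body_weight_kg * 100.0
    return (minute_ventilation_ml_min * paco2_mmhg) / (predicted_ve * 37.5)


# Hypothetical patient: FiO2 0.6, mean airway pressure 15 cmH2O, PaO2 10 kPa,
# minute ventilation 8 L/min, PaCO2 5 kPa, predicted body weight 70 kg.
oi = oxygenation_index(0.6, 15.0, 10.0)
vr = ventilatory_ratio(8000.0, 5.0, 70.0)
print(round(oi, 1), round(vr, 2))
```

A higher OI indicates worse oxygenation for a given level of ventilatory support, and a VR above 1 indicates less efficient carbon dioxide clearance than predicted.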

Fig. 1 Study population flowchart

Fig. 2 ICU survival curves based on admission severities of hypoxaemia as defined by the Berlin definition of ARDS

Table 1 Clinical and physiological characteristics and outcomes according to severity of hypoxaemia on admission
Table 2 Interventions according to severity of hypoxaemia on admission

Determinants of mortality

Survival to ICU discharge was 57.7%. There was a difference in mortality between quartiles of patients admitted (peak: 31st March; median: 1st April 2020) during the first surge (P = 0.053). The first quartile of patients admitted during the surge [1st–26th March 2020] had a mortality of 37.3%; the second quartile [27th March–2nd April 2020], 53%; the third quartile [3rd–9th April 2020], 43.4%; and the last quartile [10th April–31st August], 35.9% (see Fig. S1, Table S8). Admission respiratory SOFA increased across the pandemic quartiles (P = 0.036). In those that died, active withdrawal of support occurred in 65% of patients (85/130) in the 13 sites which reported, and unanticipated cardiac arrest occurred in 11% of patients (13/122). There was an increased rate of reported withdrawal of life support in patients admitted during the second and third quartiles of the surge (first quartile, 55.9%; second quartile, 73.8%; third quartile, 71%; and last quartile, 56.5%; P = 0.018). Patients who had life support withdrawn had a median age of 64 (57–70) years, a length of mechanical ventilation of 11 (6–18) days, a last PaO2/FiO2 of 12.8 (10–19.5) kPa and a higher application of prone intervention (72%). Median PaO2/FiO2 in non-survivors on the day of death was 12.3 (8.9–18.4) kPa.

Our multivariate model showed that the clinical variables on ICU admission independently associated with mortality were higher age (HR 1.95 per decade, 95% CI 1.58–2.4), male gender (HR 2.05, 95% CI 1.17–3.61), higher lactate (HR 1.52 per quartile (0.6 mmol/L), 95% CI 1.21–1.92), and higher SOFA coagulation score (HR 1.95, 95% CI 1.17–3.26) (Fig. S2; Table S9). Over the first week, statistically significant interaction differences were noted in the group-wise ANOVA between survivors and non-survivors within several respiratory, inflammatory and coagulation parameters (Fig. S2; Table S10). Machine learning models using admission data predicted mortality with 60% accuracy. Predictive capacity increased to 74.5% and 76.3% accuracy, respectively, when longitudinal data from the first week were added to the LR and 3MLP models (Fig. 3). Critically, using Explainable AI methods, we were able to identify key clinical parameters which started at relatively low importance at admission but then greatly increased and exceeded others in importance over the first week (Fig. 3): these were lower PaO2/FiO2, higher peak pressure, higher ventilatory ratio (VR), lower pH, higher lactate, lower platelet count, higher C-Reactive Protein (CRP), lower oxygen saturations, and higher PaCO2 (see Fig. S3).

Fig. 3 a Week-long day-by-day performances of our logistic regression (LR) and three-layer multilayer perceptron (3MLP) models. As shown in the figures, our models' performances rise daily from admission to the end of the first week. Daily accuracies for mortality prediction are [LR, 3MLP]: Day 0: [58.9%, 61.9%]; Day 1: [63.2%, 63.3%]; Day 2: [65.3%, 66.7%]; Day 3: [70%, 69.7%]; Day 4: [70.3%, 71.9%]; Day 5: [71.7%, 72.8%]; Day 6: [74.1%, 73.6%]. b The importance of clinical parameters for the week-long mortality prediction model changes day-by-day. We show here the evolution of the SHAP (SHapley Additive exPlanations) importance of the clinical variables from admission (Day 1–2) to the end of the first week, particularly PaO2/FiO2, higher peak pressure, higher ventilatory ratio, lower pH, higher lactate, lower platelet count, higher CRP, lower oxygen saturations, higher PEEP and higher PaCO2. Note how some of the variables are systematically of high or low importance, while others systematically increase or decrease in importance from day 1; see Fig. S4 for details

Determinants of oxygenation

Movement across hypoxaemia severity groups (mild, moderate and severe PaO2/FiO2 group) showed deterioration in 31.4% of cases, stasis in 45.1%, and resolution in only 23.5% of patients over the first 7 days (Fig. 4 and Table S11). Overall, progression to a worse PaO2/FiO2 group occurred in twice the number of patients as compared to pre-COVID studies of ARDS (Table S11). ICU mortality in those who did not resolve hypoxaemia within the first week was significantly higher than in those that did (60.4% versus 17.6%; P < 0.001; Fig. S4). Admission and time-course differences between resolvers and non-resolvers in demographic, ventilatory, physiological, and laboratory parameters are shown in Fig. S4 and Tables S12 and S13. Resolvers were younger [57 (47–64) vs 60 (54–67) years; P < 0.001] and showed a longer duration of symptoms prior to ICU admission [9 (7–14) vs 7 (6–11) days; P = 0.004]. Multivariate regression showed that increased age and worse cardiovascular SOFA were associated with deteriorating hypoxaemia within the first week of IMV (Fig. S4; Table S14).

Fig. 4 Alluvial diagram of patient movements between ARDS severity groups: mild hypoxaemia (PaO2/FiO2 > 26.7 kPa), moderate hypoxaemia (PaO2/FiO2: 26.6–13.3 kPa), and severe hypoxaemia (PaO2/FiO2 < 13.3 kPa) and patient outcome (discharged: green; deceased: red). Each solid bar represents an ARDS severity group at a given number of days since initiation of IMV. Shaded coloured streams between bars represent transitions of patients between the severity groups from one time point to the next, which is either their new severity or their outcome. The height of the bars represents the proportion of patients at that time point (i.e., they stack up to 100%) and the height of a stream field represents the size of the components contained in both bars connected by the stream

ICU management

The application, median start date and duration of the first episode of each intervention, for each site, are shown in Figs. S5, S6 and Table S15. The reported ideal body weight overestimated our calculated ideal body weight derived from reported height (http://ardsnet.org) in 92.6% of patients (Fig. S7). Hence, median tidal volume was 7.0 [IQR 6.0–8.4] mL/kg of calculated ideal body weight across all breaths and 5.6 [IQR 4.7–6.6] mL/kg of reported ideal body weight. Survivors and non-survivors showed the same distribution of tidal volume variation. Over 65% of reported PEEP values were set outside ± 1 cmH2O and 53% were set outside ± 2 cmH2O of the ARDSNet PEEP-FiO2 tables (Fig. S7). Patients with BMI < 40 had a higher set PEEP than recommended by the PEEP-FiO2 table. In contrast, patients with BMI > 40 had a lower set PEEP than recommended by the PEEP-FiO2 table. Inhaled nitric oxide and prostacyclin were commenced on day 6 (3–9) and 7 (3–15) and were continued for 4 (2–7) days and 3 (1–7) days, respectively. Tracheostomy was performed in 29% of patients at a median of 14 (9–18) days, more often in survivors than in non-survivors (40% versus 10.9%; P < 0.001). Application of high PEEP, NMBA, and prone position was significantly higher during the second and third quartiles (Table S8). Corticosteroid usage increased across the surge whereas use of diuresis reduced (Table S8).
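The effect of an overestimated ideal body weight on the apparent tidal volume per kg can be illustrated with the ARDSNet predicted body weight formula. This is a minimal sketch: the function names and the example patient are hypothetical, and the text does not detail how each site derived its reported weights.

```python
def predicted_body_weight_kg(height_cm: float, male: bool) -> float:
    """ARDSNet predicted body weight:
    males 50 + 0.91 x (height in cm - 152.4),
    females 45.5 + 0.91 x (height in cm - 152.4)."""
    base = 50.0 if male else 45.5
    return base + 0.91 * (height_cm - 152.4)


def tidal_volume_ml_per_kg(tidal_volume_ml: float, weight_kg: float) -> float:
    """Tidal volume normalised to a given body weight."""
    return tidal_volume_ml / weight_kg


# Hypothetical patient: male, 175 cm, tidal volume 420 mL,
# with a reported ideal body weight of 80 kg entered on the chart.
calculated_kg = predicted_body_weight_kg(175.0, male=True)
vt_calculated = tidal_volume_ml_per_kg(420.0, calculated_kg)
vt_reported = tidal_volume_ml_per_kg(420.0, 80.0)
print(round(calculated_kg, 1), vt_calculated > vt_reported)
```

When the reported weight exceeds the height-derived weight, the same delivered volume appears smaller per kg, so ventilation can look more lung-protective on the chart than it actually is.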

Responsiveness to open lung and prone interventions

Changes in PEEP were widespread over the first 7 days of IMV, with both increases and decreases leading to unpredictable changes in PaO2/FiO2 (Fig. S7). We analysed the immediate change in PaO2/FiO2 over 36 h around the first prone intervention. There were both positive and negative changes in PaO2/FiO2 in response to prone intervention over the first 36 h (Fig. S8). Improvements in oxygenation in response to prone position were found to decrease the later the prone episode was initiated after intubation (Fig. S8; Spearman r = − 0.16, P = 0.012). Patients that resolved hypoxaemia in the first week had prone position applied significantly earlier (2 [1–5] vs 4 [2–7] days; P = 0.007) than those that did not resolve. Importantly, in those that received no prone position, there were a higher number of missed opportunities to prone in non-resolvers compared to resolvers (6 [3–13] versus 1 [0–4] opportunities per patient; P < 0.001; Table S12).

Only 44.4% of patients maintained a mean PaO2/FiO2 > 20 kPa over 7 days after the initiation of prone position. Mortality was significantly higher in prone non-responders than in responders (69.5% versus 31.1%, P < 0.001 as seen in Fig. 5 and Table S16). Time series analysis showed that non-responders had worse mean airway pressure, worse oxygenation index (OI), higher platelet count and higher alanine transaminase (ALT) over the first week of prone position (Fig. S8 and Table S17). Multivariate analysis showed non-responders to be older, with a higher pre-pronation peak pressure (OR 1.42 [1.06–1.91]; P < 0.05), higher respiratory component (OR 1.71 [1.17–2.5]; P < 0.01) and higher cardiovascular component (OR 1.36 [1.04–1.75]; P < 0.05) of the sequential organ failure assessment (SOFA) score and raised lactate (OR 1.33 [0.99–1.79]; P = 0.057) (Fig. S8 and Tables S18 and S19). Whilst there were no significant differences between responders and non-responders in the duration of IMV prior to the first prone period, the duration of the first period, or the number of future prone periods, non-responders had a higher number of missed prone opportunities (prior to the first prone position event) than responders (3 [1–7] versus 2 [1–5] opportunities per patient; P < 0.05; Table S16).

Fig. 5 ICU survival curves for the prone responder (blue) versus non-responder (red) versus patients who received no prone position (grey). Responsiveness to prone position with responders defined as maintenance of a mean PaO2/FiO2 > 20 kPa over 7 days after the first prone episode

Clinical implementation gap in proning interventions

Prone position was applied in 49.5% of patients, starting on day 2 (1–5) and lasting 2 (1–4) days. Prone position was applied earlier in patients with greater severity on admission [mild: 4 (2–8) days; moderate: 4 (2–7) days; severe: 2 (1–4) days after onset of IMV; P < 0.001]. While patients that did not undergo prone position may overall have had a milder disease, we found that 76% of patients who had moderate hypoxaemia and 46% who had severe hypoxaemia, at any stage of admission, did not undergo prone position at all. We measured the opportunity to apply prone position when there was a PaO2/FiO2 < 20 kPa, with an FiO2 ≥ 0.6, and a PEEP ≥ 5 cmH2O, as per the PROSEVA study [19]. In patients who received no prone positioning, there was 1 (IQR 0–2) missed prone opportunity per patient during the first 48 h and 3 (IQR 1–10) during the whole patient journey. In patients who received prone interventions, a median of 3 (IQR 1–6) prone opportunities per patient were missed before prone initiation. There was no difference in the number of prone sessions between survivors and non-survivors; however, patients who died without receiving prone position had a greater number of missed prone opportunities [7 (3–15) versus 2 (0–6); P < 0.001; Table S7]. Patients admitted before the peak of the surge had a lower application of prone position, a greater duration of IMV prior to application of first prone position and a tendency towards having more missed prone opportunities.

Discussion

We describe the natural longitudinal history of critically ill COVID-19 patients undergoing invasive mechanical ventilation (IMV). Mortality was 43.3%, consistent with the mortality on IMV described by the UK Intensive Care National Audit and Research Centre (ICNARC; 47.7%) [21]. The median PaO2/FiO2 in non-survivors on the day of death was 12.3 (8.9–18.4) kPa, suggesting many patients died with refractory hypoxaemia. This was associated with variable application of, and non-responsiveness to, ARDS interventions such as high PEEP and prone position. Evidence-based ARDS measures were imperfectly implemented, with inaccurate tidal volume calculations and missed prone positioning opportunities noted. Non-pulmonary clinical factors were associated with a lack of response to prone positioning, suggesting a role for a wider diagnostic assessment. Our machine learning models highlight the importance of including longitudinal week-long data to prognosticate mortality more accurately. We then used Explainable AI to look under the hood of the machine learning models by computing SHapley Additive exPlanations for all clinical parameters for each day of ICU stay. This approach suggests that the focus of attention should shift over the course of the first week after admission to specific clinical parameters (such as PaO2/FiO2) which increase in predictive importance.

Progression of COVID-19 respiratory failure in ICU

Trajectory in terms of severity of ARDS and oxygenation is not only dependent on cardiopulmonary factors (e.g. pulmonary consolidation, thrombosis, fibrosis and right ventricular compromise) but also on responsiveness to interventions (many of which aim to reduce ventilator-induced lung injury). The longitudinal natural history shows that key modalities associated with pulmonary dysfunction, i.e. PaO2/FiO2, VR and peak pressure, had the highest importance in predicting mortality across the entire first week. Patients with progressive hypoxaemia over the first week suffered a mortality of 59.4% versus 16.3% in those that resolved hypoxaemia. Over 75% of patients remained in either static or worse hypoxaemic categories, despite an increased application of adjunctive ARDS interventions, suggesting that many patients were refractory to traditional ARDS interventions, ultimately dying with refractory hypoxaemia. We hypothesise that this progressive gas exchange failure (hypoxaemia and hypercarbia) observed in COVID-19 may be due to the immunothrombotic nature of the disease pathophysiology, with increasing clot burden to the lung and subsequent right heart dysfunction in patients that progress and show reduced responses. We recently showed in COVID-19 that right ventricular fractional area change (FAC) and ventricular-pulmonary artery coupling (as measured by the FAC:right ventricular systolic pressure (RVSP) ratio) correlated significantly not only with troponin, BNP and pulmonary vascular resistance but also with measures of ventilation (namely PEEP and PaO2/FiO2) and a liver marker of congestion (ALT) [22]. A further determinant of hypoxaemia trajectory (in addition to underlying disease processes) could be the evolving interplay between timing, application and responsiveness of ARDS interventions which protect the lung from ventilator-induced lung injury (VILI) [23].

Responsiveness to ARDS interventions in COVID-19

We identified four key points that challenged our assumptions and inform management of COVID-ARDS during the pandemic: (1) changes in PEEP did not equate to improvements in PaO2/FiO2, suggesting other approaches to PEEP titration are needed, e.g. electrical impedance tomography [24] or recruitment/inflation index [25]; (2) over half of the patients who underwent prone positioning did not maintain a sustained response in PaO2/FiO2 over the following week. Multivariate analysis showed that patients with a higher peri-pronation lactate, PaCO2, peak pressure and worse cardiovascular and respiratory SOFA had a worse oxygenation response to prone position; (3) responsiveness to prone positioning decayed with a longer duration of IMV prior to the first prone position intervention. Solutions could include earlier implementation of prone position, either immediately after or even prior to intubation for less severe states; (4) half of patients with severe ARDS did not have prone interventions applied at all, and those who did had a significant number of missed opportunities when it could have been applied earlier. Those that resolved hypoxaemia in the first week underwent prone position on average 2 days earlier than those that did not, and prone non-responders had many opportunities to receive prone positioning earlier in their course.

Opportunities for rapid improvements in mortality

This evaluation aims to recognise this gap in the implementation of the current ARDS evidence base and enable real-time feedback during a pandemic. The mortality for patients receiving IMV in our cohort is considerably higher than in reports from other countries [2,3,4] and these outcomes may reflect existing clinician prognostication biases rather than prognostic characteristics in the "natural course" of COVID-19 ARDS. We show increased mortality in patients admitted during the peak of the surge, with many patients dying with (and possibly as a result of) severe hypoxaemia refractory to many interventions. Other reports with lower mortality also show a greater application of ARDS interventions, e.g. prone position (70% [2] and 76% [3] versus 50% in our study), and earlier in the disease process. Whilst prone positioning improves oxygenation, there are conflicting reports as to whether this physiological response equates to improved mortality [26, 27]. It is important to consider whether the excess mortality in our cohort is secondary to worse VILI (as a result of non-application of therapies in overwhelmed health systems during the pandemic), which has been shown to have a causal association with mortality [28]. Patients in our cohort that showed improvements in oxygenation with prone position also showed improvements in PaCO2, OI, VR, and lower peak pressures. Oxygenation should not be used as a standalone measure of response to proning, as survival benefit is likely a non-linear interaction between improved ventilation/perfusion matching, more homogeneous distribution of lung stress and lung strain with lower VILI, and reduced loading and strain of the right ventricle [29]. We chose PaO2/FiO2 as it is the main criterion for starting, terminating, and assessing response to ARDS interventions [30]. Additionally, it shows a strong correlation with OI (Fig. S9), which has been suggested over the first 7 days to predict the failure of interventions in clinical trials [31].
Most patients received lung-protective ventilation with tidal volumes less than 8 mL/kg and plateau pressures less than 30 cmH2O. This was despite systematic errors in measurement of height and derived ideal body weight. However, PEEP was set higher than the low PEEP ARDSNet table and conversely lower than advised by the high PEEP ARDSNet table.

Mortality assessment across the pandemic

While many reasons may exist for differences in mortality between countries (e.g. illness severity, healthcare dilution from large numbers etc.), our data show ICU mortality being highest during the busiest period for admissions around the peak of the surge (second and third quartiles between 27th March and 9th April 2020). Hence, poor implementation of proven evidence-based ARDS interventions during the pandemic surge (e.g. for systems-related reasons, lack of clinical awareness, a judgement that hypoxaemia was not severe enough, or cardiovascular instability) may have contributed to this higher mortality. In a pandemic, there may have been other workload pressures, including inadequate staffing and training, where the healthcare system is overwhelmed. Prior to COVID-19, the most common explanation for why prone position was not applied was oxygenation not being sufficiently impaired, with application reserved as rescue therapy for severe hypoxaemia [32]. We examined national data for the period analysed in this study and found that approximately 20% of IMV patients with COVID-19 (1596 of 7874 patients) were referred for advice to the NHS England severe acute respiratory failure ECMO service, with only approximately 4% (306 patients) fulfilling new acceptance criteria [33] for ECMO in the United Kingdom (data from NHS England commissioned service).

Strengths and limitations

A key strength of this study was to take a longitudinal view and evaluate trajectories based on granular clinical ICU data. These insights, enabled by our use of ML and XAI methods complementing standard techniques, helped us separate human bias (from pre-COVID ARDS) from objective, data-derived analysis on actual COVID-ARDS. There are limitations of this service evaluation, not least its observational, retrospective nature. While predictive models traditionally use a derivation/validation model, this is not applicable in the setting of an evolving pandemic and hence we used within-cohort separation (70/30) as discussed in the methods. Furthermore, the impact of overwhelmed healthcare systems during the surge on variables and progression remains uncertain. We chose to focus on patients undergoing invasive mechanical ventilation as this remains a key defining criterion for admission to ICU as well as active treatment [34, 35]. We opted for a twice-daily collection of data, in contrast to a worst daily value, to appreciate the overall progression of disease and impact of complex interventions, but also to achieve a pragmatic balance with ease of data collection for sites. In view of this, an important caveat to our analysis of prone position is that we were unable to accurately collect the duration of each prone position event, which also determines responsiveness [19]. We took a pragmatic approach to define responsiveness to prone position using PaO2/FiO2 as a surrogate. This has limitations but is clinically relevant for implementation and termination of prone position [30]. However, other factors may influence PaO2/FiO2 further along the ICU stay, such as superinfection (e.g. antibiotic usage in non-responders was 79.3% versus 52.9% in responders, although this was non-significant).
With respect to missing data, this study evaluated routine clinical care, and hence “missing” data arise predominantly through variations in care, for instance, a site not performing the test in the first place. Finally, certain parameters may not be physiologically “pure” measurements, e.g. dynamic compliance is the value shown on the ventilator and not calculated as VT/(end-inspiratory plateau pressure − PEEP).

Implications for clinical service and future research

We show, in a cohort of mechanically ventilated patients with COVID-19, that a trajectory of worsening respiratory failure as a result of disease factors and a lack of responsiveness, inappropriate timing and non-implementation of ARDS interventions is associated with worse outcome. Our XAI analyses of longitudinal disease trajectory emphasise the importance of gas exchange, respiratory mechanics, inflammation, thrombosis and haemodynamic/cardiac dysfunction (particularly reflected by the cardiovascular and coagulation SOFA components) in predicting disease progression. In those that had prone positioning applied, less than half maintained a PaO2/FiO2 above 20 kPa after application, and crucially, its effectiveness decayed the later it was applied. This potential lack or decay in responsiveness contrasts with traditional ARDS interventions and prompts the development of studies to establish a COVID-19 specific evidence base. While this evidence base evolves, management may benefit from serial re-evaluation of actual disease trajectory and prognostic models due to (1) the application and impact of novel interventions (e.g. dexamethasone and tocilizumab) and (2) variations in clinical practice that may influence the implementation gap, for instance, better-prepared processes to prevent healthcare services being overwhelmed. Our data-driven approach demonstrates how a form of "standing" multi-centre service evaluation could help monitor and directly inform better clinical practice and future research during the pandemic.