Background

Acute respiratory distress syndrome (ARDS) is a heterogeneous condition characterized by respiratory failure and diffuse pulmonary noncardiogenic edema [1]. ARDS is associated with poor short- and long-term outcomes including high mortality rates, disability, and poor quality of life indicators. Several trials have failed to identify treatments for ARDS, and only corticosteroids [2], prone position [3], and limiting the damage caused by mechanical ventilation [4, 5] have shown improved outcomes.

The COVID-19 pandemic hit Europe in late winter/early spring 2020, probably resulting in the highest number of ARDS patients ever treated concomitantly. COVID-19 has increased the knowledge about the physiopathology and treatment of ARDS, highlighting the importance of anti-inflammatory treatment over antivirals [6,7,8,9], thromboembolic events [10], and endothelial damage [11].

Sub-phenotypes of ARDS have been described as hypoinflammatory and hyperinflammatory, using clinical data and plasma inflammatory biomarkers in latent class analysis [12]. These sub-phenotypes have been associated with different outcomes, and reanalyses of several clinical trials found that the hyperinflammatory sub-phenotype may have better outcomes when receiving simvastatin [13], or high-positive end-expiratory pressure (PEEP) levels [12]. In addition, these sub-phenotypes responded differently to fluid management [14]. Models for a parsimonious identification of sub-phenotypes using a few variables and machine-learning strategies are being developed. However, during the COVID-19 pandemic, fewer patients could be classified into the hyperinflammatory sub-phenotype [15]. A possible limitation of clustering is that the correct identification of sub-phenotypes is needed, and only one cross-sectional evaluation may underestimate the impact of the changes or evolutions of markers in the first days of admission. Bos et al. failed to identify sub-phenotypes in patients with COVID-19 by cross-sectional analysis with data from the respiratory system but two sub-phenotypes were identified through longitudinal analysis [16]. Few data evaluating clustering using inflammation data and common variables measuring organ failure are available [17, 18].

We hypothesized that patients with COVID-19-associated ARDS may be described by more than one discrete cluster and the clustering process and its result may change during the first days of their admission, which may have an impact on their prognosis. To evaluate this, we aimed to study a cohort of patients and analyze the clustering of patients at baseline and day 3 using readily available data measuring organ failure, respiratory variables, and inflammation.

Methods

Study design

CIBERESUCICOVID is a multicenter, observational, prospective, retrospective cohort study that enrolled patients with COVID-19 admitted to Spanish intensive care units (ICUs) [19]. The study was approved by the institution’s Internal Review Board (Comité Ètic d’Investigació Clínica, registry number HCB/2020/0370), and written informed consent was obtained from either the patients or their relatives. Local researchers were contacted by a member of the study team, and the participating hospitals obtained approval from the local ethics committee. De-identified patient data were collected and stored via the REDCap electronic data capture tool [20, 21] hosted at the Centro de Investigación Biomédica en Red (CIBER), Spain. Trained local researchers incorporated data from the patient’s medical records into a separate database. Before statistical analyses, the data were checked by three independent experienced data collectors trained in critical care, and site investigators were contacted for any queries. Missing analyses were performed, and site investigators were contacted to obtain as much reliable and complete data as possible. Results were reported following the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines [22].

Study population and data collection

All consecutive patients admitted to the ICU at a participating center from February 25, 2020, to September 30, 2021, were enrolled if they fulfilled the following criteria: ≥ 18 years of age, admission to the ICU, and laboratory-confirmed SARS-CoV-2 infection. For this study, we selected those patients with a diagnosis of ARDS based on the Berlin definition [23] who needed invasive mechanical ventilation during the first day of admission to the ICU and who remained ventilated at least 3 days later. Patients were excluded if they had a non-confirmed SARS-CoV-2 infection, had no data at baseline, or were admitted to the ICU for other reasons.

After enrollment, prior epidemiological data, including demographics, comorbidities, clinical symptoms, disease chronology, and treatment administered upon hospital admission, were collected. The site researchers subsequently collected data acquired at hospital admission, ICU admission, start of mechanical ventilation (MV), 72–96 h after ICU admission, weaning, ICU discharge, and hospital discharge, including vital signs, respiratory support devices (i.e., oxygen mask, high-flow nasal cannula, and noninvasive and invasive mechanical ventilation), use of adjunctive therapies (i.e., neuromuscular blockade, prone position, and recruitment maneuvers), laboratory findings, arterial blood gases, and mechanical ventilation settings, if appropriate. Hemodynamic parameters and organ dysfunction were assessed using the Sequential Organ Assessment Failure Score (SOFA) [24] at ICU admission. Pharmacological treatments administered upon and during ICU admission until discharge from the ICU or hospital or death were also collected.

Specific data regarding MV since the start of invasive mechanical ventilation and on day three were analyzed. MV parameters related to ventilation-induced lung injury (VILI), such as tidal volume, respiratory rate, end-inspiratory plateau, peak inspiratory pressures, positive end-expiratory pressure (PEEP), driving pressure, and static compliance of the respiratory system (Crs) were collected. Oxygenation impairment was analyzed using the PaO2/FiO2 ratio, and abnormalities in CO2 metabolism were studied using the ventilatory ratio (VR), a surrogate parameter of Vd/Vt. The worst event values were recorded preferentially. We did not analyze data from inflammatory cytokines.

Definitions

The diagnosis of ARDS was based on the Berlin definition [23]. Tidal volume was reported in mL/kg of predicted body weight (PBW). The driving pressure was defined as plateau pressure minus PEEP. Crs was calculated as tidal volume/plateau pressure—PEEP. The ventilatory ratio was defined as follows: (minute ventilation × PaCO2)/(PBW × 100 × 37.5).

Outcome

The primary outcome was the 90-day mortality.

Statistical analysis

To derive the clusters, we first evaluated the distributions, missingness, and correlation of the candidate variables (Additional file 1: Fig. S1). Multiple imputation with chained equations was used to account for missing data, the results from the first dataset are presented, and the additional datasets were evaluated for consistency. After evaluating correlation, we excluded highly correlated variables (|rho|> 0.5) and those variables with more than 60% of missing values. Ordering Points To Identify the Clustering Structure plots were used to determine the optimal clustering strategy [25]. Based on these plots, we applied the k-prototypes clustering to 44 variables (age, gender, height, weight, chronic heart failure, diabetes mellitus, obesity, hypertension, hematologic disease, cancer, immunosuppression, systolic blood pressure, mean blood pressure, shock, temperature, heart rate, respiratory rate, pH, Hco3, hemoglobin, white blood counts, lymphocytes, platelet, D-dimer, prothrombin time, C-reactive protein, bilirubin, lactate, LDH, sodium, potassium, creatinine, procalcitonin, albumin, ventilatory ratio, tidal volume/ predicted body weight, driving pressure, Pao2/FiO2, PEEP, peak pressure, neuromuscular blocking agents use, prone position, ECMO) collected at baseline and replicated with data collected at day 3 (33 of 44 variables changed their values according to variable evolution) using a partitioning approach. The optimal number of clusters was determined based on the Silhouette index [26, 27]. Once the optimal number of clusters was determined, patterns of clinical variables were visualized in two ways: (1) chord plots (showing how clusters differ based on categorical variables) and (2) standardized values of each continuous variable by cluster.

To understand the implications of clusters and changes of clusters on 90-day mortality, Kaplan–Meier survival curves and hazard ratios adjusted by SOFA score were estimated, with 95% confidence intervals (CI).

In order to assess whether the clusters were explained by well-established measures of disease severity, we fitted logistic regression models including the SOFA score and/or the most influential variables in clustering as independent variables, and the C-indexes were obtained.

Data were presented as means and standard deviations (SD) or medians and percentiles of 25% and 75%. For comparisons between clusters, we used t tests and the Mann–Whitney U test for continuous data and the χ2 test for categorical data. The statistical significance was set at 0.05 for two-sided tests (without adjustment for multiple comparisons; therefore, statistically significant differences between clusters should be considered exploratory). Analysis was performed with the R statistical package version 4.3.1 (R Foundation for Statistical Computing).

Results

Out of 6205 eligible patients, 3743 (60%) patients were included in the study (Fig. 1). Following the Silhouette statistic (Additional file 1: Fig. S3), patients were grouped into two clusters using the k-prototypes algorithm. At baseline, 1402 (37%) patients were included in cluster 1 and 2341 (63%) in cluster 2. The baseline characteristics of the patients are presented in Table 1. The patients included in cluster 2 were older, had more hypertension, shock, inflammation measured by CRP, and better respiratory indexes (including the requirement of the prone position, NMBA requirement, oxygenation by Pao2/Fio2, ventilatory ratio, and driving pressure) (Table 1). On day 3, 3643 patients were analyzed, 1557 (42%) patients were included in cluster 1 and 2086 (57%) in cluster 2. One hundred patients (3%) died or were discharged before day 3. The weight of each continuous variable in the clusters is shown in Fig. 2 as standardized variable values and counts for categorical variables. In the clustering process between baseline and day 3, several variables changed their influence with an evolution of the values for each variable, but at both moments the clinical characteristics of the clustering were similar.

Fig. 1
figure 1

Flowchart of patient screening and enrollment

Table 1 Demographic and clinical characteristics of patients by clusters at baseline and day 3
Fig. 2
figure 2

Comparison of variables that contribute to clusters. A Standardized values of each continuous variable by cluster at baseline. B Chord plots (showing how clusters differ based on categorical variables) at baseline. C Standardized values of each continuous variable by cluster at day 3. D Chord plots (showing how clusters differ based on categorical variables) at day 3. CRP C-reactive protein; FiO2 fraction of inspired oxygen; IL Interleukin; MV mechanical ventilation; NT-proBNP N-terminal pro-brain natriuretic peptide; PaCO2 arterial partial pressure of carbon dioxide; PaO2, partial pressure of arterial oxygen; PBW, predicted body weight; SOFA: sequential organ failure assessment score. aDefined as plateau pressure—PEEP. bDefined as tidal volume/ (Plateau pressure − PEEP). cDefined as (minute ventilation × PaCO2)/ (PBW × 100 × 37.5)

The 90-day mortality for patients included in cluster 2 at baseline was 43.8% (n = 1025) versus 27.3% in cluster one (n = 383) (adjusted by SOFA score hazard ratio [aHR] 1.44, 95% confidence interval [95% CI] 1.21 to 1.70). The 90-day mortality was 49% (n = 1023) for patients included in cluster 2 on day 3 versus 20.6% (n = 321) for patients in cluster 1 (aHR 2.18, 95% CI 1.80–2.63). Unadjusted Kaplan–Meier survival curves are shown in Fig. 3.

Fig. 3
figure 3

Unadjusted survival curves for clusters

Four hundred and fifty-eight (33%) patients clustered in the first group were clustered in the second group on day 3. In contrast, 638 (27%) patients clustered in the second group, were clustered in the first group on day 3. Lower survival probability at 90 days was observed in those patients grouped in cluster 2 at both time points compared to those who were grouped at cluster 1 at both time points (aHR 2.45, 95%CI 1.89–3.19), followed by those who changed from cluster 1 to cluster 2 over time (aHR 2.11, 95% CI 1.56–2.87). Those who changed from cluster 2 to cluster 1 overtime had similar survival probability (aHR 1.18, 95% CI 0.85–1.62) to those patients grouped at cluster 1 at both moments (Table 2, Figs. 3 and 4).

Table 2 Unadjusted hazard ratio (95% CI) for 90-day survival probability by clusters
Fig. 4
figure 4

Sankey plot for ARDS population

We conducted an evaluation to determine if we could cluster patients using parsimonious variables. To do this, we assessed the ability to group patients using C-index for SOFA score at baseline and on the third day. The results showed that the C-index was 0.74 (95% CI 0.72–0.76) for baseline and 0.79 (95% CI 0.77–0.81) for the third day. We further analyzed the results by selecting the four most differential variables of each clustering process (age, glucose, sodium bicarbonate and pH at baseline and ventilator ratio, peak inspiratory pressure, PaO2/FiO2, and pH on day 3). The C-index was 0.74 (95% CI 0.72–0.76) for baseline and 0.82 (95% CI 0.80–0.85) for the third day.

Discussion

In our study, we found that patients with COVID-19-related ARDS could be clustered into two groups by k-prototypes at baseline and on day 3. Patients with more severe organ dysfunction, older age, worse oxygenation, respiratory mechanics, higher ventilatory ratio, pH, bicarbonate, and inflammatory markers were clustered in cluster 2 and had the highest mortality at both times. It is noteworthy that patients may change the cluster between the analyses at baseline or during day 3, improving or worsening their prognosis.

Strategies based on personalized medicine involve the identification of groups or clusters of patients with similar clinical and biological characteristics in order to tailor treatment to populations where it is most likely to be beneficial [28, 29]. This is usually not easy, as some conditions common to critically ill patients, such as sepsis or ARDS, are heterogeneous, share common causes and inflammatory pathways, and may differ in other ways. Clustering based on common clinical and biological characteristics has been proposed as a strategy to identify homogeneous populations to which treatments can be applied [30,31,32]. Post hoc analysis of negative clinical trials has shown that the same treatments may be beneficial in certain clusters. The clustering of ARDS patients was developed by cross-sectional analysis in the first hours of development; however, clinical characteristics and biological responses are dynamic and may change. In sepsis, a hyperinflammatory phase followed by a regulatory hypoinflammatory response has been described [33]. In pneumonia, markers of hyperinflammation may be followed by signs of immunosuppression, such as lymphocytopenia, with a more dysregulated response for both phases being associated with the worst prognosis [34, 35].

Typically, clinical trials aim to enroll patients within the first 24–48 h of admission. However, there may be a delay before treatment is administered during which the sub-phenotype classification of patients may change. In real-life situations, this delay may be even longer. Therefore, it is critical to continuously evaluate readily available measures that can dynamically classify patients and account for changes in sub-phenotypes. This is necessary because it affects prognosis. A previous study showed that sub-phenotypes analyzed by latent class analysis showed stability from baseline to day 3 and 94% remained in the same class [36]. Bos et al. did discover two trajectories related to ventilatory ratio and mechanical power that could potentially predict the duration of mechanical ventilation and the risk of death [16]. More recently, Chen and colleagues [37] described 3 longitudinal phenotypes with several changes between phenotypes during the first 4 days. Lu et al. analyzed the incidence of sub-phenotypes and their trajectories during 10 days according to respiratory support [38]. Sinha et al. observed that transcriptional profiling of the phenotypes reveals divergent biological signatures and changes in gene expression over time [39]. These results are in line with our findings and emphasize the need to classify patients just before planning an intervention taking into account the changes in clinical and analytical variables. According to our results, the probability of survival at 90 days is better predicted by clustering at day 3 (Fig. 3b, c), and the differences between clusters are also greater at day 3, but as expected with lower values compared to baseline (Table 1). Whether prognostic enrichment could be modified over time is also an important point to resolve.

Previous clustering processes were based on the inflammatory status, a fundamental key in the pathogenesis of ARDS. However, it is not well known how much inflammation is enough to damage the lungs. Whether or not a cytokine storm is associated with critical COVID-19 has been discussed in relation to anti-inflammatory treatment. It is recognized that in critically COVID-19 there is less inflammation than other diseases [40], although the impact on organ failure is large. It seems that the clustering process in our study is mostly influenced by organ failure, as suggested by the individual SOFA score components (as shown in Fig. 2). This is significant because the SOFA score is commonly used and can accurately predict how the subjects are assigned to clusters in our cohort. Changes in SOFA scores have been used to measure the outcome of several studies and as an indicator of poor prognosis. However, the prognosis of clustering in our cohort is independent of the SOFA score, as demonstrated by the adjusted HR.

We decided to use k-prototypes, an unsupervised machine-learning method for clustering. This method has shown good performance in identifying clusters for heterogeneous data [41] and allowed us to include continuous and categorical variables that we considered clinically relevant. Our results are similar to the clusters developed by latent class analysis in the order of inflammation, severity, respiratory parameters, and prognosis, and we found a significant difference in mortality between clusters at both time points evaluated.

It is important to acknowledge both the strengths and limitations of our study. On the positive side, we were able to collect data from a large and well-defined group of patients. However, it is important to note that our study focused only on patients with COVID-19-associated ARDS, and some results may be specific to COVID-19 rather than ARDS. However, most ARDS studies show pneumonia as the primary cause, which may help to mitigate this limitation. Other limitations are that we did not collect a complete biological, physiological or radiological assessment with cytokines, esophageal pressure or tomographic measurements for all patients. However, we were able to include several standard analytical variables and found clusters with similar characteristics as previously described. Furthermore, we included several variables that could vary according to the management of each center and also used all data as a derivation cohort and did not validate our results in another cohort. Finally, we found that the clustering process was different at baseline and day 3, with several variables changing their weight and the differences between clusters becoming more pronounced at day 3.

Conclusion

During the first few days, patients can be clustered into two groups and the process of clustering patients may change as they continue to evolve. This means that despite a vast majority of patients remaining in the same cluster a minority reaching 33% of patients analyzed may be re-categorized into different clusters based on their progress. Such changes can significantly impact their prognosis. Such changes can significantly impact their prognosis, either positively or negatively. To ensure better classification, the clustering process must take into account the evolving condition of patients.