Introduction

Since the outbreak of COVID-19 disease began in December 2019 in China, soaring cases of confirmed SARS-CoV-2 are pummeling the global health system. More than 91 million people have developed SARS-CoV-2 infection, and more than 2 million have died [1]. Critical illness from COVID-19 has constrained intensive care unit (ICU) material and human resources [2]. As of January 18, 2021 more than 2.5 million people in Spain have been infected with SARS-CoV-2 and more than 53,000 have died [3]. Short-term mortality reported rate ranges from 16 to 62% of patients admitted to ICU [4,5,6,7,8]. The heterogeneity of patients that have been treated in China [4], Italy [8], USA [5,6,7] or Spain [9, 10] may explain the wide variation of mortality rate due to their population characteristics, presence of comorbidities and healthcare systems. A recent international survey [11] reported significant practice variations in the management of severe COVID-19 patients, including differences at the regional, hospital, and patient level. Therefore, it is necessary to characterize phenotypes, by extending the enrolment of patients outside of one ICU site to multiple patients being treated in different hospitals. Allowing to adequately measure mortality-related factors adjusted by the inter-hospital variation to determine clinical outcomes.

Risk factors represent the most important approach when defining treatment of hospitalized patients as these measures can inform clinical courses most likely for a patient given their a priori risk. However, risk factors can also interplay differently when they are included in different patient clusters. A single model based on general risk factors (one-size-fits-all) might be limited for clinical data interpretation and application across sites. Different combinations of risk factors may naturally cluster into previously undescribed subsets or phenotypes that may have different risks for a high mortality rate and that may therefore help to determine the response to treatments in COVID-19. We hypothesize that the presence of well-defined phenotypes in COVID-19 could help to more appropriately identify patients at risk of ICU mortality than general models for the entire population considering that this disease results in a constellation of symptoms, laboratory derangement, immune dysregulation, and clinical complications.

The primary objective was to determine the presence of distinct clinical phenotypes using unsupervised clustering methods that were applied to the datasets available on ICU admission. The second objective was to assess which factors are independently associated with ICU mortality. The added value of this large-scale multicenter prospective study lies to discover phenotypes based on clinical data available at ICU admission that can help explain the variation in clinical results of COVID-19 disease in the ICU.

Material and methods

Study design

A multicenter observational, prospective cohort study that consisted of a large-scale data source of hospital ICU admissions and patient-level clinical data. The enrolment criteria included adult’ patients with laboratory confirmed SARS-CoV-2 infection admitted in 63 ICUs across Spain due to acute respiratory failure between February 22, 2020 and May 11, 2020. The study was approved by the reference institutional review board at Joan XXIII University Hospital (IRB# CEIM/066/2020) and each participating site with a waiver of informed consent. All data values were anonymized prior to the phenotyping which consisted of clustering clinical variables on their association with COVID-19 mortality.

Study sites and patients population

The study enrolled consecutive adult patients (> 16 years) with laboratory confirmed SARS-CoV-2 infection, detected by RT-PCR positive test of nasopharyngeal, oropharyngeal swab or invasive respiratory samples according to the WHO recommendations [12]. The follow-up of patients was scheduled until August 11, 2020, which confirmed ICU discharge or death whichever occurred first. A complete list of participating ICUs and their investigators is provided in the acknowledgements section. In this cohort, 43 patients were described in a preliminary report of a single–center case series in Tarragona, Spain [9].

Outcomes

The primary outcome included all-causes of ICU mortality. Patients who were discharged alive from ICU were evaluated in the data as alive considering mortality was defined as any in-ICU death. All complications and outcomes were followed during ICU admission.

Data collection

Data was obtained from a voluntary registry created by Spanish Society of Intensive Care Medicine-SEMICYUC. All consecutive cases admitted to the ICU were collected. There were no patients excluded from the analysis that was enrolled to participating ICU and met criteria.

All the collected variables recorded at ICU admission are listed in the Additional file 1: p. 6. To determine severity of illness, the Acute Physiology and Chronic Health Evaluation (APACHE) II score [13] and Sequential Organ Failure Assessment (SOFA) scoring [14] were calculated for all patients within the first 24 h of ICU admission.

The ICU admission criteria, use of antiviral, antibiotic or co-adjuvant treatment, and also the measures that would determine the need to intubate and type of ventilator support required (oxygenation, high flow nasal cannula [HFNC], noninvasive [NIV] or invasive [IMV] mechanical ventilation) were not standardized between centers and were left to the discretion of the attending physician, according to SEMICYUC and National Ministry of Health [15] and were included in the case report form and confirmed by the medical records. We also collected hospital-level data including city, county and number of hospital beds available. The study definitions used in the present analysis are shown in the Additional file 1: p. 2.

Statistical analysis

No statistical sample size calculation was performed a priori, and sample size was equal to the number of patients admitted to the participant’s ICUs with confirmed COVID-19 during the study period. To describe baseline characteristics, the continuous variables were expressed as median (interquartile range [IQR]) and categorical variables as number of cases (percentage). For patient demographics and clinical characteristics, differences between groups were assessed using the chi-squared test and Fisher’s exact test for categorical variables, and the Mann–Whitney U or Wilcoxon test for continuous variables. To performed the analysis, we first assessed the candidate variables, missing values, and correlation. Multiple imputation was used to account for missing data (Additional file 1: p. 2). After evaluating correlation, highly correlated variables were excluded (Additional file 1: p. 5).

An overview of the primary analysis plan is outlined in Fig. 1. In a first step, a multilevel conditional logistic modelling and the intraclass correlation coefficient (ICC) was calculated (Additional file 1: p. 2) with patients nested in hospital to characterize hospital-level variation of ICU mortality and determine if a significant inter-hospital variation is present.

Fig. 1
figure 1

Overview of the primary analysis plan. ICU Intensive care units, PAM partition around medoids clustering analysis, GLM Generalized Linear model

In a second step, to determine presence of distinct clinical phenotypes in our population of COVID-19 patients, an unsupervised clustering analysis was applied to the database at ICU admission. In order to carry out this analysis, a discretization of the numerical variables into categorical ones was done using “ChiMerger” packages for R software. The information provided by each variable regarding ICU mortality was defined using the Information Value (IV). A IV greater than 0.03 was considered clinically important and this variable was included in the multivariate logistic regression analysis. Model performance was examined using accuracy test, Sensitivity, Specificity and AUC modeling. Subsequently, the unsupervised cluster analysis was performed using the important variables. The Podani distance was used to calculate the distance between patients and the “partition around medoids” (PAM) algorithm to perform the clustering [16]. The optimal number of clusters were determined after studying the silhouette [17] and the PAM objective for different numbers of clusters (Additional file 1: p. 12). Each of these clusters represent a specific patient’s phenotype. To visualize the clusters in a lower dimensional space, we used a Principal Component Analysis (PCA). We obtain important variables according to IV for each phenotype, and the OR of these variables were obtained after applying a GLM (Generalize linear Regression model) analysis. GLM is how statistical software R performs multiple logistic regression analysis when the command family = “binomial” is indicated. Multinomial regression models were fit to further compare patient comorbidities across phenotype classification. Model performance in each phenotype was examined using accuracy test, Sensibility, Specificity and AUC.

Lastly, a traditional multivariate analysis GLM was performed to investigate the association between baseline (on ICU admission) variables and ICU-mortality. The GLM model comprised factors of clinical interest and all significant covariates (p < 0.05) in the univariate analysis of ICU mortality and presence of collinearity was studied by variance inflation factors (VIF). The results are presented as odds ratios (OR) and 95% confidence intervals (CI). To determine our model, we checked adequate model performance between groups with a cross validation model (K-fold = 10) and the model with better performance was chosen.

For all model validation, database was randomly split into two subsets: (a) a “training set” (80%), and (b) a “validation set” (20%). Model performance was examined using accuracy test, precision, sensitivity, specificity and area under ROC curve (AUC). Data analysis was performed using R software (cran.r-project.org).

Results

Patients characteristics at ICU admission

From February 29, 2020 to June 11, 2020 a total of 2,022 critically ill patients from 63 ICUs were enrolled in the present analysis. Forty percent of ICUs belonged to hospitals with more than 500 beds, 40% to hospitals between 200 and 500 beds and the remaining 20 percent to hospitals with fewer than 200 beds. To determine if a significant inter-hospital variation is present, multilevel conditional logistic modelling with patients nested in hospital to characterize hospital-level variation of ICU mortality was done. According to intraclass correlation coefficient (ICC) obtained 0.04 when considering all hospital (n = 63) and of 0.04 when excluded hospitals that submitted data on few than 10 patients, no significant inter-hospital variation was observed (Additional file 1: e-Fig. 2, p. 7).

Fig. 2
figure 2

Phenotype clinical characterization (APACHE II Acute Physiology and Chronic Health Evaluation II, SOFA Sequential Organ Failure Assessment, LDH D-Lactate dehydrogenase, U/L, AKI Acute Kidney injury)

The median (IRQ) age was 64 (55–71) years, and 1,423 (70.3%) were men. The median of time between the onset of symptoms and diagnosis was 7 (4–9) days. A total of 1,467 (72.6%) patients had at least one coexisting comorbidity. Arterial hypertension (936 [46.7%]), obesity (655 [32.4%]) and diabetes mellitus (420 [20.7%]) were the most frequently comorbid conditions reported. The severity of illness was high according to the APACHE II (14; IQR 11–18) and SOFA (5.7; IQR 4–7.3) scores. The PaO2/FiO2 ratio on the day of ICU admission was 132(IQR 96–163) and 1131 (55·9%) patients meet criteria of severe and 198(9·8%) of mild acute distress respiratory syndrome (ADRS) profile at ICU admission, 1174 (58.0%) patients required invasive mechanical ventilation (MV), 906 (44.8%) developed shock and 580 (28.7%) meet criteria of acute kidney injury (AKI). The overall ICU length of stay (LOS) was 14 (8–26) days and similar for survivors (14 [8–27]) days and non-survivors (14 [8–24]) days (p = 0.10).

The most frequent prescribed co-adjuvant treatments for COVID-19-related infection were hydroxychloroquine (1866 [92.3%]) and lopinavir/ritonavir (1662 [82.2%]). Empiric antibiotic treatment was administered in 1818 (89.9%) of the patients and intravenous corticosteroids in 1174 (58.6%). Further clinical characteristics of patients and laboratory finding are shown in Additional file 1: e-Table 3, p. 8.

ICU mortality

Overall, 660 patients (32.6%) died. The crude ICU mortality increased significantly with the increase in predefined age cut-off and was greater than 80% in patients over 80 years old (Additional file 1: p. 7). Age, male sex, severity of illness (APACHE II and SOFA), presence of arterial hypertension, diabetes, coronary arterial disease, chronic obstructive pulmonary disease (COPD), chronic kidney disease (CKD), immunosuppression and hematologic disease markers were significantly higher in ‘non-survivor’ patients. Non- survivor patients compared with those that survived had higher levels of D-Lactate dehydrogenase (LDH), white blood cells, serum creatinine, C-reactive protein (CRP), Procalcitonin (PCT), serum lactate, serum D-dimer and serum ferritin. Non-survivor patients developed more frequent complications such as shock, kidney and myocardial dysfunction at ICU admission. High Flow Nasal Cannula (HFNC) was more frequent in survivors, while invasive mechanical ventilation (MV) was more common in non-survivors. Mortality for those who received MV during their ICU stay (n = 1554; 76.8%) was 37.3% (n = 580) higher than observed in patients who did not require MV 17.0% (80/468, p < 0.001). Complete characteristics of patients according outcome are shown in Table 1.

Table 1 Characteristics of 2022 patients according ICU outcome

Unsupervised analysis (cluster) to determine different phenotypes in critically ill patients

Once the variables were categorized, 5 patients (0.24%) were excluded for outlier’s data, and the analysis was performed with 2,017 patients. Of the 50 variables considered, only 25 were considered as predictors according to the IV (Additional file 1: p. 11) and were included in the model. Remarkably, no treatment option was a predictive factor for ICU-mortality. The categorized variables independently associated with ICU-mortality are shown in Additional file 1: p. 12. The performance of the model was adequate with an accuracy of 0.77, sensitivity of 0.88, specificity of 0.54 and AUC of 0.82. According to the Podani’s distance and the Shilouette and PAM plots (Additional file 1: p. 13) the optimal number of clusters in our dataset was 3. Cluster A included 537 patients (26.7%), cluster B included 623 (30.8%) and cluster C included 857 patients (42.5%). The clusters in a lower dimensional space are shown in the Additional file 1: p. 14. The size and characteristics of the phenotypes in the 3-class model are shown in Table 2. Patients with the cluster A phenotype (mild COVID-19 disease) had < 65 years, lower severity of illness, fewer abnormal laboratory values and less development of complications, with a crude ICU mortality of 20.3%; those with the cluster B phenotype (moderate COVID-19 disease) had similar characteristics as seen in the A phenotype but were more likely to present shock at ICU admission with a crude ICU-mortality of 25.5%. Patients with the cluster C phenotype (severe COVID-19 disease) had > 65 years, a high level of severity of illness, more likely to have elevated measures of inflammation (e.g. D dimer, LDH and ferritin), high frequency of shock, AKI and myocardial dysfunction, with a crude ICU mortality of 45.4%. The clinical characterization of each observed phenotype can be seen in Fig. 2. By including these important variables in a regression model for each cluster, we observed that the discrimination of each model was higher than general model except for C phenotype (Table 3). The Variables independently associated with mortality were different between automatic and cluster models (Table 4 and Fig. 3 A-B).

Table 2 Characteristics of 2017 critically ill patients included in machine learning analysis according to overall or cluster (phenotype) population
Table 3 Performance of global and clustering models (GLM model: generalized linear models with variables according ICU mortality table; Global model: GLM model with important variables according to information value analysis for ICU mortality; A, B and C Phenotypes: GLM models with important variables according to information value analysis for each cluster)
Table 4 Factors independently associated with ICU mortality in automatic and clustering models (automatic model: generalized linear model [GLM] with important variables according to information value analysis for ICU mortality; A, B and C Phenotypes: GLM models with important variables according to information value analysis for each cluster)
Fig. 3
figure 3

a Chord diagrams showing abnormal clinical variables by phenotype. A: mild COVID-19 disease; B: moderate COVID-19 disease and C: severe COVID-19 disease. b Chord diagrams showing abnormal clinical variables by Phenotype differentiating survivors (green) from non-survivors (red) (APACHE II Acute Physiology and Chronic Health Evaluation II, SOFA Sequential Organ Failure Assessment, PCT Procalcitonin, > 3 chest X-ray more than 3 quadrants infiltrates in the chest X-ray, Miocard Dys Myocardial dysfunction, Hydroxichloroq. Hydroxychloroquine, GAP antiviral Time in days from onset of symptoms to first dose of antiviral, DD D dimer, AKI Acute Kidney injury, LDH D-Lactate dehydrogenase, U/L, COPD Chronic Pulmonary Obstructive Disease, Pa/Fi Partial pressure arterial oxygen/fraction of inspired oxygen, Hemat. Dis Hematologic disease, GAP_UCI Time in days from Hospital to ICU admission, Coronary dis. Coronary disease)

Construction of the ICU Mortality classic multivariate model

Of the 42 variables measured at ICU admission, 25 variables that were statistically significant in the univariate analysis (Table 1) were included in the model. The initial dataset of patients was randomly split in two subgroups “Training group” with 1,618 patients (80%) and “Test group” with 404 patients (20%). The characteristics of patients included into each subgroup are shown in Additional file 1: e-Table 3, p. 8. No significant differences were observed between the subgroups. Inclusion of these 25 variables in a GLM model for the training group, resulted in 10 variables that were independently associated with ICU mortality (Fig. 4). No presence of collinearity between explanatory variables was observed (Additional file 1: p. 11) and the Hosmer–Lemeshow Goodness-of-fit test (X-squared = 5.53, df = 8, p-value = 0.69) established no discrepancy between the observed values and those that would have been expected in the model. The validation of the classic model in the test group demonstrated adequate performance with an accuracy of 0.78, a precision test of 0.73, sensitivity of 0.88, specificity of 0.45 and an AUC ROC of 0.82 (95%CI 0.78–0.86) (Additional file 1: p. 10). Performance of classic model was similar than automatic model (Table 3), however, the variables included in each model were different (Fig. 4 and Additional file 1: e-Fig. 9, p. 16).

Fig. 4
figure 4

Variables independently associated with ICU mortality in multivariable analysis (GLM: generalized linear model). Data are show as OR (odds ratio) and 95% Confidence interval (SOFA Sequential organ failure assessment, PCT Procalcitonin, PaO2/FiO2 Partial pressure arterial oxygen/fraction of inspired oxygen, Dysf Dysfunction, LDH D-Lactate dehydrogenase, MV Mechanical ventilation, AKI Acute Kidney injury, > 2 infiltrates  > 2 infiltrates in chest-X ray)

Discussion

The main finding of our study is that among patients with COVID-19, 3 clinical phenotypes were derived using habitual clinical and laboratory variables at ICU admission. The ability of identifying phenotypes using a small set of variables is a crucial step towards clinical application and has important implications for possible differential treatment guided by phenotypes and validated prognostic scoring systems [18, 19].

Our C phenotype was associated with more than double the ICU mortality than each of the remaining two phenotypes. This C phenotype was characterized by the interplay of older age (> 65 year), a high severity (APACHE II > 15 and SOFA > 5), greater burden of risk factors (hematologic disease and coronary disease) and a higher likelihood of developing further complications (shock and AKI).

Previous studies have implemented clustering techniques to analyze various data sources relating to demographic, geographic, environment, and socioeconomic determinants of health and disease. There are studies that have evaluated treatment decisions and characterized clinical phenotypes associated with complications, ICU admission and mortality risk in critically ill COVID-19 patients. According to the Situation Report & Public Health Guidance published by Johns Hopkins University on March 19th, 2020, people over 60 and those with chronic health conditions are at the highest risk for COVID-19 complications [20]. To our knowledge, this is the first study with a high number of critically ill patients to analyze the presence of phenotypes in patients with SARS-CoV-2 infection. This multicenter cohort study of 2,022 critically ill patients found that 660 patients (32.6%) died at ICU discharge. Our ICU mortality rates was significantly lower as reported in Yang et al. [4] in Wuhan, China (61.5%), by Myers et al. [7] in California, USA (50.0%), by Arentz et al. [6] in Washington, USA (67%) and by Richardson et al. [5] in New York, USA (78%), but slightly higher to reported by Grasselli et al. [8] in Lombardy region, Italy (26%). These observed differences in ICU mortality could respond to different healthcare models and important practice variations in the management of severe COVID-19 patients [11], but it can also depend on the frequency of presentation of the different phenotypes.

In our study, a great variability in model performance and risk factors were observed during cross-validation to choose the best model to use. In addition, we use 2 different techniques for the selection of important variables, one of them is the “classic” approach dependent on the p-value, while the other, a “modern” statistical approaches, is more in line with the new recommendations [21]. Although the performance of the models was similar, the variables included in each of them are different. This could be related to the presence of a very heterogeneous patient population, which is revealed during random partitioning (80%/20%) validation of each model or by implementing 2 variable selection techniques. In this context, three clinical phenotypes of COVID-19 patients were derived using routinely available clinical data at ICU admission by an unsupervised cluster analysis. The phenotypes were multidimensional, differed in their demographics, laboratory abnormalities, patterns of organ dysfunction, and associated with ICU mortality. In addition, our phenotypes are not similar with groupings or phenotypes of patients performed so far considering only the presence of clinical complications [22, 23], or the type of ARDS [24]. Our COVID-19 phenotypes can be identified at the time of the ICU admission, and thus could be useful in facilitating early tailored therapy and improve prognosis.

Only routinely available clinical and laboratory data were used in the clustering models, and the phenotypes were derived from a large observational multicenter cohort to ensure generalizability. Importantly, we have observed that the variables associated with the ICU mortality varied between the global model and the models developed for each phenotype. The discrimination power (AUC) of A and B phenotypes models improved in comparison to the global model. However, for the C phenotype (severe COVID-19 disease), the performance of the model was not superior respect of the global model. The C phenotype was most strongly correlated with abnormal values of biomarkers as well as clinical features of cardiovascular dysfunction, AKI and subsequently a higher ICU mortality. Although the AUC for C phenotype is lower, the relationship between sensitivity and specificity in C phenotype model might be more appropriate. Specificity can sometimes be more important than sensitivity, because confirming that a person does not have the event under study (survival) is more important than detecting if a person has it.

Recently, several authors have proposed different clinical phenotypes of COVID-19 patients [22,23,24]. Rello et al. [22] speculated that COVID-19 has five phenotypic presentations based on physiological and clinical features from published studies. Garcia-Vidal et al. [23] describe the main clinical complications of hospitalized patients with COVID-19 through classification into three pattern groups (inflammatory, co-infection and thrombotic). However, as the authors acknowledge, the cut-off points of the different biomarkers for defining phenotypes have been arbitrary and not scientifically supported. Finally, Gattinoni et al. [24] proposing two phenotypes for COVID-19 patients, (1) “Type L” characterized by high compliance and low lung recruitablity and (2) “Type H” with low compliance and high lung recruitability as a two “extremes” of a spectrum of respiratory failure in COVID-19 pneumonia. Despite the importance related to clinical experience in each of these approaches, none of these studies have been developed through a machine learning process to determine phenotypes nor have they been tested for validation.

Hypoxemia has been proposed as a marker of severity for the differentiation of phenotypes [22, 24]. In our study, the PaO2/FiO2 relationship at ICU admission was an independent risk factor for ICU mortality in overall multivariate analysis (as a continuous or dichotomized variable), but was only closely associated with ICU mortality in phenotype C. Other variables such as advanced age, serum D-dimer values and the development of AKI were variables more strongly related to ICU mortality in all subgroups or phenotypes analyzed than PaO2/FiO2 at ICU admission.

Our results should be interpreted in the context of the study limitations. First, although phenotypes were found to be generalizable in our population, risk factors and characteristics of clinical phenotypes were derived initially from data at ICU admission of multicenter observational study in Spain. However, these risk factors are similar to those that have been reported by other investigators [4,5,6,7,8]. The cross-validation carried out and the high discrimination observed for each of the models built for phenotypes, suggests their applicability to other populations, but it should be examined considering the high variability observed in patients with COVID-19 and in the support measures applied. Second, because missing data were common for some variables included in the clustering models, multiple imputation was used in the primary analysis. However, variables with high missing values were excluded and the missing threshold used was reported elsewhere [18]. Third, only routinely available clinical data at ICU admission were used to identify risk factors and clinical phenotypes, and the inclusion of other data related to clinical evolution of patients in the ICU could change risk factors or phenotype assignments. However, our objective was to study early risk factors and phenotypes at ICU admission that may allow for early treatment implementation and as a result improve patient outcome. Fourth, although IL-6 is an excellent severity biomarker, we have not been able to include this biomarker in the models because more than 50% of patients had no IL-6 determination upon ICU admission. Although the inclusion of IL-6 in models could modify or improve their performance, we do not consider it appropriate to impute a large number of missing data. In addition, if IL-6 is a biomarker not usually available its inclusion in the models would not have practical application. Finally, we did not collect data on ethnicity or socioeconomic factors. These factors may play a role in the prevalence of pre-existing comorbidities and mortality due to COVID-19. Our findings should be interpreted within the context of the study population and its generalizability to other populations warrant further investigation.

Conclusion

To our knowledge this is the largest study that describe different phenotypes of patients with confirmed COVID-19 that were admitted to ICU to date. We not only characterized three novel clinical phenotypes, but extended findings outside of a single site ICU by characterizing the association of comorbidities with clinical phenotype and the association of clinical phenotypes with clinical outcomes. Different risk factors for the global population and clinical phenotypes were observed, possibly due to the heterogeneity of patients, which may limit the application of a single predictive model for all patients with COVID-19. Further research is needed to determine the application of these phenotypes in clinical practice, in other patient’s population and for clinical trial design.