Background

The COVID-19 pandemic reached Germany in spring 2020, and intensive care resources have been heavily utilized ever since [1]. Although large numbers of SARS-CoV-2 patients required intensive care unit (ICU) admission, ICU capacity in Germany was not exceeded. However, risk stratification and outcome prediction continue to be challenging. Several investigators have reported their ICU COVID-19 experience during this period, yet these reports vary greatly in the number of cases and the outcomes reported [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16].

Few of these reports attempted to identify risk factors predicting morbidity, mortality and overall clinical outcome. This may be the result of (1) incomplete data sets reported early in the pandemic, when many patients were still undergoing ICU care for SARS-CoV-2 infection [10, 13, 15, 16], and/or (2) data sets biased by the need to triage ICU care after local or regional ICU capacity had been exhausted [7, 10, 14, 15]. Nonetheless, there was consensus that SARS-CoV-2 ICU patients experienced lengthy ICU stays, with ICU mortality in the range of 25 to 41% [14, 17]. Classical statistical analyses identified risk factors in these patient populations, including age, renal function, the degree of pulmonary compromise and the severity of acute respiratory distress syndrome (ARDS). However, standard statistical techniques are limited in their ability to relate diverse data types, such as past medical history and therapeutic ICU interventions, to clinical outcome variables [18].

To overcome these limitations, we employed machine learning (ML) methods to optimize risk stratification and prediction of overall outcomes for individual COVID-19 ICU patients. It has recently been shown that ML algorithms applied to numerous multidimensional variables with non-linear relationships may have advantages in clinical outcome prediction; in cardiovascular pathologies, ML strategies were found to be superior to the classical methods of outcome prediction typically used [18, 19]. To take advantage of this approach, we investigated 1186 PCR-confirmed COVID-19 patients receiving ICU care at 27 German hospitals, enrolled both retrospectively and prospectively. The aim of this study was to investigate whether ML can provide additional, interpretable insights for outcome prediction and to weigh the identified outcome factors in COVID-19 ICU patients.

Methods

Study design, setting and participants

This multi-center retrospective-prospective cohort study was performed at 27 participating German hospitals (Additional file 1: Table E1 and Fig. 1). Ethics approval was obtained from the Institutional Review Boards of the participating hospitals. The study was registered at ClinicalTrials.gov (NCT04455451). COVID-19 patients aged 18 years and older requiring ICU admission at a participating center between 1st January 2020 and 4th May 2021 were recruited, either retrospectively (1st January 2020 to 31st July 2020) or prospectively (29th September 2020 to 4th May 2021). The inclusion criterion was the need for ICU treatment due to COVID-19 confirmed by a positive SARS-CoV-2 PCR test. The local investigator confirmed the accuracy and completeness of all entered data. Study data were collected and managed pseudonymously using a secure electronic research data capture system (REDCap) [20, 21].

Table 1 Clinical characteristics of N = 1186 patients included in the study; clinical and laboratory parameters
Fig. 1

Descriptive data of patients included in the study population (n = 596 retrospective cohort and n = 443 prospective cohort). A Age distribution. B Horovitz quotient at admission. C Murray lung injury score and SOFA score without GCS at admission. D Survival rates. E Interaction of Murray lung injury score and admission status. F Laboratory values. Grey indicates patients who did not survive ICU therapy; orange indicates patients who survived ICU therapy

Variables and measurements

During data collection, demographic data, past medical history, previous medications, current illness data, laboratory values and outcome data were recorded. A total of 49 variables were used for the ML models (Additional file 1: Table E8).

To allow comparability of intubated and spontaneously breathing patients, the Sequential Organ Failure Assessment (SOFA) score was calculated without the Glasgow Coma Scale (GCS) [22]. The Murray lung injury score was calculated as previously published [23]. Static compliance and driving pressure were calculated as previously described [24]. Laboratory values were converted to common units to permit analysis. Oxygen supply in spontaneously breathing patients was converted to an estimated FiO2 (Additional file 1: Table E2).
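The derived respiratory variables follow standard bedside formulas: driving pressure is plateau pressure minus PEEP, static compliance is tidal volume divided by driving pressure, and the Horovitz quotient is PaO2 divided by FiO2. The Python sketch below illustrates them. The function names are hypothetical, and the nasal-cannula rule of thumb (about 21% plus 4% per L/min of oxygen) is an assumption standing in for the study's own conversion table (Additional file 1: Table E2), which is not reproduced here.

```python
def driving_pressure(plateau_cmh2o, peep_cmh2o):
    """Driving pressure (cmH2O) = plateau pressure - PEEP."""
    return plateau_cmh2o - peep_cmh2o


def static_compliance(tidal_volume_ml, plateau_cmh2o, peep_cmh2o):
    """Static compliance (ml/cmH2O) = tidal volume / driving pressure."""
    dp = driving_pressure(plateau_cmh2o, peep_cmh2o)
    if dp <= 0:
        raise ValueError("plateau pressure must exceed PEEP")
    return tidal_volume_ml / dp


def estimated_fio2_nasal_cannula(flow_l_min):
    """ASSUMPTION: generic rule of thumb (21% + ~4% per L/min), capped at 1.0.
    The study's conversion table (Table E2) may use different values."""
    return min(0.21 + 0.04 * flow_l_min, 1.0)


def horovitz_quotient(pao2_mmhg, fio2_fraction):
    """PaO2/FiO2 in mmHg, with FiO2 given as a fraction (0.21 to 1.0)."""
    return pao2_mmhg / fio2_fraction


# Made-up example values
dp = driving_pressure(28, 12)              # 16 cmH2O
crs = static_compliance(480, 28, 12)       # 30.0 ml/cmH2O
fio2 = estimated_fio2_nasal_cannula(4)     # about 0.37
pf = horovitz_quotient(74, fio2)           # about 200 mmHg
print(dp, crs, round(fio2, 2), round(pf, 1))
```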

Table 2 Overall performance of the machine learning models for ICU outcome prediction

Bias management

Discontinuation of ICU care

In 107 patients (8.3%), the patient or their legal representative requested that ICU-level care be discontinued during the ICU stay. Most of these patients died during the ICU stay (n = 95, 88.8%). To avoid bias in the predictor analyses, this patient group was excluded from further analyses (for patient characteristics, see Additional file 1: Table E5). For three patients, these data were not available; they were also excluded from the analyses.

Dataset used for ML

Of the 1186 patients included in the final study (Table 1), 147 were transferred to another ICU. Because of the study design, the final ICU outcome of this subset of patients is unknown. To avoid bias in survival prediction, these patients were excluded; ML models were thus trained on 1039 (complete cohort), 596 (retrospective cohort) and 443 (prospective cohort) patients.
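These cohort sizes can be cross-checked against the recruitment counts reported in the Results (713 retrospective and 473 prospective patients). The short Python check below uses only numbers stated in this article and assumes, as the text indicates, that the 147 transferred patients were the only exclusions between the final study population and model training.

```python
# Counts as reported in this article
recruited = {"retrospective": 713, "prospective": 473}
used_for_ml = {"retrospective": 596, "prospective": 443}
transferred_out = 147  # final ICU outcome unknown, therefore excluded

total = sum(recruited.values())
# Implied number of transferred patients per cohort
excluded = {k: recruited[k] - used_for_ml[k] for k in recruited}

print(total, total - transferred_out, excluded)
```

The per-cohort differences (117 and 30) add up to the 147 transferred patients, so the three training-set sizes are internally consistent.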

Statistical analyses

Observed parameters were assessed for their distribution. Outliers were excluded by visual assessment of clinical validity based on the distribution plots (excluded data points are provided in Additional file 1: Table E3). Baseline characteristics of all patients were evaluated. Continuous variables are reported as mean and standard deviation (SD) if normally distributed, or as median and interquartile range (IQR) otherwise. Normality was assessed with the Shapiro–Wilk test to choose between Student’s t test and the Wilcoxon rank-sum test. Kaplan–Meier survival curves were compared using the log-rank test. Categorical variables were compared using Fisher’s exact test. No sample size calculation was performed; study size was determined by the datasets available during the recruitment period. All statistical analyses were performed in R (version 4.0.3) and JMP (version 15.2.0, SAS Institute, Cary, USA).

Description of machine learning process

Variables are referred to as features in ML, but for consistency we refer to them as variables. For a detailed description of the ML process, see Additional file 1: Table E7. We trained a support vector classifier (SVC), a random forest classifier (RF) and an explainable boosting machine (EBM) using fivefold stratified cross-validation (CV), with 80% of the data used for training and 20% for testing in each fold. Variables with more than 30% missing data were excluded (see Additional file 1: Table E7). For all ML methods, categorical data were one-hot encoded, i.e. an indicator column was created for each category (including missing values), and Boolean data were converted to the numerical values zero and one. Hyper-parameters were optimized for all ML algorithms using nested CV [25]. Model performance was evaluated as balanced accuracy and the area under the precision-recall curve (PR-AUC), each averaged over the CV folds, because regular accuracy or AUC would be biased towards the overrepresented class (“survival”). To verify the robustness of our results given the imbalanced outcome variable, we used both over-sampling and under-sampling for the outcome “survival”. For over-sampling, randomly selected observations from the under-represented class (here “non-survival”) were added to the data set. For under-sampling, the over-represented class (here “survival”) was randomly reduced to the size of the under-represented class. We compared the ranking of variable importance and the shape functions across each of the fivefold stratified CV runs on the retrospective dataset; the results of each run were the same (data not shown). We further validated the results by training the ML models on the retrospective data, using fivefold CV for hyper-parameter optimization (RF and SVC), and predicting the outcome on the prospective data (see Table 2).
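The preprocessing and resampling steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's pipeline: patient records are assumed to be dictionaries with `None` marking missing values, the function names and toy patients are hypothetical, and over-sampling is assumed to continue until both classes are the same size (the text specifies a target size only for under-sampling). In a CV setting one would typically resample only the training folds, so that test folds keep the original class ratio; the text does not state how this was handled.

```python
import random

MISSING = "missing"


def drop_sparse_variables(rows, max_missing=0.30):
    """Drop variables with more than max_missing share of missing (None) values."""
    names = list(rows[0].keys())
    keep = [v for v in names
            if sum(r[v] is None for r in rows) / len(rows) <= max_missing]
    return [{v: r[v] for v in keep} for r in rows]


def encode(rows, categorical):
    """One-hot encode categorical variables (missing is its own category)
    and convert Boolean values to 0/1; other values pass through unchanged."""
    levels = {v: sorted({MISSING if r[v] is None else str(r[v]) for r in rows})
              for v in categorical}
    encoded = []
    for r in rows:
        out = {}
        for v, x in r.items():
            if v in categorical:
                cat = MISSING if x is None else str(x)
                for level in levels[v]:
                    out[f"{v}={level}"] = int(cat == level)
            elif isinstance(x, bool):
                out[v] = int(x)
            else:
                out[v] = x
        encoded.append(out)
    return encoded


def random_oversample(X, y, minority, seed=0):
    """Add randomly drawn (with replacement) minority rows until classes are equal."""
    rng = random.Random(seed)
    idx_min = [i for i, label in enumerate(y) if label == minority]
    idx_maj = [i for i, label in enumerate(y) if label != minority]
    extra = [rng.choice(idx_min) for _ in range(len(idx_maj) - len(idx_min))]
    idx = idx_maj + idx_min + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def random_undersample(X, y, minority, seed=0):
    """Randomly reduce the majority class to the size of the minority class."""
    rng = random.Random(seed)
    idx_min = [i for i, label in enumerate(y) if label == minority]
    idx_maj = [i for i, label in enumerate(y) if label != minority]
    idx = idx_min + rng.sample(idx_maj, len(idx_min))
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy example with made-up patients (1 = survival, 0 = non-survival)
rows = [
    {"age": 70, "sex": "m", "diabetes": True, "il6": None},
    {"age": 55, "sex": "f", "diabetes": False, "il6": None},
    {"age": 63, "sex": None, "diabetes": False, "il6": 40.0},
    {"age": 81, "sex": "m", "diabetes": True, "il6": None},
]
y = [1, 1, 1, 0]
X = encode(drop_sparse_variables(rows), categorical={"sex"})  # il6 (75% missing) dropped
X_over, y_over = random_oversample(X, y, minority=0)
X_under, y_under = random_undersample(X, y, minority=0)
```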

For the results presented in this paper, we trained the EBM on the entire dataset (retrospective and prospective).

Rationale for the use of the explainable boosting machine model

EBMs are built on a generalized additive model (GAM) of the form

$$g\left( y \right) = \varpi_{1} f_{1} \left( x_{1} \right) + \varpi_{2} f_{2} \left( x_{2} \right) + \cdots + \varpi_{p} f_{p} \left( x_{p} \right),$$

where \(g\) is the link function, \(f_{i} \left( x_{i} \right)\) is the shape function for variable \(x_{i}\), and \(\varpi_{i}\) is the weight with which variable \(x_{i}\) influences the model. In a classification problem, \(g\) is the logit link, so that the predicted probability is obtained by applying the logistic function to the additive score [26]. Because the model is additive, each variable contributes in a modular way, which makes the influence of each variable on the prediction easy to interpret (see Fig. 2A). Using a separate shape function for each variable allows for complex, even non-linear, relationships between the variable and the outcome prediction (see Fig. 2B). Therefore, GAMs can be substantially more accurate than simple linear models [27]. We used EBMs because they additionally employ modern ML techniques such as bagging and boosting and achieve performance comparable to state-of-the-art ML techniques such as RF [27, 28]. Overall performance of the ML models was assessed by balanced accuracy and PR-AUC (Table 2).
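To make the additive structure concrete, the Python sketch below scores a patient with piecewise-constant shape functions, the kind of lookup table a fitted EBM learns for each variable, and converts the summed log-odds into a probability with the logistic function. All bin edges, scores and the intercept are hypothetical and are not taken from the study's model; the weights \(\varpi_{i}\) are assumed to be folded into the tabulated scores, and an intercept term is included because fitted EBMs typically carry one. The per-variable `contributions` dictionary is the local explanation; one common global summary is the mean absolute contribution of each variable across patients.

```python
import math
from bisect import bisect_right

# Hypothetical shape functions for illustration only (NOT the study's fitted model).
# Each entry: (bin edges, score per bin on the log-odds scale of "survival").
SHAPES = {
    "age": ([50, 61, 75], [0.6, 0.2, -0.3, -0.8]),
    "d_dimer": ([1.0, 4.06], [0.3, 0.0, -0.5]),
    "pf_ratio": ([85, 170], [-0.7, 0.0, 0.4]),
}
INTERCEPT = 0.7  # hypothetical baseline log-odds


def term(name, value):
    """Look up the shape-function score f_i(x_i) for one variable."""
    edges, scores = SHAPES[name]
    return scores[bisect_right(edges, value)]


def predict_survival(patient):
    """Return P(survival) and the additive contribution of each variable."""
    contributions = {name: term(name, patient[name]) for name in SHAPES}
    logit = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))  # inverse of the logit link
    return probability, contributions


def global_importance(patients):
    """Mean absolute contribution of each variable across patients."""
    return {name: sum(abs(term(name, p[name])) for p in patients) / len(patients)
            for name in SHAPES}


prob, contrib = predict_survival({"age": 70, "d_dimer": 5.2, "pf_ratio": 120})
print(round(prob, 3), contrib)
```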

Fig. 2

EBM prediction model showing the importance of risk factors predicting “survival” in COVID-19 ICU patients based on admission data. Top: A significant risk factors for outcome after analysis of admission data, weighted according to their importance for outcome. Bottom: B importance of age for outcome and distribution of age data; C importance of platelet/neutrophil ratio and distribution of data on admission; D importance of initial D-dimer serum values and distribution of data determined on admission; E importance of Horovitz quotient (PaO2/FiO2) for outcome and distribution of data on admission; F importance of initial hemoglobin values and distribution of data on admission; G importance of initial procalcitonin (PCT) serum values and distribution of data on admission. Grey indicates patients who did not survive ICU therapy; orange indicates patients who survived ICU therapy

Results

Participating centers and level of care

Twenty-seven ICUs participated in this observational study, including 24 ICUs at university hospitals and three ICUs at regional primary and secondary care hospitals (Additional file 1: Table E1, Figure E1). All patients requiring ICU treatment had access to the full range of ICU therapies, including ventilation, renal replacement therapy (RRT) and extracorporeal membrane oxygenation (ECMO).

Patient characteristics and status at ICU admission

A total of 1186 patients were recruited into the study (patient selection chart, Additional file 1: Figure E2): 713 in the retrospective and 473 in the prospective cohort. Overall patient characteristics, disease severity and organ failure are given in Table 1 and Additional file 1: Table E4. More than twice as many men (71.9%) as women (28.1%) were treated at the participating ICUs. The median age was 63 years (IQR 54 to 73); 180 patients (15.2%) were younger than 50 years and 6 patients (0.5%) were older than 90 years. For the age distribution and baseline parameters, see Fig. 1. Kaplan–Meier curves for the probability of ICU survival by patient age are provided in Additional file 1: Figure E3a. At ICU admission, 47.2% of patients were breathing spontaneously with oxygen via mask, 11% were receiving non-invasive assisted ventilation and 41.7% were invasively ventilated. Data for grading ARDS severity were available for 1154 patients (97.3%). According to the Berlin definition, ARDS severity graded by the PaO2/FiO2 index was mild in 16.6%, moderate in 47.3% and severe in 28.4% of patients [29]. Additional file 1: Figure E3b provides the Kaplan–Meier curves for the probability of ICU survival by ARDS severity.
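The Berlin definition grades ARDS severity by the PaO2/FiO2 ratio measured with PEEP or CPAP of at least 5 cmH2O: mild for 200 < ratio ≤ 300 mmHg, moderate for 100 < ratio ≤ 200 mmHg and severe for ratio ≤ 100 mmHg. The Python sketch below applies only these oxygenation cut-offs; the function name is hypothetical, and it assumes the remaining Berlin criteria have already been checked.

```python
def berlin_oxygenation_grade(pf_ratio_mmhg):
    """Classify ARDS severity from PaO2/FiO2 (mmHg) using the Berlin cut-offs.

    Assumes the remaining Berlin criteria (timing, bilateral opacities,
    non-cardiac origin, PEEP or CPAP >= 5 cmH2O) have already been verified.
    """
    if pf_ratio_mmhg <= 100:
        return "severe"
    if pf_ratio_mmhg <= 200:
        return "moderate"
    if pf_ratio_mmhg <= 300:
        return "mild"
    return "above ARDS range"


# Made-up PaO2/FiO2 values
grades = [berlin_oxygenation_grade(pf) for pf in (72, 100, 145, 230, 310)]
print(grades)
```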

Fig. 3

EBM prediction model showing the importance of risk factors predicting the need for ECMO or RRT in COVID-19 ICU patients based on admission data. (a) ECMO therapy. Left: A significant risk factors for outcome after analysis of admission data, weighted according to their importance for outcome. Right: B importance of age for outcome and distribution of age data; C importance of status “intubated” on ICU admission and distribution of status; D importance of status “external transfer” on ICU admission and distribution of status; E importance of Murray lung injury score and distribution of MLIS data. Green indicates patients who did not receive ECMO therapy; orange indicates patients who received ECMO therapy. (b) Renal replacement therapy (RRT). Left: A significant risk factors for outcome after analysis of admission data, weighted according to their importance for outcome. Right: B importance of the interaction of age and D-dimer level for outcome and distribution of data; C initial creatinine values and distribution of data determined on admission; D initial SOFA score w/o GCS and distribution of data determined on ICU admission. Blue indicates patients who did not receive RRT; red indicates patients who received RRT

Patient outcome

Overall ICU mortality was 34% for all recruited patients. Median length of ICU stay was 15 days (IQR 7 to 30 days). Mortality was significantly lower in female patients (27.6%) than in male patients (36.5%) (p = 0.0041). Mortality was highest in octogenarians, with an observed mortality of 45.7% (Additional file 1: Figure E3a). Overall, 22% of patients received ECMO therapy (21% in the retrospective cohort and 23.5% in the prospective cohort), with a median duration of 16 days (IQR 9 to 26). Of these, 95% received veno-venous ECMO, 2% received veno-arterial ECMO and 3% were transitioned from veno-venous to veno-arterial ECMO. Patients receiving ECMO therapy were significantly younger than those not receiving ECMO (57 (IQR 49 to 65) years vs. 66 (IQR 56 to 76) years; p < 0.0001). Of the patients not receiving chronic dialysis prior to ICU admission, 39.3% received RRT/dialysis therapy during their ICU stay (41.7% in the retrospective cohort and 35.8% in the prospective cohort).

Prediction of ICU survival by EBM models

Overall performance of the different ML models, including balanced accuracy and PR-AUC, is given in Table 2. The EBM model based on variables reflecting status at ICU admission (Additional file 1: Table E8) achieved a high PR-AUC of 0.81 and a moderate balanced accuracy of 0.64 (Additional file 1: Figure E4a). In order of predictive importance, the ten most important variables in the admission model were: age, platelet/neutrophil ratio, D-dimer, Horovitz quotient, hemoglobin, procalcitonin, Murray lung injury score, platelet count, the interaction of C-reactive protein and interleukin-6, and absolute lymphocyte count (Fig. 2). Patients’ comorbidities were not among the fifteen most important variables. The shape function for age shows a transition from improved to worsened survival at 61 years (confidence interval (CI) 60 to 62), with a first worsening at 34.7 years (CI 31 to 35). The platelet/neutrophil ratio was the second most important parameter, with a worse outcome above a ratio of 43.7 (CI 19.6 to 44.1). D-dimer levels above 4.06 µg/ml (CI 3.78 to 4.07) were associated with reduced ICU survival. Low Horovitz quotients were associated with reduced ICU survival, with the strongest negative impact at PaO2/FiO2 quotients below 85 (CI 84 to 86) and improved survival above approximately 163 to 172. Overall performance and results of the EBM model were similar for the different datasets (complete, prospective and retrospective) (Additional file 1: Table E9, Figure E5a).
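Balanced accuracy is the mean of the recall of each class, so it is not inflated by the majority class, and PR-AUC summarizes precision across all recall levels. The Python sketch below computes balanced accuracy and the step-wise average-precision estimator of PR-AUC; other PR-AUC estimators (e.g. trapezoidal interpolation) give slightly different values, and the text does not state which estimator or which positive class was used, so both are assumptions here. The toy labels and scores are made up.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the recall of class 0 and class 1 (binary labels)."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / 2


def average_precision(y_true, scores, positive=1):
    """Step-wise area under the precision-recall curve (average precision).

    Cases are ranked by descending score; each true positive adds the
    precision at its rank times the recall increment 1 / n_positive.
    Tied scores are broken by input order (a simplification).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y == positive for y in y_true)
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == positive:
            tp += 1
            ap += (tp / rank) / n_pos
    return ap


# Toy data (1 = survival): model scores, predictions thresholded at 0.5
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
y_pred = [int(s >= 0.5) for s in scores]
ba = balanced_accuracy(y_true, y_pred)
ap = average_precision(y_true, scores)
print(round(ba, 3), round(ap, 3))
```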

Predicting the need for ECMO therapy by EBM models

EBM models for the prediction of ECMO therapy achieved a good PR-AUC of 0.69 and a good balanced accuracy of 0.73. In order of predictive importance, the five most important parameters associated with ECMO therapy were: age, ventilatory status “intubated” at ICU admission, admission by external transfer, Murray lung injury score, and admission by internal transfer (reduced risk) (Fig. 3a). The shape function for age showed a higher risk for ECMO therapy below 70 years (CI 69 to 75). A Murray lung injury score above 2.8 (no CI) was associated with a higher risk for ECMO therapy. Patients admitted by external transfer had a higher risk of receiving ECMO therapy. Comparison of the EBM models and selected shape functions of important variables revealed similar results (Additional file 1: Table E9 and Figure E5b).

Prediction of renal replacement therapy by EBM models

Patients on chronic dialysis were excluded before EBM model generation. The EBM model on the complete dataset achieved a good PR-AUC (Additional file 1: Figure E4c). In order of predictive importance, the five most important parameters were: the interaction of age with D-dimer level, creatinine level, SOFA score w/o GCS, the interaction of BMI with creatinine, and platelet/neutrophil ratio (Fig. 3b). Patients younger than approximately 65 years with elevated D-dimers had a higher risk of needing RRT (see the heatmap of the interaction of age and D-dimers in Fig. 3b). A creatinine level above 1.3 mg/dl (no CI) at ICU admission, as well as a SOFA score w/o GCS above 5 (no CI), was associated with a higher risk of receiving RRT during the ICU stay. Across all EBM models, creatinine and bilirubin levels showed an inverse relationship.
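A pairwise interaction such as age × D-dimer enters the EBM score as one additional term: a two-dimensional lookup table whose value is added to the main effects. The Python sketch below is a hypothetical illustration; the cut-offs and scores are invented to mimic the qualitative pattern described above (younger age combined with elevated D-dimer, higher risk of RRT) and are not results of this study.

```python
from bisect import bisect_right

# Hypothetical pairwise term f(age, D-dimer) on the log-odds scale of needing RRT.
# Cut-offs and scores are invented for illustration; they are NOT study results.
AGE_EDGES = [65]       # years
DDIMER_EDGES = [4.0]   # µg/ml
PAIR_TABLE = [
    # D-dimer < 4.0, D-dimer >= 4.0
    [-0.1, 0.6],       # age < 65
    [0.0, 0.1],        # age >= 65
]


def interaction_score(age, d_dimer):
    """Look up the 2-D interaction score for one patient."""
    row = bisect_right(AGE_EDGES, age)
    col = bisect_right(DDIMER_EDGES, d_dimer)
    return PAIR_TABLE[row][col]


def rrt_logit(intercept, main_effects, age, d_dimer):
    """Additive score: intercept + sum of main-effect scores + interaction term."""
    return intercept + sum(main_effects) + interaction_score(age, d_dimer)


print(interaction_score(50, 6.0), interaction_score(70, 6.0))
```

Because the interaction is a separate table, it can be plotted on its own as a heatmap, which is how Fig. 3b presents the age and D-dimer interaction.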

Discussion

In this multi-center retrospective-prospective cohort study, we identified and weighed possible predictive factors for COVID-19 outcome using an ML approach on 49 variables. Using this approach, we confirmed previously reported factors and extended knowledge to novel factors and factor combinations likely to predict outcome in COVID-19 patients. The shape function of each variable shows its individual influence on the predicted outcome. For ICU survival, these factors include age, platelet/neutrophil ratio, D-dimers and ARDS severity. The most important factors for predicting the need for RRT include the combination of age and D-dimers, creatinine levels and SOFA score without GCS.

Previous studies have shown that older age, obesity, diabetes, being immunocompromised, lower PaO2/FiO2, and higher hemodynamic and renal SOFA scores at ICU admission were independently associated with 90-day mortality in COVID-19 [14]. Other investigators have reported similar findings, yet did not report individual cutoff values or weigh the individual importance of the identified factors [30, 31]. To exclude early or late effects, as seen when logistic regression is performed, we included almost all admission variables collected for our cohort. Variable selection can be performed in ML models but is less crucial than in logistic regression, and we refrained from such variable selection in our EBM model. Our analysis confirmed that age and pulmonary function on admission are important predictors in COVID-19 ICU patients. The shape functions clearly show non-linear associations between the predictive factors and the outcome variable. Age, for instance, the most important predictive factor, is associated with a higher chance of ICU survival below 61 years. In addition, the ML approach identified the D-dimer level and platelet/neutrophil ratio at ICU admission as important factors. This is especially interesting in the context of reported thrombotic complications in COVID-19 patients [32, 33]. When activated, neutrophils bind to platelets to form platelet-neutrophil complexes (PNCs), activating both cell types. These PNCs enhance inflammation, increase neutrophil extracellular trap formation and result in micro-thrombosis [34, 35]. The same applies to D-dimer levels: high D-dimer levels reflect activation of inflammation and the formation of micro-thrombi with neutrophil extracellular trap formation. Our data therefore reflect the inflammatory markers known from translational science and confirm their relevance to outcome [35].

In everyday clinical practice, it is of great interest to anticipate the further course of ICU patients, such as the need for renal replacement or ECMO therapy. The present ML model predicting the need for ECMO therapy identified age and pulmonary compromise (Murray lung injury score) as important factors. Admission from an external hospital and intubation at the time of admission were both associated with the need for ECMO therapy. This result is not surprising, as younger and more severely pulmonary compromised patients were typically transferred to our participating centers for ECMO therapy [36]. Our ML models assessing the need for RRT include age as an important factor, as well as variables quantifying disease severity (SOFA score) or inflammatory and thrombotic activity (D-dimers and platelet/neutrophil ratio). Our models not only permit the identification of risk factors in COVID-19 patients but also provide insight into the weight of each individual variable for the selected ICU outcome of the individual patient [18, 37]. The chosen ML models allow a transparent, non-linear assessment of various variables, which overcomes limitations of currently employed regression models. Using a shape function for each variable allows for complex, even non-linear, relationships between the variable and the outcome prediction; therefore, EBMs can be substantially more accurate than simple linear models [27]. Modeling interactions between variables further extends the analytical capabilities of the ML approach. Overall, the EBM results offer a greater degree of interpretability than the p-value of a linear regression or an odds ratio analysis. As shown in Figs. 2 and 3, the visualizations offer insight into transition values from positive to negative impact, plateaus, and confidence intervals as a measure of certainty.
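Transition values such as the age of 61 years for ICU survival correspond to points where a shape function changes sign, i.e. where a variable's contribution switches from favorable to unfavorable. The Python sketch below locates such sign changes in a piecewise-constant shape function. The function name and the example scores are hypothetical, not the study's fitted functions. The sketch does not compute confidence intervals (in EBMs these can be derived, for example, from the variability across bagged models), and it will not detect a local worsening that does not cross zero, such as the first worsening in the age curve.

```python
def transitions(edges, scores):
    """Find where a piecewise-constant shape function changes sign.

    edges[k] separates bin k (scores[k]) from bin k + 1 (scores[k + 1]).
    Bins with a score of exactly zero are skipped, so a crossing is reported
    at the left edge of the first bin that has the new sign.
    """
    found = []
    prev_sign = 0
    for k, s in enumerate(scores):
        sign = (s > 0) - (s < 0)
        if sign == 0:
            continue
        if prev_sign and sign != prev_sign:
            direction = "positive->negative" if prev_sign > 0 else "negative->positive"
            found.append((edges[k - 1], direction))
        prev_sign = sign
    return found


# Hypothetical age shape function (log-odds of survival), not the study's fit
age_edges = [35, 50, 61, 75]
age_scores = [0.5, 0.3, 0.1, -0.4, -0.9]
print(transitions(age_edges, age_scores))  # [(61, 'positive->negative')]
```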

A limitation of the present study is that we were not able to include more patients in the analysis. This is a valid point of criticism; however, the data used for our analyses were manually collected and curated rather than simply exported from an electronic medical record, where missing data are prevalent and the validity of the information has not been confirmed. Missing data often need to be imputed before analysis. Owing to the design of our study, we were largely able to reduce imputation of missing data, which further supports the robustness of our findings. The predictive performance of the models presented here differed across the three outcomes (survival, ECMO, RRT). This is likely because the underlying dataset contains more information for predicting, e.g., survival than ECMO. Since the study was designed with a focus on predicting survival, some variables that might better predict ECMO or RRT may not have been included in this study (for details, see Additional file 1: Table E9). Furthermore, whereas the validation of survival prediction was largely consistent between the retrospective and prospective datasets, there was more variability for ECMO and RRT. A possible reason is structural differences between the retrospective and prospective datasets, e.g. changes in treatment or in the age cohort over time. The moderate predictive performance of the variables used in these ML models leaves room to add further technologies, including translational ones, for risk prediction in future. A strength of our approach is the ability to determine a weight for individual patient factors with respect to an individual prediction. In addition, risk factors are presented with a shape function, which allows a more detailed interpretation and segmentation of risk factors than the simple linear increments of linear regression.
Finally, because the dataset is imbalanced (more patients survived ICU therapy, and more patients did not need ECMO or RRT), our model is more reliable for predicting “survival” than “mortality”. Nonetheless, a strength of these clinical data is their generalizability across institutions and even to other similarly resourced countries.

Conclusions

We present individual risk factors, weighted by importance, that can be combined to predict “survival” during COVID-19 treatment and the ICU course. To our knowledge, this has been done for the first time, and it will allow clinicians to weigh clinical criteria for outcome prediction in the patients they treat.