Key Summary Points

Why carry out this study?

In the clinical setting it is important to rapidly identify SARS-CoV-2 positive patients, and to provide patients with COVID-19 with appropriate medical support by monitoring parameters that are associated with poor disease outcomes.

We asked whether concise subsets of clinical parameters can be identified to diagnose SARS-CoV-2 positive patients, to identify patients with COVID-19 with a high risk of a fatal outcome on admission to hospital, and to recognize longitudinal parameter patterns as warning signs of a possible fatal COVID-19 outcome during hospitalization.

What was learned from the study?

A medical decision tree built from machine learning-selected diagnostic laboratory parameters can distinguish SARS-CoV-2 positive from negative patients on the basis of the full blood count and procalcitonin alone.

Using machine learning, we determined that older age, higher procalcitonin, C-reactive protein (CRP), and troponin I, as well as lower hemoglobin and platelets/neutrophils ratio, were the strongest predictors of a fatal COVID-19 outcome on admission to hospital, whereas the strongest predictors among the longitudinal parameter patterns measured during hospitalization were CRP, white blood cells, and D-dimers.

The identified subsets of parameters will help to quickly identify and isolate SARS-CoV-2 positive patients, and will assist in the adoption and adjustment of effective treatment for patients with COVID-19 with warning signs of a fatal disease outcome in the clinical setting.

Introduction

As the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) began to spread worldwide at the beginning of 2020, causing the outbreak of coronavirus disease 2019 (COVID-19), individual countries started to reorganize their respective healthcare systems. As a specific example, the Polish government created a network of 19 hospitals dedicated to patients with confirmed or suspected SARS-CoV-2 infection. The main idea was to provide those patients with the best possible medical services while separating them from patients without COVID-19. Infection with SARS-CoV-2 results in different clinical presentations, some of which require hospitalization and advanced management. The health burden of the disease, and especially the high number of concurrent patients [1], constitutes a serious problem for healthcare systems. Amid emerging SARS-CoV-2 waves and resource shortages, it is crucial to find a simple and cost-effective way to identify infected patients as early as possible, and to closely monitor the laboratory parameters that are most strongly associated with the worst disease outcomes. The aim of this study was therefore to use machine learning (ML) to identify subsets of specific features that assist with the diagnosis of SARS-CoV-2 infection and the prediction of COVID-19 course and outcome, based solely on the standard clinical and laboratory data collected during hospitalization. This could provide medical staff with (i) priority testing approaches on admission to hospital, (ii) initial predictors of a fatal COVID-19 outcome, and (iii) clear warning signs of disease progression associated with a fatal outcome during hospitalization.

The reliable and widely accepted diagnostic tools for acute SARS-CoV-2 infection remain PCR testing of swabs and, although less sensitive, antigen rapid diagnostic tests (Ag-RDTs) [2, 3], which became widely available in spring 2021. While Ag-RDT results are available at the point of care within 15–30 min, PCR testing is more time-consuming and takes at least 2–3 h, and often much longer, to return results. Since RT-PCR and Ag-RDTs were developed, less effort has been dedicated to predicting SARS-CoV-2 positivity on the basis of patient-reported symptoms and medical history, or of standard laboratory values on admission to hospital. However, the rapid identification of patients who are most likely infected with SARS-CoV-2 on the basis of diagnostic features is important, especially in low- and middle-income countries [4, 5], or when there is an excess of hospitalized patients with a COVID-19-like clinical presentation. In addition, to optimize care for the patients with COVID-19 at highest risk of a negative disease outcome, it would be beneficial if a set of standard laboratory parameters could serve as an early warning sign. So far, several standard laboratory parameters have been consistently linked with the most severe COVID-19 outcomes, including C-reactive protein (CRP), D dimer (DD), and various complete blood count (CBC) abnormalities, alongside many other features such as comorbidities, patient demographics, and physical examination findings [6,7,8,9,10]. However, given the high number of reported features and the at times contradictory information, it has proven difficult to identify a reliable, compact set of features that could serve as strong predictors of fatal outcomes.

Therefore, artificial intelligence (AI) approaches are increasingly implemented to assist with clinical assessment in order to diagnose infections in a timely manner, to predict disease outcomes and response to treatment, and to manage different aspects of pandemic necessities [11, 12]. In reviews of AI models for COVID-19, the features predicting the diagnosis frequently included flu-like symptoms, CRP, white blood cells (WBC), lymphocytes (LYMPH), and imaging-derived features. Predictive features for disease course and outcome were most often vital signs, comorbidities, CRP, LYMPH, lactate dehydrogenase (LDH), and imaging-derived features. Yet, the predictive performance of almost all models was overly optimistic, as they had a high risk of bias due to model overfitting and unclear reporting [13, 14]. Common limitations that reduce the predictive capacity of such models include small sets of parameters, small sample sizes, unclear exclusion criteria for participants, a limited selection of ML algorithms, missing data on predictors and outcomes, subjective or time-dependent outcomes, and laboratory measurements taken at only one time point.

Hence, we aimed to implement ML into clinical reasoning by reducing the large number of potential features to small and highly predictive feature subsets. To minimize the shortcomings of many previous ML approaches, we simultaneously included a more comprehensive feature space through inclusion of different types of parameters previously shown to be predictive, such as blood cell counts, other laboratory measurements, reported comorbidities, and medications prior to hospitalization. Additionally, we obtained longitudinal measurements of laboratory parameters taken at different time points over the course of the disease to monitor how the dynamic changes of these features impact the clinical outcome, which would ultimately enable physicians to adjust the treatment accordingly. Being aware of a rather conservative approach amongst clinicians towards new concepts, we created a decision tree using ML to support decision-making in everyday practice and summarized the results of the ML approaches into small consensus sets of highly predictive features.

Methods

Patient Data Collection

The project was performed in accordance with the Declaration of Helsinki [15], and was approved by the Ethics Committee of the Medical University of Lodz, Poland (Nr. RNN/126/20/KE). The patients gave informed consent for participation in this study. The comparative study included 515 patients hospitalized in the Maria Skłodowska-Curie Specialty Voivodeship Hospital in Zgierz, Poland, between March and June 2020, during the first wave of the COVID-19 pandemic caused by the SARS-CoV-2 wild-type strain. The analyses included patients with PCR-confirmed SARS-CoV-2 infection (n = 201), who were subsequently hospitalized in the COVID-19 care setting and either survived (n = 129) or died (n = 72), and patients admitted to the COVID-19 hospital with symptoms resembling COVID-19 who eventually tested negative for SARS-CoV-2 infection (n = 314) and were transferred to non-COVID-19 hospitals or discharged home. The outcome definition of SARS-CoV-2 positive and negative patients was based on RT-PCR results, and that of COVID-19 survival on discharge from hospital after a negative RT-PCR test. Clinical and laboratory data included clinical findings on admission, symptoms prior to hospitalization, epidemiological risk, comorbidities, and reported medications. The laboratory parameters were assessed on the day of admission to hospital and at several time points during the course of the disease. The STROBE guidelines [16] were applied regarding study design, patient recruitment, and reporting of observational research.

Laboratory Measurements

Nasal swabs were collected from all patients on admission to hospital. RNA was then isolated and RT-PCR tests were performed using the 2019-nCoV Triplex RT-qPCR (Vazyme, Nanjing, China) or the 2019-nCoV Bosphore Novel Coronavirus (Anatolia Geneworks, Istanbul, Turkey) detection kits. The time from collection of a patient's swab to receiving PCR results was between 8 and 24 h. Laboratory measurements followed the standard clinical approach, taking the individual situation of each patient into account; a core set of features was checked for almost all patients, based on the guidelines valid at that time. Analyses of the full blood and serum samples taken on admission to hospital and regularly throughout hospitalization were performed in the Central Diagnostic Laboratory of the Maria Sklodowska-Curie Specialty Voivodeship Hospital (Zgierz, Poland) using the following systems: Sysmex XN-1000 (Sysmex Europe GmbH, Norderstedt, Germany) for the CBC parameters, ACL Top 350 (Werfen, Warsaw, Poland) for the coagulation parameters, and Abbott Alinity (Abbott Laboratories, Warsaw, Poland) for the biochemistry parameters. The time from blood collection to receiving the results was about 60–90 min.

Machine Learning Approaches

A detailed description of the ML approaches is included in the appendix in the supplementary material.

Data Pre-processing

Data were checked for consistency, with stray non-numeric characters removed from numerical fields. When a numeric value was preceded by a "<" sign, the entry was replaced with the mean between 0 and that value. From the existing information, the feature ratios NEU/LYMPH, MON/LYMPH, EOS/LYMPH, PLT/LYMPH, BASO/LYMPH, and PLT/NEU were calculated and included in the analyses, while the following original features were excluded: neutrophils (NEU), eosinophils (EOS), monocytes (MON), platelets (PLT), basophils (BASO), and lymphocytes (LYMPH). Three extreme outlier values were removed from the data on the subgroup of SARS-CoV-2 positive patients.
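For illustration, the cleaning and ratio-derivation steps above can be sketched in Python with pandas; the detection-limit convention follows the description above, while the function names and column layout are ours:

```python
import numpy as np
import pandas as pd

def clean_numeric(value):
    """Coerce a raw laboratory entry to a float.

    Entries such as "<0.5" (below the detection limit) are replaced with the
    mean between 0 and that value, i.e. 0.25; stray non-numeric entries
    become NaN and are handled later by imputation.
    """
    s = str(value).strip()
    if s.startswith("<"):
        return float(s[1:]) / 2.0
    try:
        return float(s)
    except ValueError:
        return np.nan

def add_ratios(df):
    """Derive the ratio features and drop the absolute counts they replace."""
    ratios = {"NEU/LYMPH": ("NEU", "LYMPH"), "MON/LYMPH": ("MON", "LYMPH"),
              "EOS/LYMPH": ("EOS", "LYMPH"), "PLT/LYMPH": ("PLT", "LYMPH"),
              "BASO/LYMPH": ("BASO", "LYMPH"), "PLT/NEU": ("PLT", "NEU")}
    df = df.copy()
    for name, (num, den) in ratios.items():
        df[name] = df[num] / df[den]
    return df.drop(columns=["NEU", "EOS", "MON", "PLT", "BASO", "LYMPH"])
```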

Model Building

The following ML algorithms were used for the classification tasks: logistic regression (LogisticRegression), k-nearest neighbors (KNeighborsClassifier), random forest (RandomForestClassifier), AdaBoost (AdaBoostClassifier), bagging (BaggingClassifier), gradient boosting (GradientBoostingClassifier), and support vector machines (support vector machine classifier, SVC). After a training–test split of size 75:25 was applied, the training data were passed to the pipeline built with the Python scikit-learn package. In the pipeline used to separate SARS-CoV-2 positive from negative patients, categorical features are encoded as an integer array and missing values are imputed univariately using the "most frequent" value of each respective column. Numerical features are log-transformed, missing values are imputed using the k-nearest neighbors method, and the features are scaled to the default range of minimum 0 and maximum 1. The pipeline used for predicting the outcome of patients with COVID-19 additionally uses a feature selector that removes low-variance features with threshold 0.95 × (1 − 0.95) for numerical and categorical values. For the longitudinal laboratory parameters, the symbolic aggregate approximation (SAX) pipeline was used, which in a first step uses the SAX Transformer package to reduce a variable-length time series of laboratory measurements over the course of the disease into a single parameter reflecting the time series [17]. The single parameter used here is a string of length n = 2, where the first symbol represents the first half of the time series and the second symbol the second half, drawn from an alphabet of a = 2 symbols (a, B) that discretize the laboratory parameter, with a denoting low levels and B high levels. This results in the four SAX-coded clusters aa, aB, Ba, and BB.
The parameter choice was based on the decision to create the minimum number of SAX clusters that still allowed differences in the mean laboratory values between deceased and surviving patients to be observed in each cluster upon visual inspection of the plots. After discretization of the longitudinal parameter patterns, missing values were imputed univariately using the "constant" strategy with fill value "missing", followed by encoding the categorical features as a one-hot numeric array omitting the category "missing". The procedure to find the best model was to apply a training–test split of 75:25 and to perform a grid search for the best hyperparameters with fivefold cross-validation, using accuracy as the score. The best model for each algorithm was taken to be the estimator with the best mean accuracy on the held-out cross-validation split, and was afterwards refitted on all training data. The refitted models were then run on the so far unseen test data, using the same estimator parameters, including the pre-processing transformations and the distribution statistics learned from the training data, to obtain the test accuracy score. This was repeated 15 times with random training–test splits, and the best hyperparameter set for each algorithm was taken from the model achieving the highest test accuracy score in these 15 runs. The best model was then used to calculate feature importance on the training and test data together with the Sequential Feature Selector method, parameterized to select 15 features, to add features to the feature subset in a greedy fashion, to use fivefold cross-validation, and to use accuracy as the scoring method. The 10 highest-scoring features of each algorithm were then assigned a rank of 1–10 to calculate the median feature importance.
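A minimal scikit-learn sketch of the classification pipeline and grid search described above; the hyperparameter grid, column names, and the helper `build_search` are illustrative assumptions, and the study additionally tuned six further algorithms and a variance-threshold selector:

```python
# Sketch of the classification pipeline: integer-encoded categorical features
# with "most frequent" imputation, and log-transformed, KNN-imputed,
# [0, 1]-scaled numerical features, followed by a grid search with
# fivefold cross-validation scored by accuracy.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, OrdinalEncoder
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

numeric_pipe = Pipeline([
    ("log", FunctionTransformer(np.log1p)),   # log-transform numerical features
    ("impute", KNNImputer()),                 # k-nearest neighbors imputation
    ("scale", MinMaxScaler()),                # rescale to the default range [0, 1]
])
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OrdinalEncoder()),             # encode categories as an integer array
])

def build_search(numeric_cols, categorical_cols):
    """Assemble the pipeline and the grid search (illustrative grid only)."""
    pre = ColumnTransformer([
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ])
    model = Pipeline([("pre", pre), ("clf", GradientBoostingClassifier())])
    return GridSearchCV(model, {"clf__n_estimators": [50, 100]},
                        cv=5, scoring="accuracy", refit=True)
```

In the study this search was repeated over 15 random 75:25 training–test splits per algorithm; the sketch shows a single search for one algorithm.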

Survival Analysis

Survival analysis was based on the laboratory measurements on admission to hospital and patient information on comorbidities and reported medications. Data pre-processing excluded features with more than 40% missing values and three patients without initial laboratory measurements. Comorbidity and pre-medication categories with less than 5% of observations were also excluded. Missing values were imputed using the Multiple Imputation by Chained Equations (MICE) method in the Python implementation miceforest. To perform the survival analysis, the following ML algorithms were used: random survival forest with an ensemble of tree-based learners (RandomForestSurvival), gradient boosting with a regression tree base learner (GradientBoostingSurvivalAnalysis), and the Cox proportional hazards model (CoxPHSurvivalAnalysis) from the Python scikit-survival package. Best-estimator selection, hyperparameter optimization, and cross-validation were performed sequentially using a common pipeline. The accuracy of the decision tree to separate SARS-CoV-2 negative from positive patients was based on the accuracy scores obtained during the tenfold cross-validation procedure. Feature importance was calculated using the Permutation Feature Importance method. The best models were determined by the highest median concordance score during the tenfold cross-validation procedure. Using the features in the ten best-performing models, we determined feature importance by calculating the mean weight of each feature across these ten models, and the feature ranking by taking the median of the rankings in the ten best models.
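The survival models above are ranked by concordance. As an illustration of the score itself, a didactic implementation of Harrell's concordance index (the metric scikit-survival reports) for right-censored data; this sketch is not the package's optimized routine:

```python
def concordance_index(time, event, risk_score):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i experienced the event and had a
    shorter follow-up time than patient j; the pair is concordant when patient
    i was also assigned the higher risk score. Ties in risk score count 0.5.
    A value of 1.0 means perfect ranking, 0.5 means random ranking.
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk_score[i] > risk_score[j]:
                    concordant += 1.0
                elif risk_score[i] == risk_score[j]:
                    concordant += 0.5
    return concordant / comparable
```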

Results

SARS-CoV-2 Positive and Negative Patients Did Not Differ Markedly in Demographic Features, Symptoms, and Comorbidities on Admission to Hospital

Clinical and laboratory data were retrieved from the patients’ database. To reduce matrix sparsity, the comorbidities and self-reported medication features were combined into broader categories (appendix). Features that still affected less than 5% of patients were excluded from further analyses. As some features were found to have an imbalanced fraction of missing values for SARS-CoV-2 positive and negative patients, features with more than 40% missing values in any of the groups were excluded from further analyses (appendix, Fig. S1 in the supplementary material).
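The two exclusion rules above can be sketched as follows; the column layout and the 0/1 heuristic for recognizing category (comorbidity/medication) features are assumptions made for illustration:

```python
import pandas as pd

def filter_features(df, group_col, max_missing=0.40, min_prevalence=0.05):
    """Apply the two exclusion rules described above.

    Drops any feature with more than 40% missing values in either patient
    group, and any 0/1-coded category feature affecting fewer than 5% of
    patients. The 0/1 check for spotting category columns is an assumption
    of this sketch.
    """
    keep = []
    for col in df.columns.drop(group_col):
        frac_missing = df.groupby(group_col)[col].apply(lambda s: s.isna().mean())
        if (frac_missing > max_missing).any():
            continue  # imbalanced or excessive missingness in one group
        values = df[col].dropna()
        if values.isin([0, 1]).all() and values.mean() < min_prevalence:
            continue  # category affecting fewer than 5% of patients
        keep.append(col)
    return df[[group_col] + keep]
```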

Patients admitted to hospital who tested positive for SARS-CoV-2 infection had a similar sex and age distribution to SARS-CoV-2 negative patients (Table 1). They had more frequently had direct contact with confirmed SARS-CoV-2 positive persons, and residents of health facilities or nursing homes were disproportionately more often infected with SARS-CoV-2 (Table S1 in the supplementary material). On admission, oxygen saturation was slightly lower in SARS-CoV-2 positive compared to negative patients, and a larger proportion of positive patients had a body temperature below 37 °C. On admission to hospital, patients reported symptoms experienced in the week prior to hospitalization and disclosed information on regularly taken medications and reported comorbidities. The only symptom that was reported by a considerable number of patients and over-represented in the SARS-CoV-2 positive group was cough (Table 1). Comorbidities that were significantly more prevalent in SARS-CoV-2 positive patients were hypertension and dementia (Table S2 in the supplementary material), although the latter finding may reflect the generally high prevalence of dementia in healthcare facilities and nursing homes. Medication for cardiovascular diseases was the only reported medication category with a significant difference between SARS-CoV-2 positive and negative patients (Table S3 in the supplementary material), reflecting the higher prevalence of hypertension in the SARS-CoV-2 positive group. As patients who tested negative for SARS-CoV-2 were either transferred to other hospitals or discharged home, the duration of hospitalization, death, and ICU data could not be compared between SARS-CoV-2 positive and negative patients. In summary, SARS-CoV-2 positive and negative patients had rather similar demographics, clinical presentation, and medical history on admission to hospital.
Altogether, this points to the rather limited value of symptoms, demographics, and self-reported data in predicting SARS-CoV-2 positivity on admission to hospital, and calls for more objective measurements on which to build the prediction model.

Table 1 Statistical test for differences in patient characteristics and symptoms on admission to hospital or in the week preceding hospital admission between patients that tested positive or negative for an infection with SARS-CoV-2

Machine Learning Accurately Distinguished SARS-CoV-2 Positive from Negative Patients, Based on Full Blood Count and Procalcitonin Only

On the basis of the findings above, we aimed to assess the possibility of predicting SARS-CoV-2 infection from the standard laboratory parameters acquired on admission to hospital, before PCR results become available. In a classical statistical analysis, 20 parameters reached statistical significance with a p value smaller than 0.05 (Table 2). Interestingly, differences in WBC and NEU counts, and in the ratios of different leukocyte populations and platelet counts, between SARS-CoV-2 positive and negative patients reached greater significance than any single inflammatory marker alone. However, the high number of significantly different parameters hinders their interpretation in everyday practice and does not provide additional clinical understanding.

Table 2 Laboratory parameters for all patients, and separately for SARS-CoV-2 positive and negative patients

Therefore, to better distinguish SARS-CoV-2 positive from negative patients, we used the ML algorithms logistic regression, k-nearest neighbors, random forest, AdaBoost, bagging, gradient boosting, and SVC. Using initial simulation experiments, we evaluated the optimal split into training and test data. We found that a test–training split size of 25:75 gave the best trade-off between minimizing the difference between accuracies on the training and cross-validation data and maximizing test accuracy (appendix, Fig. S2 in the supplementary material). This 25:75 test–training split was performed 15 times, and the seven algorithms were run on these data sets (Table S4 in the supplementary material). The classifier metric here was the accuracy score in distinguishing SARS-CoV-2 positive and negative patients. The performance of the different models was evaluated and gave a mean test accuracy of 0.76. The gradient boosting algorithm achieved the highest accuracy (Table S4 in the supplementary material). To overcome the rather low overlap between the sets of most important features produced by each algorithm, we calculated a median feature importance over all ML algorithms (Table S5 in the supplementary material). This enabled us to determine the decisive laboratory parameters: WBC, antibody-synthesizing lymphocytes (AS-LYMPH), procalcitonin (PCT), basophils/lymphocytes ratio (BASO/LYMPH), platelets/neutrophils ratio (PLT/NEU), monocytes/lymphocytes ratio (MON/LYMPH), creatinine (CREAT), and CRP (Fig. 1a).
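The rank aggregation across algorithms can be sketched as follows, assuming each algorithm contributes an ordered top-ten list; features absent from a list receive a penalty rank of 11, consistent with the 1–10 ranking scheme described in the Methods:

```python
import numpy as np

def median_rank(per_algorithm_top10, penalty_rank=11):
    """Consensus ranking of features across ML algorithms.

    Each algorithm contributes an ordered list of its most important features
    (rank 1 = most important). Features missing from an algorithm's list get
    a penalty rank of 11, so that features scoring consistently highly across
    algorithms rise to the top of the consensus list.
    """
    features = {f for top in per_algorithm_top10 for f in top}
    medians = {}
    for f in features:
        ranks = [top.index(f) + 1 if f in top else penalty_rank
                 for top in per_algorithm_top10]
        medians[f] = float(np.median(ranks))
    # return features sorted by median rank, best first
    return dict(sorted(medians.items(), key=lambda kv: kv[1]))
```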

Fig. 1

Decisive laboratory parameters to distinguish SARS-CoV-2 positive (orange) from negative (blue) patients and the medical decision tree built from these features. Box plots of the decisive laboratory parameters that were identified with ML on the basis of their median rank importance where the boxes represent the median and interquartile range of the respective feature in SARS-CoV-2 negative and positive patients at hospital admission (a). The decision tree built from these features shows at every decision node a histogram with the distribution of the parameter values and an arrow that indicates the decision threshold values with N denoting the number of samples used to compute the split point. For the pie charts, the diameter is proportional to the number of samples in that leaf, and N denotes the number of samples used to compute the predicted class in the training data set. The percentages given below the pie charts indicate the probabilities of identifying SARS-CoV-2 positive or negative patients within a given decision pathway that were calculated according to a binary prediction problem with two classes (b)

Next, we created a decision tree using these decisive laboratory parameters to evaluate patients with a suspected SARS-CoV-2 infection, providing an additional diagnostic tool before the PCR test results are available (Fig. 1b). Patients with a WBC count lower than or equal to 6.9 × 10³/μl were mostly negative if the antibody-synthesizing lymphocyte count was higher than 0.01 × 10³/μl. If the AS-LYMPH count was lower than or equal to 0.01 × 10³/μl, most patients were still negative if the MON/LYMPH ratio was lower than or equal to 0.89. The majority of patients with a WBC count higher than 6.9 × 10³/μl were positive, especially when the MON/LYMPH ratio was lower than 1.79 combined with PCT levels higher than 1.9, or when the MON/LYMPH ratio was higher than 1.79 combined with a PLT/NEU ratio lower than 18. Individual decision pathways of the tree thereby reached accuracies of up to 90–100%. In summary, mainly the counts of various blood cells, their ratios, and the levels of PCT can provide an estimate of a patient's infection status, in contrast to the high number of laboratory parameters that showed significant differences using classical statistical tests (Table 1, Table S6 in the supplementary material).
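For illustration only, the low-WBC pathways of the decision tree can be written as a rule function; the high-WBC branch is collapsed to its majority class here (in the published tree it is further refined by PCT and PLT/NEU), and this sketch is no substitute for the full tree in Fig. 1b or for PCR testing:

```python
def triage(wbc, as_lymph, mon_lymph):
    """Return a 'likely' SARS-CoV-2 label following the thresholds described
    for Fig. 1b.

    Units: all counts in 10^3 cells per microliter. Collapsing the high-WBC
    branch to its majority class is a simplification made for this sketch.
    """
    if wbc <= 6.9:  # low WBC: mostly SARS-CoV-2 negative
        if as_lymph > 0.01:
            return "likely negative"
        return "likely negative" if mon_lymph <= 0.89 else "likely positive"
    return "likely positive"  # high WBC: the majority of patients were positive
```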

Levels of Inflammatory Parameters, Troponin I, Blood Cell Counts, and Age Could Predict the Fatal Outcome of COVID-19 on Admission to Hospital

After the diagnosis of COVID-19 is confirmed, it is crucial to predict the potential course of the disease and the risk of a fatal outcome, and to provide patients with careful monitoring followed by tailored diagnostic and therapeutic strategies. To do so, we again performed classical statistical analyses and applied ML approaches to compare initial laboratory results on admission to hospital, as well as reported medications, comorbidities, and demographics, between patients who survived or died in hospital. Numerous laboratory parameters differed significantly at the 1% significance level on admission between surviving and deceased patients with COVID-19 (Table 3, Table S7 in the supplementary material). Most of the laboratory features were significantly elevated in patients who later died during hospitalization, including parameters of inflammation (PCT, CRP, and ferritin (FER); Fig. 2a), tissue damage (CREAT, aspartate aminotransferase (AST), troponin I (TnI), creatine kinase-myocardial band (CK-MB), LDH, and creatine kinase (CK); Fig. 2b), coagulation (DD, prothrombin time (PT), and activated partial thromboplastin time (APTT); Fig. 2c), CBC parameters related to an inflammatory response (WBC, neutrophil reactivity (NEU-RE), and neutrophils (NEU)), and relevant CBC ratios (neutrophils/lymphocytes (NEU/LYMPH) and platelets/lymphocytes (PLT/LYMPH)) (Fig. 2d, e). Importantly, those patients also had significantly lower levels of hemoglobin (HGB), LYMPH, eosinophils (EOS), and monocytes (MON), and lower ratios of eosinophils/lymphocytes (EOS/LYMPH) and PLT/NEU (Fig. 2d, e). Additionally, analyzing patients' characteristics and reported comorbidities by hypergeometric testing, we found that deceased patients were older and more often affected by renal disorders or anemia (Table S2 in the supplementary material).
They also more often reported hypertension or other cardiovascular diseases, which was supported by the higher percentage of deceased patients taking antithrombotic and cardiovascular medications prior to hospitalization (Tables S2, S3 in the supplementary material). Nonetheless, it is important to note that reported data on past medical history may be incomplete or inconsistent.

Table 3 Laboratory parameters for all patients with COVID-19, and separately for patients with COVID-19 who survived or died
Fig. 2

Survival analysis of patients with COVID-19 based on features that were measured at hospital admission. Box plot distribution of laboratory parameters related to inflammation (a), tissue damage (b), coagulation (c), complete blood counts (d), and blood cell ratios (e) between patients with COVID-19 who survived or died. The boxes represent the median and interquartile range of the respective parameters for the deceased (violet) or surviving (green) patients. Bump chart of feature ranks in the ML survival analysis with cross-validation on 10 different training–test splits, where a low rank indicates high feature importance (f). Features in bold consistently scored high in different models. The mean weights of the different features as assessed in a permutation feature importance analysis confirm the importance of those features in distinguishing deceased and surviving SARS-CoV-2 positive patients (g)

As with the separation of SARS-CoV-2 positive and negative patients, the high number of significantly different variables between patients who died or survived makes them difficult to implement in everyday clinical practice. Therefore, we used ML survival analysis to determine the best set of features to predict the risk of dying from COVID-19 already on admission to hospital. The feature space in the complete survival analysis included the on-admission laboratory results, patients' demographics, reported comorbidities, and medications. After a 25:75 test–training split, the training data were further divided ten times into training and validation sets for the tenfold cross-validation. Three different algorithms were run with specified hyperparameter ranges on these training and validation sets. With this cross-validation, the model with the highest median score was selected and then run on the so far unseen test data. This was repeated for ten different test–training splits (appendix). Analysis of the ML results revealed that, even though the models differed to some extent, PCT, TnI, age, HGB, PLT/NEU, and CRP were amongst the most important prognostic features (Fig. 2f). This was further confirmed by calculating the importance of all features across the top ten models using permutation (Fig. 2g).
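The permutation confirmation step can be reproduced on any fitted model with scikit-learn's `permutation_importance`; the data and classifier below are synthetic stand-ins for the study's survival models:

```python
# Sketch of the permutation feature importance check: each column is shuffled
# repeatedly and the resulting drop in score measures that feature's weight.
from sklearn.inspection import permutation_importance

def rank_features(model, X, y, n_repeats=10, seed=0):
    """Rank features by the mean score drop when each column is shuffled."""
    result = permutation_importance(model, X, y, n_repeats=n_repeats,
                                    random_state=seed)
    order = result.importances_mean.argsort()[::-1]  # most important first
    return list(order), result.importances_mean
```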

In conclusion, SARS-CoV-2 positive patients who on admission to hospital were older, had higher inflammatory parameters (PCT, CRP) and TnI levels together with lower HGB and a lower PLT/NEU ratio already had a high risk of death from COVID-19.

Dynamic Changes in CRP, WBC Count, and DD During Hospitalization Allowed for Additional Prediction of COVID-19 Survival or Death

Providing patients with tailored diagnostic and therapeutic strategies reflecting the dynamics of the disease course should have an influence on the final COVID-19 outcomes. Therefore, we analyzed the laboratory parameters measured repeatedly during the hospitalization to identify those parameters whose changes should be most closely monitored, and serve as warning signs to intensify the treatment.

Because of the nature of the time-series data, with a variable number of measurements per patient (Fig. S3 in the supplementary material), the data needed to be parameterized before they could be used in the ML approaches. Such parameterizations are usually done using clustering methods. Yet, with the time-series data available here and their variability in the number of observations per time-series feature (Fig. S3 in the supplementary material), most commonly used clustering methods would effectively cluster by the length of the time series. Therefore, we used the SAX algorithm, which allows for a symbolic representation of time-series data [17]. The parameter used here as symbolic representation is a string of length 2, where the first position represents the average over the first half of the time series and the second position that over the second half, and the two different symbols a and B denote low and high levels, respectively. This allows for four different SAX clusters: aa (started low, stayed low), aB (started low, increased), Ba (started high, decreased), and BB (started high, stayed high). The SAX-clustered time-series data were then combined with self-reported comorbidities, demographics, and reported medications, and the full set of features was subjected to the same ML algorithms as before on 15 different 75:25 training–test splits (Table S8 in the supplementary material). The classifier metric here was the accuracy score in predicting surviving versus deceased patients. The performance of the different models was evaluated, giving a mean test accuracy of 0.82, considerably higher than the test accuracy achieved in separating SARS-CoV-2 positive and negative patients. We then calculated the median feature importance for all features over all algorithms. The features with median scoring values smaller than 11 were the SAX clusters CRP BB, WBC aa, and DD aa, and age (Table S9 in the supplementary material).
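A minimal two-segment, two-symbol SAX encoding consistent with this description (the study used the SAX Transformer package; the single discretization threshold here is assumed to be precomputed, e.g. from the pooled value distribution):

```python
import numpy as np

def sax2(series, threshold):
    """Two-segment, two-symbol SAX word for a laboratory time series.

    The series is split into halves; each half is summarized by its mean and
    discretized against a single threshold: "a" for low, "B" for high
    (the n = 2, a = 2 parameterization described above). Works for any
    series of at least two measurements, regardless of length.
    """
    values = np.asarray(series, dtype=float)
    mid = len(values) // 2  # odd lengths put the extra point in the second half
    word = ""
    for half in (values[:mid], values[mid:]):
        word += "a" if half.mean() <= threshold else "B"
    return word
```

For example, a CRP trajectory that starts below the threshold and rises above it maps to "aB", placing the patient in the started-low, increased cluster.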

The CRP SAX cluster BB, with high levels of CRP that stayed high during hospitalization, predicted a fatal outcome of COVID-19 with a survival rate of only 20%, whereas cluster aa had a very good prognosis (Fig. 3a). The survival rate of patients who initially had high average levels of CRP (more than 6.95 mg/dl) that decreased during hospitalization (SAX cluster Ba) was nearly two times higher than that of patients who showed the opposite pattern (SAX cluster aB). For WBC, the most predictive cluster was aa, in which patients started with lower average WBC levels (less than 9.94 × 10³/μl) that stayed low during hospitalization, and which was associated with a survival rate of over 80% (Fig. 3b). Patients whose WBC increased (cluster aB) had a 20% lower chance of survival compared to those whose WBC decreased (cluster Ba) over time. Importantly, all the measurements in this study were taken prior to the WHO recommendation for standard use of steroids in the treatment of severe or critical patients with COVID-19. Increasing or persistently high levels of WBC during the course of the disease that were associated with fatal disease outcomes were therefore unrelated to steroid therapy. Regarding DD levels, a survival rate of more than 80% was noted in patients with low average DD levels during hospitalization (cluster aa; average values less than 2.62 μg/ml). In patients with DD values that were high but decreasing over time (cluster Ba), the survival rate decreased to around 70%. Interestingly, both DD starting high and staying high (cluster BB) and DD starting low and increasing (cluster aB) were associated with around a 40% chance of survival (Fig. 3c). Age, the only additional non-laboratory feature with a low median feature rank, has already been discussed above as an important determinant of survival. In summary, monitoring the longitudinal patterns of CRP, WBC, and DD during hospitalization, together with attention to the patient's age, can predict fatal outcome or survival of COVID-19 with high accuracy.

Fig. 3

Machine learning analysis of SAX-clustered longitudinal features to distinguish deceased and surviving patients with COVID-19. The laboratory parameters measured during the course of the disease were clustered using SAX into four clusters: aa—average value below the SAX threshold in both the first and the second half of the measurements; aB—average value below the SAX threshold in the first half of the measurements that increased to an average value above the threshold in the second half; Ba—average value above the SAX threshold in the first half of the measurements that decreased to an average value below the threshold in the second half; and BB—average value above the SAX threshold in both the first and the second half of the measurements. In the top panel, the mean and standard deviation of the respective laboratory feature are shown for all deceased and surviving patients with COVID-19 for CRP (a), WBC (b), and DD (c), while in the middle panel they are shown for each SAX cluster separately. The bottom panel depicts the percentage of surviving patients in each SAX cluster of the respective laboratory feature
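The two-segment labeling described in the figure legend can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the function name is ours, and the example CRP values are hypothetical; only the labeling rule (segment average vs. threshold) and the 6.95 mg/dl CRP threshold come from the text.

```python
def sax_cluster(values, threshold):
    """Assign a two-segment SAX label (aa, aB, Ba, or BB) to a series of
    laboratory measurements: 'a' = segment average below the threshold,
    'B' = segment average above it, evaluated separately on the first
    and second half of the measurements (as in the figure legend)."""
    mid = len(values) // 2
    label = ""
    for segment in (values[:mid], values[mid:]):
        mean = sum(segment) / len(segment)
        label += "B" if mean > threshold else "a"
    return label

# Hypothetical CRP courses (mg/dl) against the paper's 6.95 mg/dl threshold
print(sax_cluster([12.0, 10.5, 9.8, 4.1, 3.0, 2.2], 6.95))  # → "Ba"
print(sax_cluster([1.2, 2.0, 3.1, 8.5, 11.0, 14.2], 6.95))  # → "aB"
```

For odd-length series the first segment is one measurement shorter; other splitting conventions would work equally well for this sketch.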

Discussion

In the current work we have shown that the implementation of ML approaches allowed for the accurate prediction of a SARS-CoV-2 infection and of COVID-19-related outcomes. First, it enabled us to distinguish SARS-CoV-2 negative and positive patients admitted to hospital with comparable symptoms, demographics, and medical history, based solely on the on-admission CBC values WBC and AS-LYMPH, the ratios MON/LYMPH and PLT/NEU, and the level of PCT. In contrast to the high number of laboratory parameters that showed significant differences in classical statistical tests, the ML approach selected those parameters and their combinations that best separated SARS-CoV-2 positive from negative patients on admission to hospital. To assist clinicians with proper triage at an early stage, this subset of features was incorporated into an easy-to-use medical decision tree. Compared with another decision tree described in the literature [18], our approach requires fewer laboratory parameters to be measured, which makes the decision tree more transparent and practical to use.
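A decision tree over these five features reduces to a short chain of threshold checks that can run wherever a CBC and PCT result are available. The sketch below is purely illustrative: the cutoff values and rule directions are hypothetical placeholders, not the thresholds of the published tree, and the function name is ours.

```python
def triage_sars_cov2(wbc, as_lymph, mon_lymph, plt_neu, pct):
    """Illustrative triage tree over the five ML-selected on-admission
    parameters (WBC, AS-LYMPH, MON/LYMPH, PLT/NEU, PCT).
    All cutoffs are hypothetical, NOT those of the published tree."""
    if pct >= 0.5:                 # marked PCT elevation: bacterial pattern
        return "likely negative"
    if wbc >= 10.0:                # leukocytosis, more typical of other infections
        return "likely negative" if mon_lymph >= 0.45 else "indeterminate"
    if as_lymph >= 0.02:           # antibody-synthesizing lymphocytes present
        return "likely positive"
    return "likely positive" if plt_neu >= 35.0 else "indeterminate"

print(triage_sars_cov2(wbc=6.0, as_lymph=0.03, mon_lymph=0.3,
                       plt_neu=40.0, pct=0.1))  # → "likely positive"
```

The value of such a tree lies in its transparency: each branch is a single laboratory value against a fixed threshold, so clinicians can trace exactly why a patient was flagged.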

Interestingly, our finding that SARS-CoV-2 negative patients often had higher parameters of infection or tissue injury, such as WBC or CRP, than SARS-CoV-2 positive patients is in agreement with the results of previous studies comparing COVID-19 with other known infections of bacterial or viral origin [19, 20]. This reflects the lower inflammatory response initiated by SARS-CoV-2 in comparison to other bacterial and viral infections. Even though the individual selected laboratory parameters are not specific for SARS-CoV-2, the use of the ML-selected parameter set and the medical decision tree allows for the efficient separation of positive patients, and for the initiation of appropriate treatment as soon as the laboratory data are available. This is especially useful in situations of healthcare overload or limited availability of antigen and PCR tests, when rapid risk stratification can also protect SARS-CoV-2 negative patients from becoming infected in the medical facility. In general, complete blood count analyzers and serum analyzers are more widely available in hospitals of various reference levels than facilities to test infectious patient material for the presence of SARS-CoV-2. The time needed to acquire the standard laboratory parameters fluctuates depending on the medical facility, but typically takes between 15 and 90 min, while obtaining PCR results takes 1–48 h. Ag-RDTs, which were not available at the time of our study, only take around 15 min, yet their availability, sensitivity, and specificity are variable, especially in the face of waves of new SARS-CoV-2 variants. Especially during phases with low incidence or when new SARS-CoV-2 variants are spreading, the clinical laboratory data would be expected to provide more stable diagnostic features, as the response of the organism is not expected to differ fundamentally, and the laboratory values are less prone to false positive and false negative identifications. Therefore, we are convinced that the subset of diagnostic laboratory parameters identified here can be instrumental in hospitals all over the world.

Next, we identified the set of on-admission laboratory parameters and demographic features that gave the highest prediction score for COVID-19-related death. We showed that a few standard laboratory parameters measured on admission to hospital (PCT, TnI, HGB, PLT/NEU, and CRP), together with the age of the patients, distinguished between surviving and deceased patients and provided early biomarkers of poor prognosis regardless of symptoms, comorbidities, sex, and previous medical history. In systematic reviews and meta-analyses in which a large number of features were evaluated for their consistent association with fatal COVID-19 outcomes based on laboratory parameters measured on admission to hospital, TnI and CRP were previously reported as prognostic features [21, 22]. This supports our ML-based approach, which allowed us to narrow down a large set of features to a subset of diagnostic markers in a single-center study. Other laboratory features included here, such as the different blood cell ratios, are less commonly reported in other studies and are therefore not covered by meta-analyses. The use of ML to identify prognostic features hence brings the additional advantage that new, potentially predictive laboratory parameters can be included in the analyses. This subset of identified prognostic parameters should be sufficient to alert medical personnel, to increase the level of care, and to enable appropriate, quicker, and tailored medical decisions.
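The core idea of narrowing many candidate features down to a small prognostic subset can be illustrated with a simple univariate separation score. This is a stand-in for the study's actual ML feature selection, shown on a tiny hypothetical dataset; the scoring rule, function name, and all values below are ours.

```python
def rank_features(records, outcome_key, feature_keys):
    """Rank candidate on-admission features by how well they separate
    deceased (outcome 1) from surviving (outcome 0) patients: the
    absolute difference between group means, scaled by the overall
    value range. A simple stand-in for ML-based feature selection."""
    scores = {}
    for f in feature_keys:
        died = [r[f] for r in records if r[outcome_key] == 1]
        lived = [r[f] for r in records if r[outcome_key] == 0]
        mean_d = sum(died) / len(died)
        mean_l = sum(lived) / len(lived)
        spread = (max(died + lived) - min(died + lived)) or 1.0
        scores[f] = abs(mean_d - mean_l) / spread
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical patients: age (years), CRP (mg/dl), HGB (g/dl), outcome
patients = [
    {"age": 80, "crp": 15.0, "hgb": 10.0, "died": 1},
    {"age": 75, "crp": 12.0, "hgb": 11.0, "died": 1},
    {"age": 50, "crp": 3.0,  "hgb": 14.0, "died": 0},
    {"age": 45, "crp": 2.0,  "hgb": 13.5, "died": 0},
]
print(rank_features(patients, "died", ["hgb", "crp", "age"]))
# → ['age', 'crp', 'hgb']
```

The study's multivariate models additionally capture interactions between features (e.g., ratios such as PLT/NEU), which a univariate score like this cannot.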

Finally, the concurrent inclusion of the time-course laboratory parameters in the modelling, followed by feature selection, allowed us to identify the progression of CRP, WBC, and DD as a small subset of significant diagnostic markers. These parameters should be carefully monitored during hospitalization because their alterations could herald a change in the patient’s fate.

While previously described AI models are often not consistent with regard to the predictive potential of various laboratory parameters, the majority outlined an important role of the CBC in screening for an infection with SARS-CoV-2 [13, 14, 23,24,25,26,27]. Our decision tree confirms the high predictive value of the CBC, and it also emphasizes the specific impact of an infection with SARS-CoV-2 on the immune system, which even includes changes in the physical phenotype of blood cells [28], an important aspect of its short- and long-term pathogenicity. The inclusion of continuous laboratory parameters that were measured at different time points and with a variable number of measurements during the hospital stay was accomplished previously, e.g., by using the laboratory values on the first, the last, and a random day, followed by the training of three different ML models and parameter binning [29], or by taking the laboratory values on the first day and then two consecutive measurements in defined time intervals, followed by a logistic regression of the data at individual time points [30]. Here, we decided to represent the time-course laboratory data through feature engineering using SAX clustering, because it allowed us to include the time-resolved development of laboratory parameters as features in the modelling approaches, and provided us with thresholds for the laboratory values. The results of our analyses corroborated the previously reported risk of a fatal outcome associated with a specific progression of CRP and WBC values [29, 30]. Monitoring the most important diagnostic clinical parameters over time can aid in adjusting therapeutic interventions prior to clinical deterioration.
Interestingly, the most important diagnostic parameters identified in the feature set including the time-course data (clusters of CRP, WBC, and DD) differed slightly from those obtained using the feature set with the clinical parameters obtained on admission to hospital (PCT, TnI, HGB, PLT/NEU, and CRP). However, WBC and DD are consistently associated with fatal COVID-19 outcomes in various meta-analyses [21, 22, 31, 32]. This emphasizes the importance of developing separate models for addressing different clinical questions.

A major limitation of our study, as for the majority of ML approaches, is the single-center study group, despite it comprising 515 and 201 patients in the screening and prognostic analyses, respectively. ML approaches require thorough pre-processing and unification of the reported values and parameters, which is challenging to achieve across several centers. Nonetheless, large multicenter studies would be valuable to confirm our results. A large fraction of our study group were individuals from long-term healthcare facilities, which reflects the way the virus spread during the first waves of the COVID-19 pandemic. This patient group is a priori more vulnerable to severe courses of the disease and potentially fatal outcomes, which could be responsible for the previously reported high fatality rates in hospitalized patients [33,34,35]. As a result of differences in study designs and a variety of additional variables, such as the seasonal activity of influenza viruses, changes in the predominant SARS-CoV-2 variants, vaccination progress, and implemented safety measures, general conclusions should be drawn carefully. However, given that the general pathophysiological aspects of the disease have been well characterized [36, 37], we expect the laboratory findings concerning the initial impact of an infection and the differences indicative of an increased risk of a fatal outcome to be coherent among different SARS-CoV-2 variants. Further validation of these findings, based on different ML approaches applied to a large variety of cohort data, is expected to consolidate the most important predictive parameters while revealing specific differences between subpopulations or virus variants.