Key Summary Points

Why carry out this study?

In the clinical setting it is important to rapidly identify SARS-CoV-2 positive patients, and to provide patients with COVID-19 with appropriate medical support by monitoring parameters that are associated with poor disease outcomes.

We asked whether concise subsets of clinical parameters can be identified to diagnose SARS-CoV-2 positive patients, to identify patients with COVID-19 with a high risk of a fatal outcome on admission to hospital, and to recognize longitudinal parameter patterns as warning signs of a possible fatal COVID-19 outcome during hospitalization.

What was learned from the study?

A medical decision tree built from machine learning-selected diagnostic laboratory parameters can distinguish SARS-CoV-2 positive from negative patients on the basis of the full blood count and procalcitonin alone.

Using machine learning, we determined that older age, higher procalcitonin, C-reactive protein (CRP), and troponin I, as well as lower hemoglobin and platelets/neutrophils ratio, were the strongest predictors of a fatal COVID-19 outcome on admission to hospital, whereas the strongest predictors among the longitudinal parameter patterns measured during hospitalization were CRP, white blood cells, and D-dimers.

The identified subsets of parameters will help to quickly identify and isolate SARS-CoV-2 positive patients, and will assist in the adoption and adjustment of effective treatment for patients with COVID-19 with warning signs of a fatal disease outcome in the clinical setting.

Introduction

As the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) began to spread worldwide at the beginning of 2020, causing the outbreak of coronavirus disease 2019 (COVID-19), individual countries started to reorganize their respective healthcare systems. As a specific example, the Polish government created a network of 19 hospitals dedicated to patients with confirmed or suspected SARS-CoV-2 infection. The main idea was to provide those patients with the best possible medical services while separating them from patients without COVID-19. Infection with SARS-CoV-2 results in different clinical presentations, some of which require hospitalization and advanced management. The health burden of the disease, and especially the high number of concurrent patients [1], constitutes a serious problem for healthcare systems. Amid emerging SARS-CoV-2 waves and resource shortages, it is crucial to find a simple and cost-effective way to identify infected patients as early as possible, and to closely monitor the laboratory parameters that are most strongly associated with the worst disease outcomes. The aim of this study was therefore to use machine learning (ML) to identify subsets of specific features that assist with the diagnosis of SARS-CoV-2 infection and the prediction of COVID-19 course and outcome, based solely on the standard clinical and laboratory data collected during hospitalization. This could provide medical staff with (i) priority testing approaches on admission to hospital, (ii) initial predictors of a fatal COVID-19 outcome, and (iii) clear warning signs of disease progression associated with a fatal outcome during hospitalization.

The reliable and widely accepted diagnostic tools for acute SARS-CoV-2 infection remain PCR testing of swabs and, although less sensitive, antigen rapid diagnostic tests (Ag-RDTs) [2, 3], which became widely available in spring 2021. While Ag-RDT results are available at the point of care within 15–30 min, PCR testing is more time-consuming and takes at least 2–3 h, and often much longer, to return results. Since RT-PCR and Ag-RDTs were developed, less effort has been dedicated to predicting SARS-CoV-2 positivity on the basis of patient-reported symptoms and medical history, or of standard laboratory values on admission to hospital. However, the rapid identification of patients who are most likely infected with SARS-CoV-2 on the basis of diagnostic features is important, especially in low- and middle-income countries [4, 5], or when there is an excess of hospitalized patients with a COVID-19-like clinical presentation. In addition, to optimize care for the patients with COVID-19 at highest risk of a negative disease outcome, it would be beneficial if a set of standard laboratory parameters could serve as an early warning sign. So far, several standard laboratory parameters have been consistently linked with the most severe COVID-19 outcomes, including C-reactive protein (CRP), D dimer (DD), and various complete blood count (CBC) abnormalities, alongside many other features such as comorbidities, patient demographics, and physical examination findings [6,7,8,9,10]. However, given the high number of reported features and the at times contradictory information, it has proven difficult to identify a reliable, compact set of features that could serve as strong predictors of fatal outcomes.

Therefore, artificial intelligence (AI) approaches are increasingly implemented to assist with clinical assessment in order to diagnose infections in a timely manner, to predict disease outcomes and response to treatment, and to manage different aspects of pandemic necessities [11, 12]. In reviews of AI models for COVID-19, the features predicting the diagnosis frequently included flu-like symptoms, CRP, white blood cells (WBC), lymphocytes (LYMPH), and imaging-derived features. Predictive features for disease course and outcome were most often vital signs, comorbidities, CRP, LYMPH, lactate dehydrogenase (LDH), and imaging-derived features. Yet, the predictive performance of almost all models was overly optimistic, as they had a high risk of bias due to model overfitting and unclear reporting [13, 14]. Common limitations that reduce the predictive capacity of such models include small sets of parameters, small sample sizes, unclear exclusion criteria for participants, a limited selection of ML algorithms, missing data on predictors and outcomes, subjective or time-dependent outcomes, and laboratory measurements taken at only one time point.

Hence, we aimed to implement ML into clinical reasoning by reducing the large number of potential features to small and highly predictive feature subsets. To minimize the shortcomings of many previous ML approaches, we simultaneously included a more comprehensive feature space through inclusion of different types of parameters previously shown to be predictive, such as blood cell counts, other laboratory measurements, reported comorbidities, and medications prior to hospitalization. Additionally, we obtained longitudinal measurements of laboratory parameters taken at different time points over the course of the disease to monitor how the dynamic changes of these features impact the clinical outcome, which would ultimately enable physicians to adjust the treatment accordingly. Being aware of a rather conservative approach amongst clinicians towards new concepts, we created a decision tree using ML to support decision-making in everyday practice and summarized the results of the ML approaches into small consensus sets of highly predictive features.

Methods

Patient Data Collection

The project was performed in accordance with the Declaration of Helsinki [15], and was approved by the Ethics Committee of the Medical University of Lodz, Poland (Nr. RNN/126/20/KE). The patients gave informed consent for participation in this study. The comparative study included 515 patients hospitalized in the Maria Skłodowska-Curie Specialty Voivodeship Hospital in Zgierz, Poland, between March and June 2020, during the first wave of the COVID-19 pandemic caused by the SARS-CoV-2 wild-type strain. The analyses included patients with PCR-confirmed SARS-CoV-2 infection (n = 201), who were subsequently hospitalized in the COVID-19 care setting and either survived (n = 129) or died (n = 72), and patients admitted to the COVID-19 hospital with symptoms resembling COVID-19 who eventually tested negative for SARS-CoV-2 infection (n = 314) and were transferred to non-COVID-19 hospitals or discharged home. The outcome definition of SARS-CoV-2 positive and negative patients was based on RT-PCR results, and that of COVID-19 survival on discharge from hospital after a negative RT-PCR test. Clinical and laboratory data included clinical findings on admission, symptoms prior to hospitalization, epidemiological risk, comorbidities, and reported medications. The laboratory parameters were assessed on the day of admission to hospital and at several time points during the course of the disease. The STROBE guidelines [16] were applied regarding study design, patient recruitment, and reporting of observational research.

Laboratory Measurements

Nasal swabs were collected from all patients on admission to hospital. RNA was then isolated and RT-PCR tests were performed using the 2019-nCoV Triplex RT-qPCR (Vazyme, Nanjing, China) or the 2019-nCoV Bosphore Novel Coronavirus (Anatolia Geneworks, Istanbul, Turkey) detection kits. The time from collection of a patient's swab to receiving PCR results was between 8 and 24 h. Laboratory measurements followed the standard clinical approach, taking the individual situation of each patient into account; a core set of features was checked for almost all patients, based on the guidelines valid at that time. Analyses of the full blood and serum samples taken on admission to hospital and regularly throughout hospitalization were performed in the Central Diagnostic Laboratory of the Maria Sklodowska-Curie Specialty Voivodeship Hospital (Zgierz, Poland) using the following systems: Sysmex XN-1000 (Sysmex Europe GmbH, Norderstedt, Germany) for the CBC parameters, ACL Top 350 (Werfen, Warsaw, Poland) for the coagulation parameters, and Abbott Alinity (Abbott Laboratories, Warsaw, Poland) for the biochemistry parameters. The time from blood collection to receiving the results was about 60–90 min.

Machine Learning Approaches

A detailed description of the ML approaches is included in the appendix in the supplementary material.

Data Pre-processing

Data were checked for consistency, with stray non-numeric characters removed from numerical fields. When a numeric value was preceded by a "<" sign, the entry was replaced with the mean between 0 and that value. From the existing information, the feature ratios NEU/LYMPH, MON/LYMPH, EOS/LYMPH, PLT/LYMPH, BASO/LYMPH, and PLT/NEU were calculated and included in the analyses, while the following original features were excluded: neutrophils (NEU), eosinophils (EOS), monocytes (MON), platelets (PLT), basophils (BASO), and lymphocytes (LYMPH). Three extreme outlier values were removed from the data on the subgroup of SARS-CoV-2 positive patients.
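For illustration, the cleaning and ratio-derivation steps above can be sketched in Python with pandas; the detection-limit convention follows the description above, while the function names and column layout are ours:

```python
import numpy as np
import pandas as pd

def clean_numeric(value):
    """Coerce a raw laboratory entry to a float.

    Entries such as "<0.5" (below the detection limit) are replaced with the
    mean between 0 and that value, i.e. 0.25; stray non-numeric entries
    become NaN and are handled later by imputation.
    """
    s = str(value).strip()
    if s.startswith("<"):
        return float(s[1:]) / 2.0
    try:
        return float(s)
    except ValueError:
        return np.nan

def add_ratios(df):
    """Derive the ratio features and drop the absolute counts they replace."""
    ratios = {"NEU/LYMPH": ("NEU", "LYMPH"), "MON/LYMPH": ("MON", "LYMPH"),
              "EOS/LYMPH": ("EOS", "LYMPH"), "PLT/LYMPH": ("PLT", "LYMPH"),
              "BASO/LYMPH": ("BASO", "LYMPH"), "PLT/NEU": ("PLT", "NEU")}
    df = df.copy()
    for name, (num, den) in ratios.items():
        df[name] = df[num] / df[den]
    return df.drop(columns=["NEU", "EOS", "MON", "PLT", "BASO", "LYMPH"])
```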

Model Building

The following ML algorithms were used for the classification tasks: logistic regression (LogisticRegression), k-nearest neighbors (KNeighborsClassifier), random forest (RandomForestClassifier), AdaBoost (AdaBoostClassifier), bagging (BaggingClassifier), gradient boosting (GradientBoostingClassifier), and support vector machines (support vector machine classifier, SVC). After a training–test split of size 75:25 was applied, the training data were passed to the pipeline built with the Python scikit-learn package. In the pipeline used to separate SARS-CoV-2 positive from negative patients, categorical features are encoded as an integer array and missing values are imputed univariately using the "most frequent" value of each respective column. Numerical features are log-transformed, missing values are imputed using the k-nearest neighbors method, and the features are scaled to the default range of minimum 0 and maximum 1. The pipeline used for predicting the outcome of patients with COVID-19 additionally uses a feature selector that removes low-variance features with threshold 0.95 × (1 − 0.95) for numerical and categorical values. For the longitudinal laboratory parameters, the symbolic aggregate approximation (SAX) pipeline was used, which in a first step uses the SAX Transformer package to reduce a variable-length time series of laboratory measurements over the course of the disease into a single parameter reflecting the time series [17]. The single parameter used here is a string of length n = 2, where the first symbol represents the first half of the time series and the second symbol the second half, drawn from an alphabet of a = 2 symbols (a, B) that discretize the laboratory parameter, with a denoting low levels and B high levels. This results in the four SAX-coded clusters aa, aB, Ba, and BB.
The parameter choice was based on the decision to create the minimum number of SAX clusters that still allowed differences in the mean laboratory values between deceased and surviving patients to be observed in each cluster upon visual inspection of the plots. After discretization of the longitudinal parameter patterns, missing values were imputed univariately using the "constant" strategy with fill value "missing", followed by encoding the categorical features as a one-hot numeric array omitting the category "missing". The procedure to find the best model was to apply a training–test split of 75:25 and to perform a grid search for the best hyperparameters with fivefold cross-validation, using accuracy as the score. The best model for each algorithm was taken to be the estimator with the best mean accuracy on the held-out cross-validation split, and was afterwards refitted on all training data. The refitted models were then run on the so far unseen test data, using the same estimator parameters, including the pre-processing transformations and the distribution statistics learned from the training data, to obtain the test accuracy score. This was repeated 15 times with random training–test splits, and the best hyperparameter set for each algorithm was taken from the model achieving the highest test accuracy score in these 15 runs. The best model was then used to calculate feature importance on the training and test data together with the Sequential Feature Selector method, parameterized to select 15 features, to add features to the feature subset in a greedy fashion, to use fivefold cross-validation, and to use accuracy as the scoring method. The 10 highest-scoring features of each algorithm were then assigned a rank of 1–10 to calculate the median feature importance.
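A minimal scikit-learn sketch of the classification pipeline and grid search described above; the hyperparameter grid, column names, and the helper `build_search` are illustrative assumptions, and the study additionally tuned six further algorithms and a variance-threshold selector:

```python
# Sketch of the classification pipeline: integer-encoded categorical features
# with "most frequent" imputation, and log-transformed, KNN-imputed,
# [0, 1]-scaled numerical features, followed by a grid search with
# fivefold cross-validation scored by accuracy.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, OrdinalEncoder
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

numeric_pipe = Pipeline([
    ("log", FunctionTransformer(np.log1p)),   # log-transform numerical features
    ("impute", KNNImputer()),                 # k-nearest neighbors imputation
    ("scale", MinMaxScaler()),                # rescale to the default range [0, 1]
])
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OrdinalEncoder()),             # encode categories as an integer array
])

def build_search(numeric_cols, categorical_cols):
    """Assemble the pipeline and the grid search (illustrative grid only)."""
    pre = ColumnTransformer([
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ])
    model = Pipeline([("pre", pre), ("clf", GradientBoostingClassifier())])
    return GridSearchCV(model, {"clf__n_estimators": [50, 100]},
                        cv=5, scoring="accuracy", refit=True)
```

In the study this search was repeated over 15 random 75:25 training–test splits per algorithm; the sketch shows a single search for one algorithm.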

Survival Analysis

Survival analysis was based on the laboratory measurements on admission to hospital and patient information on comorbidities and reported medications. Data pre-processing excluded features with more than 40% missing values and three patients without initial laboratory measurements. Comorbidity and pre-medication categories with less than 5% of observations were also excluded. Missing values were imputed using the Multiple Imputation by Chained Equations (MICE) method in the Python implementation miceforest. To perform the survival analysis, the following ML algorithms were used: random survival forest with an ensemble of tree-based learners (RandomForestSurvival), gradient boosting with a regression tree base learner (GradientBoostingSurvivalAnalysis), and the Cox proportional hazards model (CoxPHSurvivalAnalysis) from the Python scikit-survival package. Best-estimator selection, hyperparameter optimization, and cross-validation were performed sequentially using a common pipeline. The accuracy of the decision tree to separate SARS-CoV-2 negative from positive patients was based on the accuracy scores obtained during the tenfold cross-validation procedure. Feature importance was calculated using the Permutation Feature Importance method. The best models were determined by the highest median concordance score during the tenfold cross-validation procedure. Using the features in the ten best-performing models, we determined feature importance by calculating the mean weight of each feature across these ten models, and the feature ranking by taking the median of the rankings in the ten best models.
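The survival models above are ranked by concordance. As an illustration of the score itself, a didactic implementation of Harrell's concordance index (the metric scikit-survival reports) for right-censored data; this sketch is not the package's optimized routine:

```python
def concordance_index(time, event, risk_score):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i experienced the event and had a
    shorter follow-up time than patient j; the pair is concordant when patient
    i was also assigned the higher risk score. Ties in risk score count 0.5.
    A value of 1.0 means perfect ranking, 0.5 means random ranking.
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk_score[i] > risk_score[j]:
                    concordant += 1.0
                elif risk_score[i] == risk_score[j]:
                    concordant += 0.5
    return concordant / comparable
```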

Results

SARS-CoV-2 Positive and Negative Patients Did Not Differ Markedly in Demographic Features, Symptoms, and Comorbidities on Admission to Hospital

Clinical and laboratory data were retrieved from the patients’ database. To reduce matrix sparsity, the comorbidities and self-reported medication features were combined into broader categories (appendix). Features that still affected less than 5% of patients were excluded from further analyses. As some features were found to have an imbalanced fraction of missing values for SARS-CoV-2 positive and negative patients, features with more than 40% missing values in any of the groups were excluded from further analyses (appendix, Fig. S1 in the supplementary material).
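The two exclusion rules above can be sketched as follows; the column layout and the 0/1 heuristic for recognizing category (comorbidity/medication) features are assumptions made for illustration:

```python
import pandas as pd

def filter_features(df, group_col, max_missing=0.40, min_prevalence=0.05):
    """Apply the two exclusion rules described above.

    Drops any feature with more than 40% missing values in either patient
    group, and any 0/1-coded category feature affecting fewer than 5% of
    patients. The 0/1 check for spotting category columns is an assumption
    of this sketch.
    """
    keep = []
    for col in df.columns.drop(group_col):
        frac_missing = df.groupby(group_col)[col].apply(lambda s: s.isna().mean())
        if (frac_missing > max_missing).any():
            continue  # imbalanced or excessive missingness in one group
        values = df[col].dropna()
        if values.isin([0, 1]).all() and values.mean() < min_prevalence:
            continue  # category affecting fewer than 5% of patients
        keep.append(col)
    return df[[group_col] + keep]
```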

Patients admitted to hospital who tested positive for SARS-CoV-2 infection had a similar sex and age distribution to SARS-CoV-2 negative patients (Table 1). They had more frequently had direct contact with confirmed SARS-CoV-2 positive persons, and residents of health facilities or nursing homes were disproportionately more often infected with SARS-CoV-2 (Table S1 in the supplementary material). On admission, oxygen saturation was slightly lower in SARS-CoV-2 positive compared to negative patients, and a larger proportion of positive patients had a body temperature below 37 °C. On admission to hospital, patients reported symptoms experienced in the week prior to hospitalization and disclosed information on regularly taken medications and reported comorbidities. The only symptom that was reported by a considerable number of patients and over-represented in the SARS-CoV-2 positive group was cough (Table 1). Comorbidities that were significantly more prevalent in SARS-CoV-2 positive patients were hypertension and dementia (Table S2 in the supplementary material), although the latter finding may reflect the generally high prevalence of dementia in healthcare facilities and nursing homes. Medication for cardiovascular diseases was the only reported medication category with a significant difference between SARS-CoV-2 positive and negative patients (Table S3 in the supplementary material), reflecting the higher prevalence of hypertension in the SARS-CoV-2 positive group. As patients who tested negative for SARS-CoV-2 were either transferred to other hospitals or discharged home, the duration of hospitalization, death, and ICU data could not be compared between SARS-CoV-2 positive and negative patients. In summary, SARS-CoV-2 positive and negative patients had rather similar demographics, clinical presentation, and medical history on admission to hospital.
Altogether, this points to the rather limited value of symptoms, demographics, and self-reported data in predicting SARS-CoV-2 positivity on admission to hospital, and calls for more objective measurements on which to build the prediction model.

Table 1 Statistical test for differences in patient characteristics and symptoms on admission to hospital or in the week preceding hospital admission between patients that tested positive or negative for an infection with SARS-CoV-2

Machine Learning Accurately Distinguished SARS-CoV-2 Positive from Negative Patients, Based on Full Blood Count and Procalcitonin Only

On the basis of the findings above, we aimed to assess the possibility of predicting SARS-CoV-2 infection from the standard laboratory parameters acquired on admission to hospital, before PCR results become available. In a classical statistical analysis, 20 parameters reached statistical significance with a p value smaller than 0.05 (Table 2). Interestingly, differences in WBC and NEU counts, and in the ratios of different leukocyte populations and platelet counts, between SARS-CoV-2 positive and negative patients reached greater significance than any single inflammatory marker alone. However, the high number of significantly different parameters hinders their interpretation in everyday practice and does not provide additional clinical understanding.

Table 2 Laboratory parameters for all patients, and separately for SARS-CoV-2 positive and negative patients

Therefore, to better distinguish SARS-CoV-2 positive from negative patients, we used the ML algorithms logistic regression, k-nearest neighbors, random forest, AdaBoost, bagging, gradient boosting, and SVC. Using initial simulation experiments, we evaluated the optimal split into training and test data. We found that a test–training split size of 25:75 gave the best trade-off between minimizing the difference between accuracies on the training and cross-validation data and maximizing test accuracy (appendix, Fig. S2 in the supplementary material). This 25:75 test–training split was performed 15 times, and the seven algorithms were run on these data sets (Table S4 in the supplementary material). The classifier metric here was the accuracy score in distinguishing SARS-CoV-2 positive and negative patients. The performance of the different models was evaluated and gave a mean test accuracy of 0.76. The gradient boosting algorithm achieved the highest accuracy (Table S4 in the supplementary material). To overcome the rather low overlap between the sets of most important features produced by each algorithm, we calculated a median feature importance over all ML algorithms (Table S5 in the supplementary material). This enabled us to determine the decisive laboratory parameters: WBC, antibody-synthesizing lymphocytes (AS-LYMPH), procalcitonin (PCT), basophils/lymphocytes ratio (BASO/LYMPH), platelets/neutrophils ratio (PLT/NEU), monocytes/lymphocytes ratio (MON/LYMPH), creatinine (CREAT), and CRP (Fig. 1a).
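The rank aggregation across algorithms can be sketched as follows, assuming each algorithm contributes an ordered top-ten list; features absent from a list receive a penalty rank of 11, consistent with the 1–10 ranking scheme described in the Methods:

```python
import numpy as np

def median_rank(per_algorithm_top10, penalty_rank=11):
    """Consensus ranking of features across ML algorithms.

    Each algorithm contributes an ordered list of its most important features
    (rank 1 = most important). Features missing from an algorithm's list get
    a penalty rank of 11, so that features scoring consistently highly across
    algorithms rise to the top of the consensus list.
    """
    features = {f for top in per_algorithm_top10 for f in top}
    medians = {}
    for f in features:
        ranks = [top.index(f) + 1 if f in top else penalty_rank
                 for top in per_algorithm_top10]
        medians[f] = float(np.median(ranks))
    # return features sorted by median rank, best first
    return dict(sorted(medians.items(), key=lambda kv: kv[1]))
```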

Fig. 1

Decisive laboratory parameters to distinguish SARS-CoV-2 positive (orange) from negative (blue) patients and the medical decision tree built from these features. Box plots of the decisive laboratory parameters that were identified with ML on the basis of their median rank importance where the boxes represent the median and interquartile range of the respective feature in SARS-CoV-2 negative and positive patients at hospital admission (a). The decision tree built from these features shows at every decision node a histogram with the distribution of the parameter values and an arrow that indicates the decision threshold values with N denoting the number of samples used to compute the split point. For the pie charts, the diameter is proportional to the number of samples in that leaf, and N denotes the number of samples used to compute the predicted class in the training data set. The percentages given below the pie charts indicate the probabilities of identifying SARS-CoV-2 positive or negative patients within a given decision pathway that were calculated according to a binary prediction problem with two classes (b)

Next, we created a decision tree using these decisive laboratory parameters to evaluate patients with a suspected SARS-CoV-2 infection, providing an additional diagnostic tool before the PCR test results are available (Fig. 1b). Patients with a WBC count lower than or equal to 6.9 × 10³/μl were mostly negative if the antibody-synthesizing lymphocyte count was higher than 0.01 × 10³/μl. If the AS-LYMPH count was lower than or equal to 0.01 × 10³/μl, most patients were still negative if the MON/LYMPH ratio was lower than or equal to 0.89. The majority of patients with a WBC count higher than 6.9 × 10³/μl were positive, especially when the MON/LYMPH ratio was lower than 1.79 combined with PCT levels higher than 1.9, or when the MON/LYMPH ratio was higher than 1.79 combined with a PLT/NEU ratio lower than 18. Individual decision pathways of the tree thereby reached accuracies of up to 90–100%. In summary, mainly the counts of various blood cells, their ratios, and the levels of PCT can provide an estimate of a patient's infection status, in contrast to the high number of laboratory parameters that showed significant differences using classical statistical tests (Table 1, Table S6 in the supplementary material).
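For illustration only, the low-WBC pathways of the decision tree can be written as a rule function; the high-WBC branch is collapsed to its majority class here (in the published tree it is further refined by PCT and PLT/NEU), and this sketch is no substitute for the full tree in Fig. 1b or for PCR testing:

```python
def triage(wbc, as_lymph, mon_lymph):
    """Return a 'likely' SARS-CoV-2 label following the thresholds described
    for Fig. 1b.

    Units: all counts in 10^3 cells per microliter. Collapsing the high-WBC
    branch to its majority class is a simplification made for this sketch.
    """
    if wbc <= 6.9:  # low WBC: mostly SARS-CoV-2 negative
        if as_lymph > 0.01:
            return "likely negative"
        return "likely negative" if mon_lymph <= 0.89 else "likely positive"
    return "likely positive"  # high WBC: the majority of patients were positive
```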

Levels of Inflammatory Parameters, Troponin I, Blood Cell Counts, and Age Could Predict the Fatal Outcome of COVID-19 on Admission to Hospital

After the diagnosis of COVID-19 is confirmed, it is crucial to predict the potential course of the disease and the risk of a fatal outcome, and to provide patients with careful monitoring followed by tailored diagnostic and therapeutic strategies. To do so, we again performed classical statistical analyses and applied ML approaches to compare initial laboratory results on admission to hospital, as well as reported medications, comorbidities, and demographics, between patients who survived or died in hospital. Numerous laboratory parameters differed significantly at the 1% significance level on admission between surviving and deceased patients with COVID-19 (Table 3, Table S7 in the supplementary material). Most of the laboratory features were significantly elevated in patients who later died during hospitalization, including parameters of inflammation (PCT, CRP, and ferritin (FER); Fig. 2a), tissue damage (CREAT, aspartate aminotransferase (AST), troponin I (TnI), creatine kinase-myocardial band (CK-MB), LDH, and creatine kinase (CK); Fig. 2b), coagulation (DD, prothrombin time (PT), and activated partial thromboplastin time (APTT); Fig. 2c), CBC parameters related to an inflammatory response (WBC, neutrophil reactivity (NEU-RE), and neutrophils (NEU)), and relevant CBC ratios (neutrophils/lymphocytes (NEU/LYMPH) and platelets/lymphocytes (PLT/LYMPH)) (Fig. 2d, e). Importantly, those patients also had significantly lower levels of hemoglobin (HGB), LYMPH, eosinophils (EOS), and monocytes (MON), and lower ratios of eosinophils/lymphocytes (EOS/LYMPH) and PLT/NEU (Fig. 2d, e). Additionally, analyzing patients' characteristics and reported comorbidities by hypergeometric testing, we found that deceased patients were older and more often affected by renal disorders or anemia (Table S2 in the supplementary material).
They also more often reported hypertension or other cardiovascular diseases, which was supported by the higher percentage of deceased patients taking antithrombotic and cardiovascular medications prior to hospitalization (Tables S2, S3 in the supplementary material). Nonetheless, it is important to note that reported data on past medical history may be incomplete or inconsistent.

Table 3 Laboratory parameters for all patients with COVID-19, and separately for patients with COVID-19 who survived or died
Fig. 2

Survival analysis of patients with COVID-19 based on features that were measured at hospital admission. Box plot distribution of laboratory parameters related to inflammation (a), tissue damage (b), coagulation (c), complete blood counts (d), and blood cell ratios (e) between patients with COVID-19 who survived or died. The boxes represent the median and interquartile range of the respective parameters for the deceased (violet) or surviving (green) patients. Bump chart of feature ranks in the ML survival analysis with cross-validation on 10 different training–test splits, where a low rank indicates high feature importance (f). Features in bold consistently scored high in different models. The mean weights of the different features as assessed in a permutation feature importance analysis confirm the importance of those features in distinguishing deceased and surviving SARS-CoV-2 positive patients (g)

As with the separation of SARS-CoV-2 positive and negative patients, the high number of significantly different variables between patients who died or survived makes them difficult to implement in everyday clinical practice. Therefore, we used ML survival analysis to determine the best set of features to predict the risk of dying from COVID-19 already on admission to hospital. The feature space in the complete survival analysis included the on-admission laboratory results, patients' demographics, reported comorbidities, and medications. After a 25:75 test–training split, the training data were further divided ten times into training and validation sets for the tenfold cross-validation. Three different algorithms were run with specified hyperparameter ranges on these training and validation sets. With this cross-validation, the model with the highest median score was selected and then run on the so far unseen test data. This was repeated for ten different test–training splits (appendix). Analysis of the ML results revealed that, even though the models differed to some extent, PCT, TnI, age, HGB, PLT/NEU, and CRP were amongst the most important prognostic features (Fig. 2f). This was further confirmed by calculating the importance of all features across the top ten models using permutation (Fig. 2g).
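The permutation confirmation step can be reproduced on any fitted model with scikit-learn's `permutation_importance`; the data and classifier below are synthetic stand-ins for the study's survival models:

```python
# Sketch of the permutation feature importance check: each column is shuffled
# repeatedly and the resulting drop in score measures that feature's weight.
from sklearn.inspection import permutation_importance

def rank_features(model, X, y, n_repeats=10, seed=0):
    """Rank features by the mean score drop when each column is shuffled."""
    result = permutation_importance(model, X, y, n_repeats=n_repeats,
                                    random_state=seed)
    order = result.importances_mean.argsort()[::-1]  # most important first
    return list(order), result.importances_mean
```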

In conclusion, SARS-CoV-2 positive patients who on admission to hospital were older, had higher inflammatory parameters (PCT, CRP) and TnI levels together with lower HGB and a lower PLT/NEU ratio already had a high risk of death from COVID-19.

Dynamic Changes in CRP, WBC Count, and DD During Hospitalization Allowed for Additional Prediction of COVID-19 Survival or Death

Providing patients with tailored diagnostic and therapeutic strategies reflecting the dynamics of the disease course should have an influence on the final COVID-19 outcomes. Therefore, we analyzed the laboratory parameters measured repeatedly during the hospitalization to identify those parameters whose changes should be most closely monitored, and serve as warning signs to intensify the treatment.

Because of the nature of the time-series data, with a variable number of measurements per patient (Fig. S3 in the supplementary material), the data needed to be parameterized before they could be used in the ML approaches. Such parameterizations are usually done using clustering methods. Yet, with the time-series data available here and their variability in the number of observations per time-series feature (Fig. S3 in the supplementary material), most commonly used clustering methods would effectively cluster by the length of the time series. Therefore, we used the SAX algorithm, which allows for a symbolic representation of time-series data [17]. The parameter used here as symbolic representation is a string of length 2, where the first position represents the average over the first half of the time series and the second position that over the second half, and the two different symbols a and B denote low and high levels, respectively. This allows for four different SAX clusters: aa (started low, stayed low), aB (started low, increased), Ba (started high, decreased), and BB (started high, stayed high). The SAX-clustered time-series data were then combined with self-reported comorbidities, demographics, and reported medications, and the full set of features was subjected to the same ML algorithms as before on 15 different 75:25 training–test splits (Table S8 in the supplementary material). The classifier metric here was the accuracy score in predicting surviving versus deceased patients. The performance of the different models was evaluated, giving a mean test accuracy of 0.82, considerably higher than the test accuracy achieved in separating SARS-CoV-2 positive and negative patients. We then calculated the median feature importance for all features over all algorithms. The features with median scoring values smaller than 11 were the SAX clusters CRP BB, WBC aa, and DD aa, and age (Table S9 in the supplementary material).
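A minimal two-segment, two-symbol SAX encoding consistent with this description (the study used the SAX Transformer package; the single discretization threshold here is assumed to be precomputed, e.g. from the pooled value distribution):

```python
import numpy as np

def sax2(series, threshold):
    """Two-segment, two-symbol SAX word for a laboratory time series.

    The series is split into halves; each half is summarized by its mean and
    discretized against a single threshold: "a" for low, "B" for high
    (the n = 2, a = 2 parameterization described above). Works for any
    series of at least two measurements, regardless of length.
    """
    values = np.asarray(series, dtype=float)
    mid = len(values) // 2  # odd lengths put the extra point in the second half
    word = ""
    for half in (values[:mid], values[mid:]):
        word += "a" if half.mean() <= threshold else "B"
    return word
```

For example, a CRP trajectory that starts below the threshold and rises above it maps to "aB", placing the patient in the started-low, increased cluster.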

The CRP SAX cluster BB, with high levels of CRP that stayed high during hospitalization, predicted a fatal outcome of COVID-19 with a survival rate of only 20%, whereas cluster aa had a very good prognosis (Fig. 3a). The survival rate of patients who initially had high average levels of CRP (more than 6.95 mg/dl) that decreased during hospitalization (SAX cluster Ba) was nearly two times higher than that of patients who showed the opposite pattern (SAX cluster aB). For WBC, the most predictive cluster was aa, in which patients started with lower average WBC levels (less than 9.94 × 10³/μl) that stayed low during hospitalization, and which was associated with a survival rate of over 80% (Fig. 3b). Patients whose WBC increased (cluster aB) had a 20% lower chance of survival compared to those whose WBC decreased (cluster Ba) over time. Importantly, all the measurements in this study were taken prior to the WHO recommendation for standard use of steroids in the treatment of severe or critical patients with COVID-19. Increasing or persistently high levels of WBC during the course of the disease that were associated with fatal disease outcomes were therefore unrelated to steroid therapy. Regarding DD levels, a survival rate of more than 80% was noted in patients with low average DD levels during hospitalization (cluster aa; average values less than 2.62 μg/ml). In patients with DD values that were high but decreasing over time (cluster Ba), the survival rate decreased to around 70%. Interestingly, both DD starting high and staying high (cluster BB) and DD starting low and increasing (cluster aB) were associated with around a 40% chance of survival (Fig. 3c). Age, the only additional non-laboratory feature with a low median feature rank, has already been discussed above as an important determinant of survival. In summary, monitoring the longitudinal patterns of CRP, WBC, and DD during hospitalization, together with attention to the patient's age, can predict fatal outcome or survival of COVID-19 with high accuracy.

Fig. 3

Machine learning analysis of SAX-clustered longitudinal features to distinguish deceased and surviving patients with COVID-19. The laboratory parameters measured during the course of the disease were clustered using SAX into four clusters: aa—average value below the SAX threshold in both the first and the second half of the measurements; aB—average value below the SAX threshold in the first half of the measurements that increased to an average value above the threshold in the second half; Ba—average value above the SAX threshold in the first half of the measurements that decreased to an average value below the threshold in the second half; and BB—average value above the SAX threshold in both the first and the second half of the measurements. In the top panel, the mean and standard deviation of the respective laboratory feature are shown for all deceased and surviving patients with COVID-19 for CRP (a), WBC (b), and DD (c), while in the middle panel they are shown for each SAX cluster separately. The bottom panel depicts the percentage of surviving patients in each SAX cluster of the respective laboratory feature
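The two-segment labeling described in the figure legend can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the function name is ours, and the example CRP values are hypothetical; only the labeling rule (segment average vs. threshold) and the 6.95 mg/dl CRP threshold come from the text.

```python
def sax_cluster(values, threshold):
    """Assign a two-segment SAX label (aa, aB, Ba, or BB) to a series of
    laboratory measurements: 'a' = segment average below the threshold,
    'B' = segment average above it, evaluated separately on the first
    and second half of the measurements (as in the figure legend)."""
    mid = len(values) // 2
    label = ""
    for segment in (values[:mid], values[mid:]):
        mean = sum(segment) / len(segment)
        label += "B" if mean > threshold else "a"
    return label

# Hypothetical CRP courses (mg/dl) against the paper's 6.95 mg/dl threshold
print(sax_cluster([12.0, 10.5, 9.8, 4.1, 3.0, 2.2], 6.95))  # → "Ba"
print(sax_cluster([1.2, 2.0, 3.1, 8.5, 11.0, 14.2], 6.95))  # → "aB"
```

For odd-length series the first segment is one measurement shorter; other splitting conventions would work equally well for this sketch.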

Discussion

In the current work we have shown that the implementation of ML approaches allowed for the accurate prediction of a SARS-CoV-2 infection and of COVID-19-related outcomes. First, it enabled us to distinguish SARS-CoV-2 negative and positive patients admitted to hospital with comparable symptoms, demographics, and medical history, based solely on the on-admission CBC values WBC and AS-LYMPH, the ratios MON/LYMPH and PLT/NEU, and the level of PCT. In contrast to the high number of laboratory parameters that showed significant differences in classical statistical tests, the ML approach selected those parameters and their combinations that best separated SARS-CoV-2 positive from negative patients on admission to hospital. To assist clinicians with proper triage at an early stage, this subset of features was incorporated into an easy-to-use medical decision tree. Compared with another decision tree described in the literature [18], our approach requires fewer laboratory parameters to be measured, which makes the decision tree more transparent and practical to use.
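A decision tree over these five features reduces to a short chain of threshold checks that can run wherever a CBC and PCT result are available. The sketch below is purely illustrative: the cutoff values and rule directions are hypothetical placeholders, not the thresholds of the published tree, and the function name is ours.

```python
def triage_sars_cov2(wbc, as_lymph, mon_lymph, plt_neu, pct):
    """Illustrative triage tree over the five ML-selected on-admission
    parameters (WBC, AS-LYMPH, MON/LYMPH, PLT/NEU, PCT).
    All cutoffs are hypothetical, NOT those of the published tree."""
    if pct >= 0.5:                 # marked PCT elevation: bacterial pattern
        return "likely negative"
    if wbc >= 10.0:                # leukocytosis, more typical of other infections
        return "likely negative" if mon_lymph >= 0.45 else "indeterminate"
    if as_lymph >= 0.02:           # antibody-synthesizing lymphocytes present
        return "likely positive"
    return "likely positive" if plt_neu >= 35.0 else "indeterminate"

print(triage_sars_cov2(wbc=6.0, as_lymph=0.03, mon_lymph=0.3,
                       plt_neu=40.0, pct=0.1))  # → "likely positive"
```

The value of such a tree lies in its transparency: each branch is a single laboratory value against a fixed threshold, so clinicians can trace exactly why a patient was flagged.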

Interestingly, our finding that SARS-CoV-2 negative patients often had higher parameters of infection or tissue injury, such as WBC or CRP, than SARS-CoV-2 positive patients is in agreement with the results of previous studies comparing COVID-19 with other known infections of bacterial or viral origin [19, 20]. This reflects the lower inflammatory response initiated by SARS-CoV-2 in comparison to other bacterial and viral infections. Even though the individual selected laboratory parameters are not specific for SARS-CoV-2, the use of the ML-selected parameter set and the medical decision tree allows for the efficient separation of positive patients, and for the initiation of appropriate treatment as soon as the laboratory data are available. This is especially useful in situations of healthcare overload or limited availability of antigen and PCR tests, when rapid risk stratification can also protect SARS-CoV-2 negative patients from becoming infected in the medical facility. In general, complete blood count analyzers and serum analyzers are more widely available in hospitals of various reference levels than facilities to test infectious patient material for the presence of SARS-CoV-2. The time needed to acquire the standard laboratory parameters fluctuates depending on the medical facility, but typically takes between 15 and 90 min, while obtaining PCR results takes 1–48 h. Ag-RDTs, which were not available at the time of our study, only take around 15 min, yet their availability, sensitivity, and specificity are variable, especially in the face of waves of new SARS-CoV-2 variants. Especially during phases with low incidence or when new SARS-CoV-2 variants are spreading, the clinical laboratory data would be expected to provide more stable diagnostic features, as the response of the organism is not expected to differ fundamentally, and the laboratory values are less prone to false positive and false negative identifications. Therefore, we are convinced that the subset of diagnostic laboratory parameters identified here can be instrumental in hospitals all over the world.

Next, we identified the set of on-admission laboratory parameters and demographic features that gave the highest prediction score for COVID-19-related death. We showed that a few standard laboratory parameters measured on admission to hospital (PCT, TnI, HGB, PLT/NEU, and CRP), together with the age of the patients, distinguished between surviving and deceased patients and provided early biomarkers of poor prognosis regardless of symptoms, comorbidities, sex, and previous medical history. In systematic reviews and meta-analyses in which a large number of features were evaluated for their consistent association with fatal COVID-19 outcomes based on laboratory parameters measured on admission to hospital, TnI and CRP were previously reported as prognostic features [21, 22]. This supports our ML-based approach, which allowed us to narrow down a large set of features to a subset of diagnostic markers in a single-center study. Other laboratory features included here, such as the different blood cell ratios, are less commonly reported in other studies and are therefore not covered by meta-analyses. The use of ML to identify prognostic features hence brings the additional advantage that new, potentially predictive laboratory parameters can be included in the analyses. This subset of identified prognostic parameters should be sufficient to alert medical personnel, to increase the level of care, and to enable appropriate, quicker, and tailored medical decisions.
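The core idea of narrowing many candidate features down to a small prognostic subset can be illustrated with a simple univariate separation score. This is a stand-in for the study's actual ML feature selection, shown on a tiny hypothetical dataset; the scoring rule, function name, and all values below are ours.

```python
def rank_features(records, outcome_key, feature_keys):
    """Rank candidate on-admission features by how well they separate
    deceased (outcome 1) from surviving (outcome 0) patients: the
    absolute difference between group means, scaled by the overall
    value range. A simple stand-in for ML-based feature selection."""
    scores = {}
    for f in feature_keys:
        died = [r[f] for r in records if r[outcome_key] == 1]
        lived = [r[f] for r in records if r[outcome_key] == 0]
        mean_d = sum(died) / len(died)
        mean_l = sum(lived) / len(lived)
        spread = (max(died + lived) - min(died + lived)) or 1.0
        scores[f] = abs(mean_d - mean_l) / spread
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical patients: age (years), CRP (mg/dl), HGB (g/dl), outcome
patients = [
    {"age": 80, "crp": 15.0, "hgb": 10.0, "died": 1},
    {"age": 75, "crp": 12.0, "hgb": 11.0, "died": 1},
    {"age": 50, "crp": 3.0,  "hgb": 14.0, "died": 0},
    {"age": 45, "crp": 2.0,  "hgb": 13.5, "died": 0},
]
print(rank_features(patients, "died", ["hgb", "crp", "age"]))
# → ['age', 'crp', 'hgb']
```

The study's multivariate models additionally capture interactions between features (e.g., ratios such as PLT/NEU), which a univariate score like this cannot.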

Finally, the concurrent inclusion of the time-course laboratory parameters in the modelling, followed by feature selection, allowed us to identify the progression of CRP, WBC, and DD as a small subset of significant diagnostic markers. These parameters should be carefully monitored during hospitalization because their alterations could herald a change in the patient’s fate.

While previously described AI models are often not consistent with regard to the predictive potential of various laboratory parameters, the majority outlined an important role of the CBC in screening for an infection with SARS-CoV-2 [13, 14, 23,24,25,26,27]. Our decision tree confirms the high predictive value of the CBC, and it also emphasizes the specific impact of an infection with SARS-CoV-2 on the immune system, which even includes changes in the physical phenotype of blood cells [28], an important aspect of its short- and long-term pathogenicity. The inclusion of continuous laboratory parameters that were measured at different time points and with a variable number of measurements during the hospital stay was accomplished previously, e.g., by using the laboratory values on the first, the last, and a random day, followed by the training of three different ML models and parameter binning [29], or by taking the laboratory values on the first day and then two consecutive measurements in defined time intervals, followed by a logistic regression of the data at individual time points [30]. Here, we decided to represent the time-course laboratory data through feature engineering using SAX clustering, because it allowed us to include the time-resolved development of laboratory parameters as features in the modelling approaches, and provided us with thresholds for the laboratory values. The results of our analyses corroborated the previously reported risk of a fatal outcome associated with a specific progression of CRP and WBC values [29, 30]. Monitoring the most important diagnostic clinical parameters over time can aid in adjusting therapeutic interventions prior to clinical deterioration.
Interestingly, the most important diagnostic parameters identified in the feature set including the time-course data (clusters of CRP, WBC, and DD) differed slightly from those obtained using the feature set with the clinical parameters obtained on admission to hospital (PCT, TnI, HGB, PLT/NEU, and CRP). However, WBC and DD are consistently associated with fatal COVID-19 outcomes in various meta-analyses [21, 22, 31, 32]. This emphasizes the importance of developing separate models for addressing different clinical questions.

A major limitation of our study, as for the majority of ML approaches, is the single-center study group, despite it comprising 515 and 201 patients in the screening and prognostic analyses, respectively. ML approaches require thorough pre-processing and unification of the reported values and parameters, which is challenging to achieve across several centers. Nonetheless, large multicenter studies would be valuable to confirm our results. A large fraction of our study group were individuals from long-term healthcare facilities, which reflects the way the virus spread during the first waves of the COVID-19 pandemic. This patient group is a priori more vulnerable to severe courses of the disease and potentially fatal outcomes, which could be responsible for the previously reported high fatality rates in hospitalized patients [33,34,35]. As a result of differences in study designs and a variety of additional variables, such as the seasonal activity of influenza viruses, changes in the predominant SARS-CoV-2 variants, vaccination progress, and implemented safety measures, general conclusions should be drawn carefully. However, given that the general pathophysiological aspects of the disease have been well characterized [36, 37], we expect the laboratory findings concerning the initial impact of an infection and the differences indicative of an increased risk of a fatal outcome to be coherent among different SARS-CoV-2 variants. Further validation of these findings, based on different ML approaches applied to a large variety of cohort data, is expected to consolidate the most important predictive parameters while revealing specific differences between subpopulations or virus variants.