Prediction of COVID-19 deterioration in high-risk patients at diagnosis: an early warning score for advanced COVID-19 developed by machine learning

Purpose While more advanced COVID-19 necessitates medical interventions and hospitalization, patients with mild COVID-19 do not require this. Identifying patients at risk of progressing to advanced COVID-19 might guide treatment decisions, particularly for better prioritizing patients in need of hospitalization.
Methods We developed a machine learning-based predictor for deriving a clinical score identifying patients with asymptomatic/mild COVID-19 at risk of progressing to advanced COVID-19. Clinical data from SARS-CoV-2 positive patients from the multicenter Lean European Open Survey on SARS-CoV-2 Infected Patients (LEOSS) were used for discovery (2020-03-16 to 2020-07-14) and validation (data from 2020-07-15 to 2021-02-16).
Results The LEOSS dataset contains 473 baseline patient parameters measured at the first patient contact. After training the predictor model on a training dataset comprising 1233 patients, 20 of the 473 parameters were selected for the predictor model. From the predictor model, we delineated a composite predictive score (SACOV-19, Score for the prediction of an Advanced stage of COVID-19) with eleven variables. In the validation cohort (n = 2264 patients), we observed good prediction performance with an area under the curve (AUC) of 0.73 ± 0.01. Besides temperature, age, body mass index and smoking habit, variables indicating pulmonary involvement (respiration rate, oxygen saturation, dyspnea), inflammation (CRP, LDH, lymphocyte counts), and acute kidney injury at diagnosis were identified. For better interpretability, the predictor was translated into a web interface.
Conclusion We present a machine learning-based predictor model and a clinical score for identifying patients at risk of developing advanced COVID-19.
Supplementary Information The online version contains supplementary material available at 10.1007/s15010-021-01656-z.


Introduction
In December 2019, a cluster of severe pneumonia occurred in the city of Wuhan, China. The causative pathogen was identified as a new betacoronavirus [1]. It was later named the Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) and the infectious disease was termed coronavirus disease 2019 (COVID-19) [2]. As of September 2020, more than 32 million infections were reported worldwide and over 970,000 people had died [3]. Course and outcome of patients with COVID-19 are heterogeneous. While most SARS-CoV-2 infected patients are asymptomatic or exhibit mild symptoms, some deteriorate to the complicated stage and require medical treatment and hospitalization. COVID-19 symptoms can deteriorate within hours of hospital admission prompting need for oxygen supply or transfer to the intensive care unit [4,5]. Hence, identifying patients at this early stage of the disease is of paramount importance in medical decision-making regarding follow-up, hospitalization, and decision for medical treatment.
Carolin E.M. Jakob, Ujjwal Mukund Mahajan and Marcus Oswald, and Hans Stubbe, Lukas Tometten and Rainer König contributed equally to this work. The members of The LEOSS study group are listed in the acknowledgements section.
Many studies investigated predictors for progression to critical COVID-19, which was defined as admission to an intensive care unit (ICU) or need for mechanical ventilation [6][7][8][9][10]. However, predictors of COVID-19 deterioration requiring oxygen therapy have rarely been studied so far [11][12][13]. Depending on the clinical perspective, this stage of the disease is denoted in the literature as severe, but not critical [14][15][16] or moderate, but not severe [11,13]. To avoid misinterpretations of our analysis, we use the term advanced COVID-19 disease stage for this stage of the disease in the following, and this stage was used as the endpoint to be predicted. Patients presenting with asymptomatic SARS-CoV-2 infection or mild COVID-19 who are at risk for clinical deterioration benefit from close monitoring, swift medication and supportive measures [17]. Further, patients at risk may benefit from early therapeutic agents for COVID-19 [14,16]. In addition, due to the high prevalence of long-term COVID-19 symptoms and the association of severity of COVID-19 and severity of long-term COVID-19 symptoms [18][19][20], the need for medical interventions avoiding COVID-19 disease progression in patients at risk is further emphasized.
Here, we present a predictor and score (SACOV-19, Score for the prediction of an Advanced disease stage of COVID-19) resulting from a robust risk-stratification algorithm to assess if a patient is at risk of developing the advanced COVID-19 disease stage, based on data available on the day of the first positive SARS-CoV-2 test. By identifying patients with a high probability of advanced COVID-19, our score aims at supporting clinical decision making for patients presenting with asymptomatic SARS-CoV-2 infection or mild COVID-19. A low predicted risk could support out-patient management. A high predicted risk could promote close follow-up or hospitalization, or enter risk-benefit assessments regarding medical treatment.
The algorithm and SACOV-19 were developed using state-of-the-art machine learning methods and based on patient variables from the study cohort of the Lean European Open Survey on SARS-CoV-2 Infected Patients (LEOSS). LEOSS is a large multicenter cohort of medically supervised patients with predominant hospital contact [21]. The algorithm and SACOV-19 were assessed by a temporal validation using the LEOSS data. The algorithm is implemented in a browser-based web application, enabling straightforward usage of our predictor in future clinical studies and making it accessible to the research community.

Patient population and data collection
The prediction algorithm and SACOV-19 were developed and validated on patient data from LEOSS, the multicenter international COVID-19 registry comprising over 7000 patients collected in more than 100 study sites (http://www.leoss.net). Inclusion criteria for LEOSS were a laboratory confirmed SARS-CoV-2 infection from any respiratory material and clinical information available on follow-up until the end of the treatment (recovery or death). The day of the first SARS-CoV-2 diagnosis was referred to as the baseline time point. Documentation in LEOSS was performed retrospectively and anonymously. All patient variables, including numerical (ratio-scale) data such as age, BMI or laboratory values, were collected in categories. Due to the anonymous data collection, written informed consent of the participants was waived by the respective ethics committees. For patients recruited in Turkey, informed consent was obtained from the participants upon request of the national ethics committee. To reduce the risk of reidentification, the data were additionally anonymized using the principles used for the LEOSS Public Use File (PUF) we described earlier [22]. Approval for LEOSS data collection and analysis was obtained from the applicable local ethics committees of all participating centers, and the study was registered at the German Clinical Trials Registry (DRKS, No. S00021145).
In this study, patients were included who were asymptomatic or exhibited mild symptoms (symptoms of the upper respiratory tract, fever, nausea, emesis or diarrhea) at baseline. Progression to a complicated or severe stage of COVID-19 during medical consultation/observational period was set as the endpoint (denoted as advanced COVID-19 stage). Since COVID-19 is a multi-organ disease, any incident organ failure during the disease was considered a complication. It was defined by the occurrence of at least one of the following symptoms during the observational period (complicated or critical COVID-19 stage according to LEOSS criteria [21]): need for new oxygen supplementation due to clinical deterioration, oxygen saturation (SO 2 ) at room air < 90%, partial pressure of oxygen (PaO 2 ) at room air < 70 mmHg, clinically meaningful increase of oxygen supplementation compared to prior oxygen home therapy, increase of aspartate aminotransferase (AST) or alanine aminotransferase (ALT) > 5 × ULN (upper limit of normal), new cardiac arrhythmia, new pericardial effusion > 1 cm or new heart failure with pulmonary edema, congestive hepatopathy or peripheral edema, catecholamine therapy, life-threatening cardiac arrhythmia, liver failure with an INR > 3.5 (Quick < 50%), a qSOFA score of ≥ 2 or acute renal failure with need of dialysis. The baseline data comprised patient characteristics, symptoms, co-morbidities, known microbiological colonization, preexisting medication, and laboratory and vital parameters.
We excluded patients with advanced COVID-19 stages at baseline. Furthermore, for the development of the algorithm and SACOV-19, we excluded patients with no documented information on laboratory or vital data (n = 279). Patients enrolled between 16 March and 14 July 2020 were included for the development of the method (discovery cohort). Patients enrolled between 15 July 2020 and 16 February 2021 were used for validation (validation cohort).
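The temporal split described above can be sketched in a few lines. This is only an illustration: the function name `assign_cohort` is hypothetical, and it assumes an exact enrollment date is available for each patient, which may not hold for the anonymized LEOSS records.

```python
from datetime import date

# Cohort windows as stated in the text (both bounds inclusive).
DISCOVERY = (date(2020, 3, 16), date(2020, 7, 14))
VALIDATION = (date(2020, 7, 15), date(2021, 2, 16))

def assign_cohort(enrolled):
    """Return 'discovery', 'validation', or None for an enrollment date."""
    if DISCOVERY[0] <= enrolled <= DISCOVERY[1]:
        return "discovery"
    if VALIDATION[0] <= enrolled <= VALIDATION[1]:
        return "validation"
    return None  # enrolled outside both windows
```

Because the two windows do not overlap, no patient can contribute to both model development and validation.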

Machine-learning and computation of SACOV-19
The workflow
All the aspects of data reporting, predictive modeling and validation reporting were performed in accordance with the TRIPOD guidelines [23]. To derive the machine learning based and the score based (SACOV-19) predictor, the following steps were performed (Fig. 1A):
1. Baseline data were preprocessed to calculate baseline variables (binary features).
2. The patient cohort of the discovery cohort was separated into a training and a test set.
3. Machine learning was performed based on all baseline variables and data of the training set yielding a predictor based on all variables (base predictor).
4. To improve robustness and interpretability, variables with low impact were iteratively removed. A predictor ("slim predictor") with a reduced number of variables (n = 61) and a minimalistic predictor with n = 20 variables was obtained, the selection based on the performance on the test set.
5. SACOV-19 was developed by reducing the variables of the minimalistic predictor following a modified dynamic programming approach.
6. A browser-based web application of the minimalistic predictor and SACOV-19 was implemented.
7. SACOV-19 and the minimalistic predictor were evaluated using the data from the validation cohort.

Identifying a predictor using the baseline variables
Using the data from the discovery cohort, patients were randomly separated into an endpoint-balanced training (80%) and a test set (20%). Endpoint balancing was achieved by stratification of the classes, increasing the sampling rate of patients progressing to advanced COVID-19 and reducing the sampling rate of patients not progressing to advanced COVID-19. Binary variables were defined for all baseline patient characteristics. Of note, since numerical variables in the LEOSS database were also given in categories, no information was lost by this binarization. Missing values or data documented as "unknown", "not measured" or "not detected" were incorporated in the design of the binary variables. For details of the binary variable computation see Supplementary Text 1. These binary variables were used in the following data processing. The base predictor was constructed using the H2O.ai platform (https://www.h2o.ai), automatically selecting (with h2o.automl) the most suitable machine learning method on the training set. To save computational time, the selection of methods was limited to random forests, gradient boosting machines (gbm), extreme gradient boosting (XGBoost) and StackedEnsemble. The parameters of each method were optimized employing an internal tenfold cross-validation on the training set. The optimal method was then applied to the test set to assess the final performance. In each loop, the best performing predictor was identified from all obtained predictors using the performance measure logloss. The selection of predictors was based on the area under the curve (AUC > 0.75) and logloss < 0.50. A schematic representation of the procedure is shown in Supplementary Figure S1. Variables of the "base predictor" with a scaled importance above 0.05 were selected to obtain the "slim predictor", which is based on a reduced set of variables (n = 61).
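The endpoint-balanced split can be illustrated with a small sketch. This is one possible reading of the description, not the authors' implementation: the function names `stratified_split` and `balance` are hypothetical, the balancing is applied to the training indices only, and the target of equal class sizes is an assumption.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices 80/20 separately within each endpoint class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def balance(indices, labels, seed=0):
    """Resample to equal class sizes (assumed target): the smaller class is
    oversampled with replacement, the larger class is undersampled."""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    target = (len(pos) + len(neg)) // 2

    def resample(group):
        if len(group) >= target:
            return rng.sample(group, target)  # undersample without replacement
        return group + rng.choices(group, k=target - len(group))  # oversample

    return resample(pos) + resample(neg)

# Toy cohort: 20 patients reach the endpoint, 80 do not.
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels)
balanced_train = balance(train_idx, labels)
```

The sketch assumes both classes are present in `indices`; an empty class would make the oversampling step fail.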
To obtain the best performing predictor based on a minimalistic set of variables, variables of the "slim predictor" were ranked according to their scaled importance. Of these, a smaller set of variables (n = 60) was selected by leaving out the lowest ranking variable; a new predictor was trained on the training set and its performance was evaluated on the test set. Again, the lowest ranking variable of the remaining set of variables was removed, and a new predictor was generated and tested in the same way. This procedure was repeated until no variable remained. Out of these predictors, the minimalistic predictor was selected as the one showing the best tradeoff between good performance and a minimal set of variables (see "Results", XGBoost predictors). The robustness of the minimalistic predictor was evaluated by constructing supplementary (mutated) predictors leaving out one variable at a time. To estimate the robustness, the performance of these mutated predictors was compared to the performance of the minimalistic (wildtype) predictor. For the minimalistic predictor, a graphical user interface was implemented in R using the packages Shiny and ggplot2. The computational core consists of functionalities employing the packages h2o and lime.
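The iterative removal of the lowest ranking variable can be sketched generically as follows. The callbacks `fit`, `importance` and `evaluate` are placeholders for whatever learner is used, and the toy weights below are invented for illustration. The helper `pick_smallest_within` encodes one possible tradeoff rule (smallest set within a tolerance of the best performance); the text does not state the rule the authors applied, so this is an assumption.

```python
def backward_elimination(variables, fit, importance, evaluate):
    """Repeatedly refit, record test performance, and drop the weakest variable.

    fit(vars) -> model
    importance(model, vars) -> {variable: importance}
    evaluate(model, vars) -> performance on the held-out test set
    """
    remaining = list(variables)
    history = []
    while remaining:
        model = fit(remaining)
        history.append((list(remaining), evaluate(model, remaining)))
        scores = importance(model, remaining)
        weakest = min(remaining, key=lambda v: scores[v])
        remaining.remove(weakest)
    return history

def pick_smallest_within(history, tol):
    """Assumed tradeoff rule: fewest variables within `tol` of the best score."""
    best = max(perf for _, perf in history)
    eligible = [vs for vs, perf in history if perf >= best - tol]
    return min(eligible, key=len)

# Toy stand-ins (hypothetical): each variable has a fixed contribution.
weights = {"spo2": 0.5, "age": 0.3, "crp": 0.15, "noise": 0.0}
fit = lambda vs: {v: weights[v] for v in vs}
importance = lambda model, vs: model
evaluate = lambda model, vs: 0.5 + 0.5 * sum(model.values())

history = backward_elimination(weights, fit, importance, evaluate)
selected = pick_smallest_within(history, tol=0.01)
```

In the toy run, dropping the uninformative variable costs nothing, so the three informative variables are kept.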

Identifying discriminative single variables and the score (SACOV-19)
We estimated the discriminative power of each individual patient variable using the discovery set. The predictive power of each variable was estimated based on balanced accuracy. Patients with missing values for the tested variable were omitted. To identify the score (SACOV-19), we used the variables selected for the minimalistic predictor and combined up to a maximum of 16 variables into a predictive score. Each selected variable counted + 1. Together with a threshold T, the score predicted an advanced COVID-19 stage if at least T of the (binary) variable values of the evaluated score equaled "yes" (+ 1) for a given patient. Varying the threshold from 0 to the length of the score, we computed the AUC for each score. We started by computing all scores of length two and stored the best 1000 of them according to their AUC. Next, the variables of each of these 1000 scores of length two were combined with one of the remaining variables. Doing this for all remaining variables yielded a list of scores of length three. Subsequently, we selected the 1000 best scores according to their AUC. This dynamic-programming-like procedure was repeated until a list of the 1000 best scores of length 16 was compiled. Note that this heuristic runs in reasonable computational time.
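The score search described above resembles a beam search and can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R code: the function names and the toy data are hypothetical, and the AUC is computed with the rank formula (probability that a progressing patient scores higher than a non-progressing one, ties counted as one half), which equals the area obtained by sweeping the threshold T over an integer score.

```python
from itertools import combinations

def score_value(patient, variables):
    """Number of score variables that are 'yes' (1) for this patient."""
    return sum(patient[v] for v in variables)

def auc(patients, labels, variables):
    """AUC of the integer score via pairwise comparison (ties count 0.5)."""
    pos = [score_value(p, variables) for p, y in zip(patients, labels) if y == 1]
    neg = [score_value(p, variables) for p, y in zip(patients, labels) if y == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def beam_search(patients, labels, candidates, max_len, beam=1000):
    """Grow scores one variable at a time, keeping the `beam` best per length."""
    rank = lambda vs: auc(patients, labels, vs)
    level = {frozenset(c) for c in combinations(candidates, 2)}
    best_per_length = {}
    for length in range(2, max_len + 1):
        if not level:
            break
        kept = sorted(level, key=rank, reverse=True)[:beam]
        best_per_length[length] = kept[0]
        # Extend every kept score by each variable it does not yet contain.
        level = {s | {v} for s in kept for v in candidates if v not in s}
    return best_per_length

# Toy data (invented): four progressing and four non-progressing patients.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {"a": [1, 1, 1, 0, 0, 0, 0, 0],
           "b": [1, 1, 0, 1, 0, 0, 0, 1],
           "c": [0, 1, 0, 1, 1, 0, 1, 0]}
patients = [{v: columns[v][i] for v in columns} for i in range(8)]
best = beam_search(patients, labels, ["a", "b", "c"], max_len=3)
```

Storing each score as a `frozenset` removes duplicates that arise when the same variable set is reached through different parents, which keeps each level small.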
The rationale for this procedure was the assumption that sub-scores of well-performing scores also perform well. Indeed, we observed that best-of lists of length 200 (instead of 1000) already comprised all the best scores. Out of the list of 16 best scores (with length 1-16), the optimal score was determined by selecting the score with the highest AUC on the test set of the 16 optimal scores. All data processing, modeling and assessment of performances were performed using R (version 3.6.3). Confidence intervals for the odds ratios were calculated using the package "fmsb_0.7.0" [24]. Further used packages were dplyr_1.0.5, h2o_3. 30

General characteristics of the study population
We included 3487 out of 6360 patients enrolled in LEOSS in our study, 1223/2819 patients for model discovery and 2264/3541 patients for validation (for details of the selection of patients, see "Methods" and Supplementary Figure  S2).

Identifying a predictor based on a large set of baseline variables
Our goal was to develop a predictor as the basis for deriving a score aiding the front-line physician in identifying patients at risk of developing advanced COVID-19. We compiled 472 baseline patient variables (available to the treating physician) as input for obtaining the "base predictor" and trained machines on data of the discovery cohort. Evaluating the performance on a test set (taken from the discovery cohort) (n = 244), the "base predictor" revealed decent performance Fig. 1D. Further performance values are shown in Table S1. A predictor is estimated to be robust if it performs similarly under varying input conditions [25]. We constructed predictors by randomly dropping single variables. We observed that this did not influence the performance (Fig. 2A), reflecting the robustness of the minimalistic predictor. Hitherto, the results were based on patients containing missing values. To assess the impact of missing values on the predictive power, we applied the minimalistic predictor to data of patients without missing values for any of the 20 patient variables. We observed a slightly better prediction performance (on the validation set AUC = 0.77 ± 0.02, OR = 6.78 [95% CI 2.74-16.65] and balanced accuracy: 0.72 ± 0.01 using n = 124 patients, Table S3, Fig. 2B, Supplementary Figures S3C).
To summarize, we constructed and internally validated a minimalistic predictor based on 20 patient variables comprising patient characteristics such as age and body mass index, but also vital parameters such as body temperature, respiration and lung parameters, several blood laboratory parameters such as CRP, LDH and creatinine levels, and acute kidney injury at diagnosis. The predictor showed good and stable performance in predicting the development to the advanced COVID-19 stage.

Identifying a predictive score and the discriminative power of single variables
For clinical implementation, we developed an early warning score. Starting with the 20 variables from the minimalistic predictor, we applied the score optimization procedure (described in "Methods") and identified a predictive score (SACOV-19) based on 11 patient characteristics or 14 binary variables including three binary variables originating from the same categorical variables. The performance was similar to that of the machine learning-based predictors (AUC 0.80 ± 0.01) for the discovery set. For the validation set, the AUC was 0.73 ± 0.01. The composition of SACOV-19 is shown in Table 2. A high sensitivity is of particular clinical relevance, reducing misclassification of patients who are in need of hospitalization and close monitoring and who could possibly benefit from medical treatment. This can be achieved using lower thresholds. Selecting a threshold of four, we Removing patients with a missing value in at least one of the 14 binary variables improved the performance for the discovery (AUC = 0.83 ± 0.02, n = 120 patients) and the validation set (AUC = 0.75 ± 0.02, n = 153 patients) (Fig. 2C, Supplementary Figures S3D). Table 3 shows the performances for three different thresholds. To test if our score only works within a hospital setting, we computed the performance also for outpatients and asymptomatic patients. For outpatients (n = 28, after removal of patients with at least one NA in the score variables) the sensitivity was 82% and specificity 53%. For the asymptomatic patients the sensitivity was 67% with a specificity of 81% (threshold = 4, n = 29 after removal of patients with at least one NA in the score variables). However, both results show only a tendency, as the lower bounds of their confidence intervals were not above one, presumably due to the low patient numbers. To evaluate the predictive power of single variables, we computed their individual performance as a predictor of developing an advanced COVID-19 stage. Table 3 shows the results.
The best single variable was oxygen saturation (SO2) smaller than 96% with an AUC of 0.63 ± 0.01 (OR = 3.07 [95% CI 2.34-4.04]). Notably, the top five discriminating variables (oxygen saturation, age, CRP, LDH and temperature) are all part of the minimalistic predictor and of SACOV-19, showing the consistency of the results and the principal relevance of these five variables. In summary, using the preselected variables from the minimalistic predictor enabled us to define a clinical score comprising eleven patient variables with a good performance that is comparable to the machine learning-based predictors.
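The threshold-dependent measures reported in this section (sensitivity, specificity, balanced accuracy, odds ratio) can be computed from a 2 × 2 table as sketched below. The function names and the example scores are hypothetical. The confidence interval uses the Wald interval on the log odds ratio, which is an assumption; the method used by the fmsb package in the original analysis may differ.

```python
import math

def confusion(scores, labels, threshold):
    """Predict 'advanced' when score >= threshold; return (tp, fp, fn, tn)."""
    pairs = list(zip(scores, labels))
    tp = sum(s >= threshold and y == 1 for s, y in pairs)
    fp = sum(s >= threshold and y == 0 for s, y in pairs)
    fn = sum(s < threshold and y == 1 for s, y in pairs)
    tn = sum(s < threshold and y == 0 for s, y in pairs)
    return tp, fp, fn, tn

def metrics(scores, labels, threshold):
    """Sensitivity, specificity and balanced accuracy at a given threshold."""
    tp, fp, fn, tn = confusion(scores, labels, threshold)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {"sensitivity": sens, "specificity": spec,
            "balanced_accuracy": (sens + spec) / 2}

def odds_ratio_ci(tp, fp, fn, tn, z=1.96):
    """Odds ratio with a Wald 95% interval; undefined if any cell is zero."""
    odds_ratio = (tp * tn) / (fp * fn)
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return odds_ratio, odds_ratio * math.exp(-z * se), odds_ratio * math.exp(z * se)

# Invented example: eight patients, threshold of four.
scores = [5, 4, 4, 2, 3, 1, 4, 0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
m = metrics(scores, labels, threshold=4)
estimate, lower, upper = odds_ratio_ci(*confusion(scores, labels, 4))
```

With so few patients the interval around the odds ratio spans one, which mirrors why the subgroup results for outpatients and asymptomatic patients are described as showing only a tendency.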

Implementation of the machine learning-based predictor into a web interface
To illustrate the performance of the minimalistic predictor, we designed a graphical user interface for a quick entry of the values of potential patient variables, followed by the prediction of the investigated endpoint. The web interface (http://www.klinikum.uni-muenchen.de/Medizinische-Klinik-und-Poliklinik-II/de/sacov19app/index.html, login: user, password: sacov19) provides the user with the model-based estimated probability of the patient to develop an advanced COVID-19 stage, the odds ratio, SACOV-19 and the model prediction. Moreover, it provides several graphical presentations to illustrate the impact of the specific variables on the decision. Supplementary Figure S4 and movie M1 show the web front-end and illustrate its usage (for scientific use).

Discussion
We computed and validated a predictor and associated predictive score (SACOV-19) to predict a complicated or more severe COVID-19 stage in patients who tested positive for SARS-CoV-2 and presented, mainly in inpatient settings, without symptoms or with mild COVID-19 symptoms. SACOV-19 is based on standard parameters, which can be acquired in most hospital and out-patient settings. In addition, we implemented a browser-based interactive graphical user interface making the data-driven model accessible to the research community. Though most patients presenting asymptomatic or with mild COVID-19 symptoms do not require medical treatment, some patients rapidly deteriorate and need medical intervention [17,26]. By focusing on complicated or more severe COVID-19 as the endpoint, our score (SACOV-19) identifies patients requiring medical intervention and hospitalization. For asymptomatic/mild COVID-19 patients with increased risk predicted by our score, the attending physician might consider hospitalization or close follow-up. A high-risk result might also enter risk-benefit considerations when evaluating medical treatments with possible side effects. In turn, supporting the decision to discharge an asymptomatic/mild COVID-19 patient according to our score enables physicians to prioritize patients in need of hospitalization and close monitoring.
As of now, management decisions for asymptomatic/mild COVID-19 patients are mainly based on the presence of risk factors, the clinical judgment of the attending physicians and the available resources [17]. Unfortunately, the heterogeneous course and outcome of COVID-19 complicate this situation. Risk factors such as higher age, high BMI, male sex or arterial hypertension have been associated with poorer prognosis. However, they are also highly prevalent in patients with mild or asymptomatic courses [5]. Earlier studies evaluated general disease severity scores such as CRB65, NEWS2, or qSOFA in COVID-19. Mostly, these scores were validated for risk of progression to severe COVID-19 or death, to guide IMC/ICU admission in hospitalized patients [27][28][29][30]. Notably, the qSOFA score at baseline discriminated poorly in our cohort, indicating its unsuitability for identifying patients who are asymptomatic or have mild COVID-19 and are at risk of developing an advanced stage (58% accuracy for a threshold of one, and Glasgow Coma Scale ≤ 12 instead of 14). Scores specifically developed for risk of progression in COVID-19, like the COVID-GRAM, Brescia-COVID Respiratory Severity Scale (BCRSS) or 4C Mortality Score, focus almost entirely on the progression to severe respiratory impairment and death and do not take the early risk of progression into a complicated stage into consideration [6,8,12,31]. Exceptions are the CALL and EWAS scores and the score published by Huang et al. [32], which were designed to predict risk for progression to advanced COVID-19. However, these scores were based on relatively small patient cohorts [32,33]. Moreover, in validation studies, their performance in predicting the progression to complicated or more severe COVID-19 was poor (AUC < 0.67) [13,34].
Of note, we could not evaluate these scores and most of the published scores for the critical endpoint, as the thresholds needed for calculating the corresponding variables are more complex and were not collected in LEOSS. LEOSS data were collected using predefined categories to preserve the anonymous data collection protocol. In the 4C Mortality Score [8], for example, which was rated as high quality [12], categories for age, respiratory rate, oxygen saturation, urea and C-reactive protein were not mappable to LEOSS. In future research, the 4C Mortality Score, for example, could be adapted to the LEOSS data and evaluated on advanced COVID-19.
SACOV-19 is based on eleven patient characteristics (14 binary variables) which are often documented at first presentation. In line with previous studies, SACOV-19 shows that patients of higher age, higher BMI, and smokers or former smokers have a higher risk for advanced COVID-19 courses [5,12,13,26]. The respiratory parameters oxygen saturation, respiratory rate and feeling of dyspnea are included in SACOV-19 emphasizing the importance of examining pulmonary parameters at initial presentation.
A strength of the study is that it is based on data of a well-documented and curated multinational COVID-19 registry supported by the German Center for Infection Research and the German Infectious Disease Society, and on a well set up machine learning procedure. We trained SACOV-19 on a discovery cohort including only patients from the first wave of the COVID-19 pandemic. SACOV-19 was tested on an independent validation cohort comprising patients from the first to the third wave, whose data were collected after the development of the score. COVID-19 is a newly emerging infectious disease, for which the knowledge and standard of care evolved. Hence, one may argue that our score, which was developed based on data from March to July 2020, may not be useful anymore. However, most treatment options to date are administered after a COVID-19 disease deterioration [35], which is our endpoint, and hence would not affect the predictiveness of our score. Indeed, when we tested SACOV-19 on an independent validation cohort comprising patients from the first to the third wave (in which potential changes of care may have occurred), we did not observe a drop in performance. SACOV-19 stands out because it has been evaluated across regions and sectors. At the time of manuscript preparation, LEOSS contained, to our knowledge, the largest German data collection of comprehensive clinical data on high-risk patients [36]. Nevertheless, the patients investigated so far may limit its general applicability. Most of the patients received care in an inpatient setting. When testing our score on outpatients, we observed a similar performance; however, we had only n = 28 outpatients for this analysis and could hence not obtain a significant result. Furthermore, the majority of patients exhibited a mild disease and did not advance to the complicated phase.
Therefore, patients with co-morbidities could have been overrepresented in our cohort, as these patients were mainly admitted without severe symptoms [21]. To show the general applicability of our score, a further clinical trial is necessary. We currently plan a trial in a primary care setting to test whether SACOV-19 acceptably predicts COVID-19 deterioration.
While we included a large cohort of patients, a limitation is that the majority of patients were included at German health care facilities. Our results may not be fully applicable to countries or regions with different demographics or resource settings. Another caveat may be the high number of missing values for specific variables and, in particular, some laboratory values, as not all parameters were collected on the day of the first positive SARS-CoV-2 test. For example, interleukin 6 has been shown to have predictive power for a severe COVID-19 course [37] but was not selected by our algorithms, possibly due to its high number of missing values. Furthermore, thresholds for parameters were predefined in the study protocol. The availability of metric (uncategorized) data could improve prediction models. The web application was designed for research use, making our predictor accessible to the research community.

Conclusion
We present a robust machine learning-based predictor and, from this, a score (SACOV-19) to identify patients with predominantly known risk factors at risk of developing an advanced COVID-19 stage. To make it accessible to the research community, the predictor is available through a web interface. The predictor and score encompass patient variables which are commonly assessed in the primary care setting and are easily available. SACOV-19 may support clinical decision making when assessing the risk for complicated or more advanced COVID-19 stages is essential. Prospective clinical studies are needed to prove its reliability, particularly in countries or regions with different demographics or resource settings.