Background

In December 2019, pneumonia of unknown cause was detected in Wuhan, China. In February 2020, the World Health Organization officially confirmed that it was a severe acute respiratory syndrome caused by the 2019 coronavirus (COVID-19). The COVID-19 outbreak has been ongoing ever since. Despite the development and distribution of vaccines, the prevalence of COVID-19 has not decreased owing to mutations. As of December 2021, the number of confirmed COVID-19 cases worldwide had exceeded 288 million with approximately 5 million deaths, and in South Korea, it had exceeded 630,000 cases with approximately 6,000 deaths [1]. Despite many efforts globally, the number of critically ill patients is increasing. Approximately 15%–29% of COVID-19 cases require hospitalization, and approximately 17%–35% of hospitalized patients require intensive care [2,3,4,5]. In fact, the shortage of hospital beds and intensive care units (ICU) has emerged as a major issue worldwide, including in South Korea [6, 7]. Owing to the highly contagious nature of COVID-19, treatment must be carried out in a negative-pressure room by medical staff wearing the highest level of protective equipment. This has placed a significant burden on the healthcare system, worsened by an increased risk of infection to non-COVID-19 patients.

Currently, machine learning (ML) research on diagnosis of COVID-19 and mortality prediction is being actively conducted. A study has reported that ML methods can be applied to predict acute respiratory distress syndrome in patients with COVID-19 [8]. In addition, studies that apply ML to predict mortality and severity of COVID-19 have been steadily published in the past two years [9,10,11].

If the prognosis is predicted simultaneously with the COVID-19 diagnosis, limited medical resources can be allocated more efficiently, which can ultimately reduce mortality rates. This study aimed to apply ML to predict severe morbidity using the initial clinical information and laboratory results of COVID-19 patients. In doing so, it intends to identify patients with a high probability of becoming critically ill at an early stage, enabling timely intervention and efficient utilization of medical resources such as hospital beds.

Methods

Patient and data collection

This study was conducted retrospectively at a single institution in Korea. We included patients with mild COVID-19 requiring hospitalization from July 2020 to October 2021 and divided them into a mild group and an exacerbation group. Data were collected retrospectively from the electronic medical records (EMR) of patients. The collected clinical information and laboratory results were as follows: sex, age, height, body weight, comorbidity, duration of symptoms before hospitalization, systolic blood pressure (SBP), diastolic blood pressure, pulse rate (PR), respiratory rate, body temperature (BT), percutaneous oxygen saturation (SpO2), white blood cell count, C-reactive protein, blood urea nitrogen (BUN), total bilirubin, and procalcitonin. Cases in which any of the above data were missing were excluded from this study. The criterion for transfer to a tertiary hospital was hypoxia that could not be managed with O2 supplied by a low-flow system; such cases were classified as exacerbation. All numerical data were collected before discharge or exacerbation, and cases with data collected over seven consecutive days were included in the study.

The following derived variables were additionally created: 1) an index of whether the maximum BT three days before discharge or exacerbation increased compared to that of four days before; 2) an index of whether the minimum SpO2 three days before discharge or exacerbation increased compared to that of four days before; 3) a 'vital bad index', defined as the sum of the bad BT score, bad PR score, and bad SpO2 score. The scores are defined as follows. The bad BT score is 1 if the body temperature exceeded 38.5 °C three days before discharge or exacerbation. The bad PR score is 1 if the pulse rate was less than 60 bpm or greater than 110 bpm three days before discharge or exacerbation. The bad SpO2 score is 1 if SpO2 was less than 93% three days before discharge or exacerbation.
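The derived variables can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `DayVitals` container, its field names, and the choice of daily summaries (maximum BT, lowest and highest PR, minimum SpO2) are assumptions made for illustration, since the text does not state which reading of the day each score uses.

```python
from dataclasses import dataclass


@dataclass
class DayVitals:
    """Hypothetical daily summary for one patient (field names are ours)."""
    max_bt: float    # maximum body temperature, degrees Celsius
    min_pr: float    # lowest pulse rate, beats per minute
    max_pr: float    # highest pulse rate, beats per minute
    min_spo2: float  # minimum SpO2, percent


def increase_index(three_days_before: float, four_days_before: float) -> int:
    """1 if the value three days before exceeds the value four days before."""
    return 1 if three_days_before > four_days_before else 0


def bad_bt_score(v: DayVitals) -> int:
    """1 if body temperature exceeded 38.5 degrees Celsius."""
    return 1 if v.max_bt > 38.5 else 0


def bad_pr_score(v: DayVitals) -> int:
    """1 if pulse rate fell below 60 bpm or rose above 110 bpm."""
    return 1 if v.min_pr < 60 or v.max_pr > 110 else 0


def bad_spo2_score(v: DayVitals) -> int:
    """1 if SpO2 fell below 93%."""
    return 1 if v.min_spo2 < 93 else 0


def vital_bad_index(v: DayVitals) -> int:
    """Sum of the three bad scores, so the index ranges from 0 to 3."""
    return bad_bt_score(v) + bad_pr_score(v) + bad_spo2_score(v)


# Hypothetical readings three and four days before the reference day.
day3 = DayVitals(max_bt=38.7, min_pr=72, max_pr=115, min_spo2=92)
day4 = DayVitals(max_bt=38.1, min_pr=70, max_pr=98, min_spo2=95)

print(vital_bad_index(day3))                         # all three thresholds crossed
print(increase_index(day3.max_bt, day4.max_bt))      # BT rose from day 4 to day 3
print(increase_index(day3.min_spo2, day4.min_spo2))  # SpO2 did not rise
```

Because each score is binary, the vital bad index is simply a count of how many of the three vital signs were outside their thresholds on that day.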

Data pre-processing

Data from patients with a hospital stay of seven days or less were excluded. Missing values were removed through data cleaning. A data matrix was created through data integration, data transformation, data reduction, and data discretization. Patients were randomly assigned to a training set and a test set at a ratio of 7:3.
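A 7:3 random assignment can be sketched as follows. The text states only that assignment was random; splitting each outcome class separately (stratification), the function name `stratified_split`, the seed, and the cohort sizes below are all assumptions for illustration. Stratification is shown because it keeps a rare outcome represented in both sets.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)                      # random order within the class
        cut = round(len(members) * train_fraction)
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical cohort: 980 mild (0) and 20 exacerbation (1) patients.
labels = [0] * 980 + [1] * 20
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))
print(sum(labels[i] for i in train_idx), sum(labels[i] for i in test_idx))
```

With a fixed seed the split is reproducible, which makes later comparisons between models easier to interpret.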

Machine learning analysis

The ML algorithms used were random forest (RF), RF (ranger), gradient boosting machine (GBM), and support vector machine (SVM). Accuracy, kappa, specificity, precision, detection rate, balanced accuracy, and run time were calculated for each model and compared. The ability of each model to predict exacerbation was assessed by calculating the area under the receiver operating characteristic curve (AUROC). We divided the enrolled patients into a 70% training set and a 30% testing set. In addition, 10-fold cross-validation was performed during model training to make full use of the training data and to improve model prediction performance.
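As an illustration of the AUROC used here, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy scores are hypothetical; in practice a library routine would normally be used.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for four mild (0) and two exacerbation (1) cases.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.8, 0.7, 0.9]
print(auroc(y_true, scores))
```

Unlike accuracy, this quantity does not depend on a classification threshold, which is why it is reported separately from the threshold-based metrics.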

We quantified the uncertainty of our classification models and improved the confidence of our study by adding conformal prediction-based techniques, such as a class-conditional inductive conformal classifier for multi-class problems [12,13,14]. For this analysis, we split the dataset into 30% testing, 63% training, and 7% validation sets. At the 95% confidence level, the region predictors for GBM, RF, and SVM achieved accuracies of 99.5%, 99.7%, and 99.5%, oneC scores of 0.98, 0.91, and 0.97, and avgC scores of 1.01, 1.08, and 1.03, respectively.
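A class-conditional inductive conformal classifier can be sketched in a few lines. The sketch rests on several assumptions that the text does not confirm: the held-out validation set is treated as the calibration set, nonconformity is taken as one minus the predicted probability of the candidate class, oneC is the share of prediction regions containing exactly one label, and avgC is the mean number of labels per region. All function names and the toy probabilities are hypothetical.

```python
def calibrate(cal_probs, cal_labels):
    """Group calibration nonconformity scores (1 - p_true) by true class."""
    scores = {}
    for probs, label in zip(cal_probs, cal_labels):
        scores.setdefault(label, []).append(1.0 - probs[label])
    return scores


def prediction_region(probs, cal_scores, significance=0.05):
    """Labels whose class-conditional p-value exceeds the significance level."""
    region = []
    for label in sorted(cal_scores):
        alpha = 1.0 - probs[label]                 # nonconformity of this label
        class_scores = cal_scores[label]
        at_least = sum(1 for s in class_scores if s >= alpha)
        p_value = (at_least + 1) / (len(class_scores) + 1)
        if p_value > significance:
            region.append(label)
    return region


def efficiency(regions, true_labels):
    """Return (coverage, oneC, avgC) for a list of prediction regions."""
    n = len(regions)
    coverage = sum(1 for r, y in zip(regions, true_labels) if y in r) / n
    one_c = sum(1 for r in regions if len(r) == 1) / n
    avg_c = sum(len(r) for r in regions) / n
    return coverage, one_c, avg_c


# Hypothetical calibration set: 25 cases per class, as [p(class 0), p(class 1)].
cal_probs = [[1 - i / 40, i / 40] for i in range(25)]
cal_probs += [[i / 40, 1 - i / 40] for i in range(25)]
cal_labels = [0] * 25 + [1] * 25
cal_scores = calibrate(cal_probs, cal_labels)

# Hypothetical test cases: confident mild, ambiguous, confident exacerbation.
test_probs = [[0.91, 0.09], [0.56, 0.44], [0.04, 0.96]]
test_labels = [0, 1, 1]
regions = [prediction_region(p, cal_scores) for p in test_probs]
coverage, one_c, avg_c = efficiency(regions, test_labels)
print(regions, coverage, one_c, avg_c)
```

Under these definitions, an ideal region predictor has oneC close to 1 and avgC close to 1, meaning that almost every region contains a single label while still covering the true class at the requested confidence level.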

Statistical analysis

Numeric and categorical variables were compared using t-tests and Chi-squared tests, as appropriate. Statistical analyses were performed using RStudio (version 4.1.2; Boston, MA, USA).

Results

Baseline characteristics

Of the 3,744 patients diagnosed with COVID-19 by polymerase chain reaction test, 2,758 patients were finally enrolled. A total of 991 patients whose hospitalization was less than 7 days or for whom EMR data and laboratory findings were missing were excluded. The enrolled patients were randomly allocated to the training set (n = 1,946, 70.6%) and the test set (n = 812, 29.4%) (Fig. 1). The mild and exacerbation groups comprised 2,696 (97.8%) and 62 (2.2%) patients, respectively. The clinical characteristics and laboratory findings of both groups at the time of admission are shown in Table 1. In the mild and exacerbation groups, 1,401 (52%) and 35 (57%) patients were male, respectively (P = 0.061). The mean age of the patients in the two groups was 48.4 and 55.0 years, respectively (P < 0.001). The presence of comorbidity was higher in the exacerbation group (75.8% vs. 47.2%, P < 0.001). SpO2 was 96.2% in the exacerbation group and 97.0% in the mild group (P = 0.017). SBP was 136.8 mmHg in the mild group, which was higher than the 129 mmHg in the exacerbation group (P < 0.001). Among laboratory findings, the BUN level was higher in the exacerbation group than in the mild group (14.2 mg/dL vs. 11.9 mg/dL, P < 0.001).

Fig. 1

Flowchart of patient selection and model development

Table 1 Baseline characteristics between transfer and non-transfer group

The comparison of bio-signals between the two groups three and four days before discharge or exacerbation is shown in Table 2. The maximum BT three and four days before exacerbation was higher in the exacerbation group (P < 0.001). The minimum SpO2 three and four days before exacerbation was lower in the exacerbation group (P < 0.001). The vital bad index was 1.4 in the exacerbation group and 0.9 in the mild group (P < 0.001).

Table 2 Comparison of derived variables between transfer and non-transfer groups

Developing and evaluating models

In this study, four ML algorithms, RF, RF (ranger), GBM, and SVM, were trained to develop a model to predict the exacerbation of COVID-19 patients. The performance of each developed model was evaluated using accuracy, kappa, sensitivity, specificity, precision, detection rate, balanced accuracy, and AUROC as the performance metrics (Table 3, Fig. 2). According to Table 3, RF (ranger), RF, and SVM had equal accuracy, at 0.9883. SVM had the best kappa, at 0.703, followed by RF (0.6614). SVM had the highest sensitivity, at 0.61111. Regarding precision, RF (ranger) scored 1.00000, followed by GBM and SVM at 0.84615. The detection rate was highest for SVM, at 0.01436, followed by RF (0.01175). SVM had the best balanced accuracy, at 0.80422. Run time was the shortest for SVM, at 4.13 s, followed by GBM (7.53 s). SVM showed the best results in six of the eight evaluation indicators. Although the AUROC of RF (ranger) and SVM was the same (0.96), the SVM algorithm performed better overall.
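All of the threshold-based metrics above follow from a 2×2 confusion matrix, as the sketch below shows. The counts are hypothetical: the source does not report confusion matrices, and these values were chosen by us only so that the outputs land close to the SVM figures quoted above. The detection rate is taken as true positives divided by all cases, the convention used by the R caret package; the text does not define it, so this is an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Threshold-based metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    detection_rate = tp / n            # assumed definition: TP over all cases
    balanced_accuracy = (sensitivity + specificity) / 2

    # Cohen's kappa: agreement beyond what the marginal totals give by chance.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "detection_rate": detection_rate,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts (not reported in the source).
metrics = binary_metrics(tp=11, fp=2, fn=7, tn=746)
for name, value in metrics.items():
    print(f"{name}: {value:.5f}")
```

With so few positive cases, accuracy is dominated by the negative class, which is why kappa, sensitivity, and balanced accuracy separate the models more clearly than accuracy does.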

Table 3 Final performance of machine learning models in prediction of transfer to tertiary medical center

Fig. 2

Area under the receiver operating characteristics (AUROC) of four different prediction models

Of a total of 100 variables, 86 remained after preprocessing and were used for all ML methods. The top 20 most important variables in the SVM model are shown in Fig. 3. BT, SpO2, and SBP were included among the top 20 variables. The minimum SpO2 three days before had the highest feature importance, followed by the maximum BT three days before, the maximum BT four days before, and the average BT three days before.

Fig. 3

Top 20 variable importance of support vector machine

Discussion

In this study, we developed an algorithm that can predict, three days in advance, the likelihood that a COVID-19 patient will progress from mild to severe disease and require transfer to a tertiary medical institution equipped for tracheal intubation, mechanical ventilation, and negative-pressure isolation. We analyzed continuous data from individual patients to predict the exacerbation of symptoms. This study can help medical staff decide whether to transfer a patient to a tertiary hospital, and it does so ahead of time. Recently, many studies have developed predictive models for the worsening of COVID-19 patients [15,16,17,18]. The distinguishing feature of our study is that the model not only predicts the worsening of COVID-19 patients but also warns the medical staff in advance that the patient may need to be transferred to a tertiary hospital for advanced treatment. Although our model used easy-to-measure bio-signals, the AUROC was 0.96, which indicates high discriminative performance. Regarding variable importance, BT and SpO2, which are generally easy to measure, showed high importance.

Currently, if the number of COVID-19 patients increases rapidly, medical institutions are unable to manage all patients with current resources. In Korea, asymptomatic or mildly symptomatic patients self-check their symptoms at home and are hospitalized only when their subjective symptoms worsen. However, it is difficult for patients without medical training to determine whether they need inpatient treatment; the application of our proposed model can therefore help patients with mild symptoms receive timely treatment. Medical resources, such as negative-pressure isolation inpatient wards, mechanical ventilators, and intensive care units, cannot cope with a rapidly increasing number of patients. Therefore, if the deterioration in the condition of COVID-19 patients can be accurately predicted at an early stage, limited medical resources can be utilized efficiently.

According to an April 2021 review article on predicting the mortality and severity of COVID-19 using ML, the most used model was logistic regression, followed by extreme gradient boosting and SVM [19]. Although many methods are being attempted, the accuracy of each method has remained unchanged. In the literature, SVM models have been developed using laboratory findings as variables to predict severity; however, in a pandemic situation, predicting the prognosis from blood test results is cumbersome and relatively difficult to apply in actual clinical practice because of the time and cost involved. Therefore, our study will be more useful in the current pandemic situation, as vital signs and SpO2, which are relatively easy to obtain, are the main variables.

Although this study shows the possibility of developing a model that can be easily applied to clinical practice, it has several limitations. First, it was conducted at a single institution. Since COVID-19 has not yet ended, we intend to undertake further research in which multi-center COVID-19 patient data will be collected and external validation will be conducted. Second, SpO2 alone was considered in model development, without information on oxygen supply. The ratio of SpO2 to supplied O2 flow more accurately reflects the medical condition of the patient; this is planned as a further study. Third, data from asymptomatic COVID-19 patients who were not hospitalized were not collected; this is also planned for a future study. If data on asymptomatic patients are collected, the performance of the exacerbation model can be improved.

Conclusions

In conclusion, predicting deterioration in advance, so that appropriate treatment can be provided, is important for improving the survival rate of patients with COVID-19. Our model predicts worsening of the condition of a patient with COVID-19 with high accuracy 3–4 days in advance. Therefore, this algorithm can facilitate adequate oxygen therapy and mechanical ventilator preparation, thereby improving patient prognosis, increasing the efficiency of the medical system, and mitigating the damage caused by the global pandemic.