Background

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has affected almost all countries and regions, posing a great threat to human health. SARS-CoV-2 has evolved into several variants that differ in virulence and transmissibility, including Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Delta (B.1.617.2) and Omicron (B.1.1.529) [1]. B.1.1.529 was first identified in South Africa in November 2021, and was designated a variant of concern by the World Health Organization and named Omicron [2]. Increased transmissibility and reduced protection from neutralising antibodies led to the rapid spread of this variant, which soon became a major variant in many countries [3]. Since the cancellation of the zero-COVID policy in China in December 2022, many cases of SARS-CoV-2 Omicron infection have been reported across the country [4]. The clinical manifestations of Omicron infection vary widely, ranging from asymptomatic infection to pneumonia, life-threatening complications such as acute respiratory distress syndrome and multiple organ failure, and death [5]. Although the case fatality rate is lower than that of ancestral viral strains, the risk of death remains high in older age groups, particularly in patients with comorbidities. Identifying patients with a poor prognosis early and providing supportive treatment are important for improving outcomes. Prediction models can provide a basis for medical decisions and help medical workers better manage patients at different levels of risk.

Previously reported poor prognostic factors for COVID-19 include advanced age, multiple comorbidities, low lymphocyte count, elevated levels of inflammatory markers, coagulation markers, and cytokines, and imaging features [6, 7]. Cytokine storms are associated with severe COVID-19, and many studies have reported that proinflammatory cytokines, such as interleukin (IL)-1, IL-6, and tumour necrosis factor, can be used as prognostic biomarkers [8, 9]. Although several models have been developed for predicting severe disease and death in patients with COVID-19, most existing prediction models have shortcomings such as complex calculation methods, a high risk of bias, and a lack of multicentre validation, and are therefore unsuitable for clinical application. Considering the differences in epidemiology and clinical characteristics between Omicron and earlier variants, and the paucity of high-quality prediction models for severe Omicron infection, we systematically studied the prognostic biomarkers of mortality in patients with SARS-CoV-2 Omicron infection, developed a predictive scoring system, and validated it using national multicentre data to predict the risk of death in hospitalized patients with SARS-CoV-2 Omicron infection.

Methods

Study design and population

This retrospective multicentre study included data on 1817 patients hospitalized with SARS-CoV-2 Omicron infection at eight hospitals in China between December 19, 2022 and May 25, 2023. All enrolled patients were hospitalized for COVID-19. The inclusion criteria were as follows: (i) clinical manifestations associated with COVID-19; (ii) a positive SARS-CoV-2 nucleic acid or antigen test result; and (iii) a clear treatment outcome of discharge or death. The exclusion criteria were (i) age younger than 14 years, (ii) still receiving treatment at the time of data analysis, or (iii) lack of information on underlying diseases. Clinical outcomes (discharge or death) were followed up to June 10, 2023. The 815 patients from the First Affiliated Hospital of Zhejiang University constituted the training set, and the remaining 1002 patients from seven hospitals, including Shenzhen Third People’s Hospital, The First Affiliated Hospital of China Medical University, Affiliated Dongyang Hospital of Wenzhou Medical University, Shulan Hospital of Hangzhou, Fifth Medical Center of People’s Liberation Army General Hospital, Beijing Ditan Hospital Affiliated to Capital Medical University, and Qilu Hospital of Shandong University, constituted the external validation set.

Data collection

Baseline demographic characteristics (age and sex), clinical data (onset of symptoms, underlying diseases and laboratory test results on admission) and treatment outcomes were obtained from the electronic medical record system. Laboratory tests included haematology, serum biochemistry, coagulation profile, infection-related markers and cytokine levels. Data were also collected on the time from onset to admission and the time from admission to discharge or death.

Statistical analysis

Categorical variables were described using frequencies and percentages, and continuous variables were described using medians and interquartile ranges (IQRs). The Mann-Whitney U-test was used to compare continuous variables, and the chi-square test or Fisher’s exact test was used to compare categorical variables. Continuous variables were converted into binary variables according to the cutoff value that gave optimal sensitivity and specificity on the receiver operating characteristic (ROC) curve. Least absolute shrinkage and selection operator (LASSO) regression analysis was performed to identify variables with non-zero coefficients using the R package “glmnet” [10]. Forty-six clinical features with < 10% missing values were included in the variable shrinkage process. The LASSO regression was fitted with a logistic model, and the optimal lambda value with the smallest partial likelihood deviance was selected using 10-fold cross-validation. The variables selected by LASSO regression were further analysed using multivariable logistic regression analysis. To develop the prognostic score, we used the regression coefficients of the prognostically relevant variables and assigned points proportional to each coefficient. To quantify the discriminative performance of the model, ROC curve and area under the curve (AUC) analyses were performed using the R package “pROC”. All analyses were performed using R software (version 3.6.3). P values < 0.05 were considered statistically significant.
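The dichotomisation step described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the study itself used R), and it assumes that "optimal sensitivity and specificity" means maximising Youden's J (sensitivity + specificity − 1), which the text does not state explicitly for individual variables. The function name `youden_cutoff` and the sample values are hypothetical and are not study data.

```python
def youden_cutoff(values, died):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.

    A patient is classed as test-positive when value >= cutoff.
    `died` holds 1 for non-survivors and 0 for survivors.
    """
    n_pos = sum(died)
    n_neg = len(died) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcomes must be present")

    best = None
    # Every observed value is a candidate threshold.
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, d in zip(values, died) if d and v >= cutoff)
        tn = sum(1 for v, d in zip(values, died) if not d and v < cutoff)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        # Keep the first (lowest) cutoff that reaches the maximum J.
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Made-up laboratory values and outcomes, for illustration only.
ldh = [180, 210, 250, 290, 320, 350, 400, 520]
died = [0, 0, 0, 0, 1, 0, 1, 1]
cutoff, sens, spec = youden_cutoff(ldh, died)
print(cutoff, sens, spec)
```

Once a cutoff is chosen, each continuous variable is replaced by a 0/1 indicator (value at or above the cutoff), which is the form in which the features would enter the LASSO step.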

Results

Patient demographic and clinical characteristics

The training set included 815 patients, of whom 85 (10.4%) died and 730 (89.6%) survived (Table 1). The median age of the patients was 72 (IQR, 61–82) years, with 500 males (61.3%). The median interval from onset to admission was 8 (IQR, 6–10) days, and the median length of hospitalisation was 8 (IQR, 6–10) days. The validation set included 1002 patients, of whom 84 (8.4%) died and 918 (91.6%) survived. The median age of the patients was 62 (IQR, 47–76) years, with 630 males (62.9%). The median interval from onset to admission was 2.5 (IQR, 1–7) days, and the median length of hospitalisation was 12 (IQR, 7–17) days. In both sets, the top three comorbidities were hypertension, diabetes, and cardio-cerebrovascular disease (CCVD), and the most common symptoms were fever and cough. Compared with survivors, non-survivors were older, had a significantly higher prevalence of dyspnoea and comorbidities (hypertension, diabetes, and CCVD), and had a longer hospital stay. Continuous variables were converted to binary variables based on the cutoff values that gave optimal sensitivity and specificity on the receiver operating characteristic (ROC) curve. The cutoff values are listed in Tables 1 and 2. A comparison of the laboratory test results on admission between survivors and non-survivors in the training set is shown in Table 2.

Table 1 Demographic and clinical characteristics of the training set and validation set on admission
Table 2 Laboratory test results of patients in the training set on admission

Risk factors and the prediction model for death in patients with SARS-CoV-2 Omicron infection

After excluding variables with missing values in ≥ 10% of records in the training set, 46 clinical features measured on admission were analysed using LASSO binary logistic regression. All features were categorical variables, and seven factors were retained as associated with the risk of COVID-19 death: age, IL-6, blood urea nitrogen (BUN), lactate dehydrogenase (LDH), D-dimer, neutrophil count, and neutrophil-to-lymphocyte ratio (NLR) (Fig. 1A and B). Multivariable logistic regression analysis showed that older age and elevated IL-6, LDH, BUN and D-dimer levels were independent risk factors for mortality (Fig. 1C). The risk-scoring system was generated by assigning points based on the regression coefficients. Points were assigned as follows: age ≥ 78 years, 1 point; IL-6 ≥ 9.5 pg/mL, 1 point; BUN ≥ 8.4 mmol/L, 1 point; LDH ≥ 311 U/L, 1 point; and D-dimer ≥ 1257 ng/mL, 1 point (Fig. 1D).
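As a worked illustration, the five-item score can be computed as in the following minimal Python sketch. The thresholds and units are those reported above; the function and variable names (`risk_score`, `THRESHOLDS`) are hypothetical, and the sketch assumes that all five admission values are available, since the text does not describe how missing values were handled when scoring.

```python
# Admission thresholds as reported in the text; one point per criterion met.
THRESHOLDS = {
    "age": 78.0,        # years
    "il6": 9.5,         # pg/mL
    "bun": 8.4,         # mmol/L
    "ldh": 311.0,       # U/L
    "d_dimer": 1257.0,  # ng/mL
}


def risk_score(age, il6, bun, ldh, d_dimer):
    """Return the 0-5 point score from the five admission values."""
    values = {"age": age, "il6": il6, "bun": bun, "ldh": ldh, "d_dimer": d_dimer}
    return sum(1 for name, cutoff in THRESHOLDS.items() if values[name] >= cutoff)


# Hypothetical patients, for illustration only.
print(risk_score(age=80, il6=12.0, bun=9.0, ldh=350, d_dimer=1500))
print(risk_score(age=60, il6=3.0, bun=5.0, ldh=200, d_dimer=300))
```

Because every item carries one point, the score is simply the number of criteria met, which is what makes it easy to apply at the bedside.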

Fig. 1

Least absolute shrinkage and selection operator (LASSO) regression and multivariable logistic regression were used to screen clinical features associated with COVID-19 mortality in the training set. Notes: (A) LASSO trace curves of 46 features with < 10% missing values. (B) Seven features were selected using LASSO binary logistic regression analysis; the two dashed vertical lines mark the optimal values according to the minimum criterion and the 1 standard error (SE) of the minimum criterion (the 1-SE criterion). (C) Regression coefficients obtained using multivariable logistic regression. (D) Scoring of variables in the scoring system

Performance of the scoring system in the training and independent validation sets

To assess the generalisability of the risk score, we used an independent group of 1002 patients from seven hospitals in different regions of the country. The same variables were collected from the validation set and the risk scores were calculated. ROC analysis indicated good predictive performance in both the training set (AUC: 0.888; 95% CI: 0.850–0.926) and the independent validation set (AUC: 0.905; 95% CI: 0.879–0.931) (Fig. 2A and B). The Youden index-based cutoff generated during the development of the scoring system was 2.5, with a sensitivity of 83.6%, specificity of 83.5%, positive predictive value (PPV) of 37.2%, and negative predictive value (NPV) of 97.8%. In the validation set, sensitivity, specificity, PPV, and NPV were 79.8%, 83.0%, 30.0%, and 97.8%, respectively.
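The relationship between these four metrics can be checked with a short Python sketch based on Bayes' rule. It assumes that the prevalence of death equals the crude mortality of each set (85/815 and 84/1002); the function name `predictive_values` is hypothetical. The results land within about one percentage point of the reported PPV and NPV. Small differences are to be expected from rounding and from the fact that the reported metrics may have been computed only on patients with complete data.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Reported sensitivity and specificity; prevalence assumed equal to crude mortality.
train_ppv, train_npv = predictive_values(0.836, 0.835, 85 / 815)
valid_ppv, valid_npv = predictive_values(0.798, 0.830, 84 / 1002)

print(f"training:   PPV={train_ppv:.3f}, NPV={train_npv:.3f}")
print(f"validation: PPV={valid_ppv:.3f}, NPV={valid_npv:.3f}")
```

The low PPV alongside the high NPV follows directly from the low mortality in both sets: with roughly one death in ten admissions, most patients scoring above the cutoff still survive, whereas a score below the cutoff is strongly reassuring.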

Fig. 2

Evaluation of the performance of the scoring system in predicting death in patients hospitalized with SARS-CoV-2 Omicron infection. Notes: Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to evaluate the accuracy of the scoring system in the training set (A) and the independent validation set (B). AUC, area under the curve; CI, confidence interval

Discussion

Although most cases of COVID-19 are not life-threatening, the mortality rate is higher in older adults with multiple comorbidities [11]. The ongoing evolution and mutation of SARS-CoV-2 will lead to more reinfections, and understanding of the risk factors for death from COVID-19 will continue to improve. The purpose of developing a mortality prediction scoring system is to help clinicians identify patients at high risk of death on admission, when the associated symptoms may still be mild and nonspecific.

In this multicentre retrospective study, we developed and validated a new predictive score based on five variables (age and IL-6, BUN, D-dimer, and LDH levels) to predict outcomes in patients with SARS-CoV-2 Omicron infection. Compared with the traditional SOFA (AUC: 0.7) and qSOFA scores (AUC: 0.61), our model has a better ability to predict COVID-19 deaths [12]. The independent validation data were collected from seven hospitals in multiple provinces and cities across the country. Despite differences from the training set, the scoring system also had high accuracy in the validation set, indicating that it is generalisable. In addition, some laboratory assays differed between hospitals. For example, IL-6 measurement is not as standardised as that of other inflammatory markers, which suggests that different laboratory techniques can be used without substantially affecting model performance.

In our study, the risk factors for death in patients with COVID-19 were similar to those identified in previous studies [13, 14]. Older patients have more underlying diseases, are more likely to develop secondary infections and critical illness, and have a higher case fatality rate [15]. IL-6 is a multifunctional cytokine that regulates humoral and cellular responses, and has been identified in many previous studies as an important biomarker associated with adverse clinical outcomes of COVID-19 [16, 17]. It is released by immune cells, including macrophages and T cells, and elevated levels of IL-6 reflect viral load and lung damage. Overexpression of proinflammatory cytokines and chemokines is involved in the occurrence of severe pneumonia, acute respiratory distress syndrome, and multiple organ failure in patients with COVID-19 [18]. Our study suggests that monitoring IL-6 levels can also help identify high-risk patients. In our study, elevated D-dimer levels were associated with mortality. D-dimer is a fibrin degradation product. An increase in D-dimer level indicates activation of coagulation, which may be related to thrombosis and inflammation [19]. D-dimer levels are elevated in 3.75–74.6% of patients with COVID-19 [20, 21]. A multicentre retrospective study conducted in Wuhan found that a D-dimer level greater than 1 µg/mL on admission was associated with an increased risk of in-hospital death [22]. Consistent with previous studies, we found that elevated BUN levels were associated with an increased risk of death from COVID-19 [23]. BUN, a nitrogenous end-product of protein metabolism, can be used to assess renal function and hypovolemia. One study reported that, after correcting for renal function, a high BUN concentration on admission was still closely related to adverse outcomes in critically ill patients in the ICU [24]. In addition, the BUN-to-serum albumin ratio is an important prognostic factor for mortality and severity in patients with aspiration pneumonia, hospital-acquired pneumonia, and community-acquired pneumonia [25, 26]. LDH is an enzyme present in the cytoplasm that is involved in lactate metabolism. An elevated LDH level is an indicator of cell damage or necrosis [27]. Several studies have shown that LDH levels reflect disease severity and are significantly higher in patients in ICUs than in other patients [28, 29]. A meta-analysis of 18 studies (total sample size: 5394 patients) showed that an elevated LDH level was associated with a 5-fold increase in the risk of adverse outcomes in patients with COVID-19 [30]. The identification of these biomarkers also has implications for the treatment of COVID-19: timely anti-inflammatory therapy, blockade of the cytokine storm, appropriate anticoagulation, and prevention of gastrointestinal bleeding may help to improve prognosis.

It is worth noting that, although underlying diseases (especially cardiac disease) have been associated with poor prognosis [31], the prediction of COVID-19 death in our study was mainly captured by age and laboratory tests at admission. Machine learning variable selection techniques essentially retain only those variables that have the greatest impact on prognosis, and the extent of the individual systemic inflammatory response appears to drive patient outcomes to a greater extent than underlying conditions. Therefore, underlying disease was not included in the final risk score because its effect was accounted for by other factors.

Our scoring system has several advantages. First, it is based on data from patients with SARS-CoV-2 Omicron infection, and so adds to the prognostic indicators available for different variants of SARS-CoV-2. Second, it is based on readily available objective indicators, and is easy to calculate and use in clinical practice. Third, it has good predictive performance and was validated using data from different hospitals in China, which supports its generalisability. However, our research has some limitations. First, this study was retrospective, not all patients underwent all laboratory tests, and all patients were hospitalized and none were outpatients, resulting in incomplete data and selection bias. Second, it is not a large-sample study. All the data come from China and may not fully represent the world's population. Third, our risk score was calculated from baseline variables at admission and does not account for the effect on prognosis of treatments given during hospitalization, such as antiviral therapy, which has been reported as an important independent predictor of COVID-19-related mortality [32]. In addition, most of the patients in our study had received COVID-19 vaccines in response to the national vaccination campaign, but we did not collect specific information on the number, type, and timing of vaccinations, which have been reported to reduce the risk of COVID-19-related death in a dose-response manner [33]. These limitations may restrict the implementation of the score. Large-scale prospective data are needed to optimize the model in the future.

In summary, the scoring system based on age and four laboratory indicators on admission can assess the risk of death in patients with SARS-CoV-2 Omicron infection in a timely and effective manner, and can help clinicians identify high-risk patients for monitoring and immediate intervention.