Background

Novel coronavirus disease 2019 (COVID-19) is a contagious disease first reported in Wuhan, Hubei, China, with a rapidly spreading outbreak [1, 2]. According to the World Health Organization (WHO), more than 4,993,470 confirmed cases (in nearly all countries and regions) and more than 327,738 deaths had been reported worldwide as of May 23, 2020 [3]. Currently, the number of patients with COVID-19 is increasing rapidly in the United States, Europe, Russia, and Latin America. The infection appears to be transmitted from human to human via droplets, aerosols, the fecal route, or direct contact, with an incubation period of 1–14 days [4]. COVID-19 infection has been reported in patients of all ages, but a higher mortality rate has been observed in older adults and those with comorbidities such as hypertension, cardiovascular disease, chronic kidney disease (CKD), diabetes, and chronic respiratory disease [5]. Obesity may also be a risk factor for respiratory failure leading to invasive mechanical ventilation in patients with COVID-19 [6]. The disease manifestation at presentation has been generally similar, with mild flu-like symptoms being the most frequent. The most common symptoms in the general population have been fever, cough, dyspnea, and myalgia or fatigue. Nevertheless, certain patients may rapidly develop acute respiratory failure, multiple organ failure, and other fatal complications. To date, no specific treatment for COVID-19 has been fully developed.

Despite public health responses aimed at containing the disease and delaying its spread, the outbreak has increased the demand for medical resources, while medical staff themselves are also at risk of infection. To reduce the burden on the healthcare system and provide optimal care for patients, an effective prognosis assessment of the disease is needed. A predictive model that combines multiple variables or features to estimate an infected person’s risk of poor outcomes can assist healthcare staff in classifying patients by severity when allocating limited medical resources [7]. Some predictive models (e.g., the Pneumonia Severity Index [PSI], CURB-65, and the Rapid Emergency Medicine Score [REMS]) are already being used in COVID-19 patients. An earlier study of COVID-19 found that the PSI performed better than CURB-65 in predicting mortality [8]. Another study showed that the REMS could provide emergency clinicians with an effective adjunct risk stratification tool for critically ill patients with COVID-19, especially those aged < 65 years. When REMS parameters cannot be completed in the emergency department, the Modified Early Warning Score (MEWS), which also has a high negative predictive value (NPV) for screening, is a second option for COVID-19 patients, with acceptable prediction accuracy [9]. A recent study demonstrated that the rate of severe cases differed significantly between regions [10]. Therefore, in this study, we aimed to describe the clinical characteristics of patients with confirmed COVID-19 in different cities of China with different outbreak risk levels and to construct an early warning prediction nomogram incorporating clinical characteristics to identify patients at risk of poor prognosis. The nomogram uses admission to the intensive care unit (ICU) as the outcome rather than a broader definition of poor prognosis. It relies on factors that can be obtained quickly and does not include laboratory examination data, which may help clinicians provide appropriate supportive treatment in advance and reduce the probability of severe COVID-19.

Methods

Patients

This retrospective, multi-center study was approved by the Ethics Committee of West China Hospital. Considering the retrospective nature of the study, the requirement for written informed consent was waived by the Ethics Commission of the designated hospital for emerging infectious diseases. The study included data of consecutive patients hospitalized with laboratory-confirmed COVID-19, as reported to the National Health Commission, between January 2 and February 28, 2020. The data cutoff for the study was March 14, 2020. COVID-19 diagnosis was confirmed by high-throughput sequencing or real-time reverse-transcriptase polymerase chain reaction (RT-PCR) of nasal and pharyngeal swab specimens [11]. All the study patients were diagnosed as having COVID-19 in accordance with the WHO interim guidance [12]. Following published articles on nomograms [13, 14], the primary cohort patients were then randomly assigned in a 7:3 ratio to the training or validation set, using a simple random splitting method implemented with the “caret” package in R version 3.5.1.
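The random 7:3 assignment described above can be illustrated with a minimal sketch. It uses only the Python standard library rather than the R “caret” package used in the study; the function name, the seed, and the use of consecutive integers as patient identifiers are assumptions made for illustration. Because the study's exact splitting call and seed are not reported, this sketch is not expected to reproduce the cohort sizes reported in this paper.

```python
import random


def split_train_validation(patient_ids, train_fraction=0.7, seed=2020):
    """Randomly split patient IDs into training and validation sets.

    Hypothetical helper: a plain shuffle-and-cut split, not the caret routine.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Illustrative identifiers 1..1087 standing in for the 1087 patients
train_ids, validation_ids = split_train_validation(range(1, 1088))
print(len(train_ids), len(validation_ids))
```

A stratified split (sampling within outcome classes) would keep the proportion of ICU admissions similar in both sets; whether the study stratified is not stated.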

Demographic and risk variables

The following data were obtained from the electronic medical records of the patients: demographics, clinical signs on admission, clinical symptoms, clinical risk factors, and exposure to infection. Demographic data were age, sex, alcohol intake status, smoking status, obesity, and the time from symptom onset to admission. Symptom onset was defined as occurring before the first visit to the hospital. Exposure to infection was defined as exposure to Wuhan (including Wuhan residency, travel history to Wuhan, or contact with people from Wuhan) or other COVID-19–affected areas (residency, travel history, or contact with people from these areas) or exposure to patients with COVID-19. The definition of exposure risk changed as the relevant definitions in the COVID-19 guidelines of the National Health Commission of the People’s Republic of China were modified. If data were missing from the records or clarification was needed, data were obtained by direct communication with the attending physicians or other healthcare providers. A team of experienced clinicians reviewed, abstracted, and cross-checked the data. Each record was checked independently by 2 clinicians. The clinical and demographic features of the study cohort are summarized in Table 1.

Table 1 Baseline characteristics of patients infected with COVID-19

Definition of outcomes

The severity of COVID-19 during hospitalization was determined according to the American Thoracic Society guidelines for community-acquired pneumonia [15]. The primary outcome was defined as admission to the intensive care unit (ICU), consistent with other studies of severe infectious diseases [15, 16].

Feature selection

The training cohort, which was also used for variable selection and risk model development, comprised 763 patients hospitalized with COVID-19. As described in Table 1, 37 variables were included in the selection process. The least absolute shrinkage and selection operator (LASSO) method, which is suitable for analyzing high-dimensional data, was used to select the most significant predictive features [17, 18]. Features with non-zero coefficients in the LASSO regression model were entered into a forward stepwise logistic regression model [19]. Results for each feature were reported as odds ratios (ORs) with 95% confidence intervals and two-tailed p values. Variables with a p value < 0.1 in the univariate analysis and potential significance in the multivariate analysis were included in the logistic regression analysis. The forward selection procedure was used to develop a parsimonious model to predict ICU admission for COVID-19 in our cohort.
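For reference, the LASSO step can be written in the standard form of L1-penalized logistic regression. The notation below is an assumption introduced for illustration and is not taken from the study: y_i = 1 denotes ICU admission for patient i, x_i is the vector of the p = 37 candidate variables, n = 763 is the size of the training cohort, and lambda is the penalty chosen by cross-validation.

```latex
\[
(\hat{\beta}_0, \hat{\beta})
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      -\frac{1}{n} \sum_{i=1}^{n}
        \left[ y_i \left( \beta_0 + x_i^{\top} \beta \right)
               - \log\!\left( 1 + e^{\beta_0 + x_i^{\top} \beta} \right) \right]
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
\]
```

As lambda increases, more coefficients are shrunk exactly to zero; the features whose coefficients remain non-zero at the selected lambda are the ones carried forward.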

Development and validation of an individualized prediction model

A nomogram is a statistical tool useful for risk assessment. A predictive nomogram was developed using the independent factors identified by LASSO and logistic regression to generate a combined indicator for estimating ICU admission for COVID-19. The nomogram can be used as a quantitative tool for physicians to assess the individual probability of ICU admission. Furthermore, the nomogram was evaluated in the validation cohort, and the total score for each patient was calculated. The nomogram was constructed using the total score as a factor.
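How a logistic-regression nomogram turns predictor values into points and a probability can be sketched as follows. All coefficients, value ranges, and the intercept below are hypothetical placeholders chosen for illustration; they are not the fitted values of this study, and only three predictors are shown. The sketch assumes positive coefficients and the common convention that the predictor with the largest possible effect spans 0 to 100 points.

```python
import math

# HYPOTHETICAL values for illustration only; NOT the fitted model of this study.
INTERCEPT = -6.0
PREDICTORS = {
    # name: (log-odds coefficient, minimum value, maximum value)
    "age_years": (0.03, 20, 90),
    "respiratory_rate": (0.10, 12, 40),
    "fever": (0.60, 0, 1),
}


def points_per_logit(predictors=PREDICTORS):
    """Points per unit of log-odds, so the strongest predictor spans 0-100 points."""
    max_effect = max(b * (hi - lo) for b, lo, hi in predictors.values())
    return 100.0 / max_effect


def nomogram_points(patient, predictors=PREDICTORS):
    """Points contributed by each predictor for one patient."""
    scale = points_per_logit(predictors)
    return {name: b * (patient[name] - lo) * scale
            for name, (b, lo, hi) in predictors.items()}


def probability_from_points(total_points, predictors=PREDICTORS):
    """Map a total score back to a predicted probability."""
    baseline = INTERCEPT + sum(b * lo for b, lo, hi in predictors.values())
    linear_predictor = baseline + total_points / points_per_logit(predictors)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def probability_from_model(patient, predictors=PREDICTORS):
    """Direct logistic-regression prediction, for comparison with the score."""
    linear_predictor = INTERCEPT + sum(
        b * patient[name] for name, (b, lo, hi) in predictors.items())
    return 1.0 / (1.0 + math.exp(-linear_predictor))


example_patient = {"age_years": 70, "respiratory_rate": 26, "fever": 1}
total = sum(nomogram_points(example_patient).values())
print(round(total, 1), round(probability_from_points(total), 3))
```

The total score is a rescaled linear predictor, so reading the probability from the score gives the same result as evaluating the logistic model directly.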

Apparent performance of the nomogram in training and validation cohorts

Discrimination and calibration were assessed to test and validate the prognostic accuracy of the nomogram model [20]. Discrimination was quantified using Harrell’s concordance index (C-index), in which a value close to 1 indicates strong predictive ability of the model. The nomogram was further validated by bootstrapping (1000 bootstrap replicates) to calculate the corrected C-index. Calibration plots were developed to assess the predictive accuracy and agreement between the predicted and observed disease severity. Decision curve analyses (DCAs) were performed to assess the clinical usefulness of the nomogram. The net benefit was calculated by subtracting the proportion of patients with false-positive results from that of patients with true-positive results, weighting the false-positive proportion by the relative harm of an unnecessary intervention compared with the benefit of a needed one. The precision of the predictions was evaluated using the area under the receiver-operating-characteristic curve (AUC). Two-sided p values < 0.05 indicated a statistically significant difference.
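The two performance measures described above can be made concrete with a minimal sketch. It assumes the usual definitions: for a binary outcome the C-index is the share of (event, non-event) pairs in which the event received the higher predicted risk, and the net benefit at threshold probability p_t is TP/n - (FP/n) * p_t / (1 - p_t). The function names and the toy data are illustrative assumptions, not the study's code or results.

```python
def c_index(predicted_risks, outcomes):
    """Concordance for a binary outcome; tied predictions count as 0.5."""
    events = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                concordant += 1.0
            elif e == ne:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def net_benefit(predicted_risks, outcomes, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    n = len(outcomes)
    true_pos = sum(1 for p, y in zip(predicted_risks, outcomes)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted_risks, outcomes)
                    if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


# Toy data (hypothetical): predicted ICU risk and observed ICU admission (1 = yes)
risks = [0.9, 0.8, 0.3, 0.2, 0.1]
admitted = [1, 0, 1, 0, 0]
print(c_index(risks, admitted))
print(net_benefit(risks, admitted, threshold=0.25))
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the "treat all" and "treat none" strategies.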

Statistical analysis

Continuous variables were expressed as median and interquartile range. Categorical variables were expressed as absolute values and percentages. Continuous variables were compared using independent-group t-tests for normally distributed data and the Mann–Whitney test for non-normally distributed data. The chi-square or Fisher exact test was used to compare the proportions between the training and validation cohorts. Statistical analyses were performed using the R software version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria) and SPSS version 25.0 (IBM Corporation, Armonk, NY).

Results

Clinical characteristics

Data of 1087 patients with COVID-19 who had been hospitalized in 47 regions of Sichuan and 2 regions of Wuhan, from January 2 to February 28, 2020, were obtained. Among these patients, 763 were assigned to the training cohort and 324 to the validation cohort. The demographic and clinical characteristics of the study cohort are presented in Table 1. A total of 97 patients were eventually admitted to the ICU (8.9%). The median age was 51 years (interquartile range, 37–65 years) in the training cohort and 50 years (interquartile range, 38–64 years) in the validation cohort. More than half of the patients were female subjects (training cohort, 51.4%; validation cohort, 52.5%). The most common symptoms were fever (training cohort, 63.0%; validation cohort, 66.0%), cough (training cohort, 59.8%; validation cohort, 65.1%), and fatigue (training cohort, 37.1%; validation cohort, 38.3%). The top 3 comorbidities were hypertension (training cohort, 24.8%; validation cohort, 23.1%), diabetes (training cohort, 12.7%; validation cohort, 10.2%), and cardiovascular system disease (training cohort, 7.5%; validation cohort, 6.2%).

Among the patients admitted to the ICU, most had a history of alcohol intake (training cohort, 79.4%; validation cohort, 86.2%) and smoking (training cohort, 72.1%; validation cohort, 89.7%), and all were non-obese (training cohort, 100%; validation cohort, 100%). Patients admitted to the ICU were older than those not admitted by a median of 6 years in both the training and validation cohorts. Most patients admitted to the ICU had systolic blood pressure > 100 mmHg, heart rate < 100 beats per minute, and temperature on admission between 36.2 °C and 38.0 °C. Nearly 90% of the patients admitted to the ICU had been exposed to Wuhan or other COVID-19–affected areas in the past 14 days.

Selection of independent predictive factors

On the basis of demographics, clinical signs on admission, clinical symptoms, clinical risk factors, and exposure to infection, 19 potential predictors with non-zero coefficients were selected in the LASSO logistic regression model (Fig. 1). The inclusion of these 19 variables in a logistic regression model resulted in 6 variables that were independently statistically significant predictors of admission to ICU.

Fig. 1

Selection of demographic and clinical features using the least absolute shrinkage and selection operator (LASSO) logistic regression model. a. Optimal parameter (lambda) selection in the LASSO model used fivefold cross-validation via minimum criteria. The partial likelihood deviance (binomial deviance) curve was plotted versus log (lambda). Dotted vertical lines were drawn at the optimal values by using the minimum criteria and the 1 standard error of the minimum criteria (the 1-SE criteria). b. Selection of optimal parameters (lambda) from the LASSO model using five-fold cross-validation and minimum criteria

The selected predictors were age, respiratory rate, systolic blood pressure, smoking status, fever, and CKD. The results of the logistic regression analysis are presented in Table 2.

Table 2 Logistic analysis of each factor’s ability in predicting the risk of ICU admission with COVID-19

Building and validating a prediction nomogram model

The nomogram used for predicting admission of patients with COVID-19 to ICU was formulated using the significant independent factors (age, respiratory rate, systolic blood pressure, smoking status, fever, and CKD). The nomogram revealed that the best predictors were CKD, age, and respiratory rate. Each variable was assigned a score according to the demographic and clinical features of an individual patient (Table 3), and the total score was computed by summing individual scores. The ICU admission probabilities were also obtained from the nomogram (Fig. 2).

Table 3 Score assignment for each variable included in the nomogram
Fig. 2

Development of a nomogram for predicting the risk of ICU admission in COVID-19 patients. The nomogram included age, respiratory rate, systolic blood pressure, smoking status, fever and chronic kidney disease. The nomogram summed the scores for each scale and variable. The total score on each scale indicated the risk of ICU admission

The C-index of the nomogram was 0.829 (95% confidence interval [CI], 0.779–0.879) in the training cohort and 0.776 (95% CI, 0.684–0.868) in the validation cohort, indicating good discriminative ability of the model. The calibration plots of the nomogram showed good agreement between the predicted and observed disease severity in both the training and validation cohorts (Fig. 3). In addition, DCA revealed that the predictive model had significant net benefits for most threshold probabilities at different time points in the training and validation cohorts, demonstrating the potential clinical benefit of the predictive model (Fig. 4). The AUC of the nomogram was 0.829 in the training cohort and 0.776 in the validation cohort, matching the C-index values and confirming good discrimination for ICU admission (Fig. 5).

Fig. 3

Calibration curves of the nomogram for predicting the risk of ICU admission in training (a) and validation cohort (b). Data on predicted and actual disease severity were plotted on the x- and y-axis, respectively. The diagonal dotted line indicates the ideal nomogram, in which actual and predicted probabilities are identical. The solid line indicates the actual nomogram, and a better fit to the dotted line indicates a better calibration

Fig. 4

Decision curves of the nomogram predicting the risk of ICU admission in training (a) and validation cohort (b). The x-axis represents threshold probabilities and the y-axis measures the net benefit calculated by adding true positives and subtracting false positives

Fig. 5

Receiver-operating characteristic curve of the nomogram for predicting the risk of ICU admission in training (a) and validation cohort (b)

Discussion

Our study enrolled 1087 patients with COVID-19 who were registered at health centers in Sichuan and Hubei provinces, where the outbreak risk levels were different. In this initial study, based on patient demographic and clinical characteristics obtained on first admission, we established and validated a nomogram for predicting the risk of admission to the ICU through LASSO and logistic regression analyses. The independently statistically significant factors included in the prediction model were age, respiratory rate, systolic blood pressure, smoking status, fever, and CKD. Validation of the model using different statistical methods demonstrated good performance. As those factors can be obtained easily on admission, the nomogram is a convenient and valuable clinical warning tool to predict ICU admission of a patient with COVID-19, especially in the emergency department and even in a community health center.

Most patients with COVID-19 have mild disease with a good prognosis, but some patients may develop severe respiratory distress syndrome and have a poor prognosis [21]. To mitigate the burden on the healthcare system and provide the best care for patients, it is necessary to effectively predict the prognosis of the disease [22]. A predictive model that combines multiple variables or features to estimate an infected person’s risk of poor outcomes can assist healthcare staff in classifying the patient’s disease severity when allocating limited medical resources [23]. Previous studies have reported prediction models for diagnosis and prognosis of COVID-19 and for detecting the risk of being admitted to a hospital for COVID-19. Chen et al. constructed a diagnosis prediction model with 10 clinical factors based on 136 participants [24]. Wang et al. enrolled 296 in-hospital patients with COVID-19 and developed a clinical model to predict the mortality of such patients [22]. Dong et al. developed a scoring model to predict the risk of progression of COVID-19 pneumonia on the basis of 209 patients [25]. However, those proposed models are poorly reported and have a high risk of bias, raising concern that their predictions may be unreliable when applied in daily practice. In a recent study, a risk score was reported to estimate the risk of critical illness of patients with COVID-19 based on 10 variables [26]. Although the study had a modest sample size and satisfactory performance, the scoring system was complicated by laboratory examination data that cannot be obtained before admission or quickly after admission. It is, therefore, necessary to develop and validate a convenient prediction model that healthcare or emergency staff can use quickly and easily.
In our study, we constructed a warning model for predicting the risk of ICU admission on the basis of multi-center data from different cities with different outbreak severities in Wuhan and Sichuan Province. In our model, the independently statistically significant factors were age, respiratory rate, systolic blood pressure, smoking status, fever, and comorbidity with CKD, all of which can be obtained quickly, easily, and reliably. This prediction model could be used in prehospital care or the emergency department, allowing medical staff to intervene at an early stage and determine the treatment location and the type of intervention. Statistically, our model demonstrated good discriminative ability and potential clinical benefit.

The model identified that comorbidities play a key role in the prognosis of patients with COVID-19. Cardiovascular system disease, especially hypertension, has been reported to be one of the most important independent risk factors [27]. In this study, we observed that patients with CKD were more likely to be admitted to the ICU and that kidney disease was an independent risk factor for ICU admission of patients with COVID-19. This finding suggests that patients with a comorbidity of kidney disease on admission may have a high risk of deterioration [28, 29]. Previous studies revealed that kidney injury was associated with an increased risk of death in patients with influenza A virus subtype H1N1 and severe acute respiratory syndrome (SARS). Multiple organ involvement, including the liver, gastrointestinal tract, and kidneys, was reported during SARS in 2003 and very recently in patients with COVID-19 [30,31,32,33]. We hypothesized that such patients could have a proinflammatory state with functional defects in innate and adaptive immune-cell populations; they are known to have a higher risk for upper respiratory tract infection and pneumonia. The 2019-nCoV itself may also cause kidney injury through multiple mechanisms: the 2019-nCoV may use angiotensin-converting enzyme 2 (ACE2) as a cell entry receptor and exert direct cytopathic effects on the kidney tissue. It has been reported that ACE2 expression in the kidneys was nearly 100-fold higher than in the lungs [33]. Viral antigens or virus-induced specific immune effector mechanisms (specific T-cell lymphocytes or antibodies) and deposits of immune complexes may damage the kidneys [34]. Early detection and treatment of renal abnormalities, including assessing the volume status and renal transplantation pressure, avoidance of nephrotoxic drugs, and adequate hemodynamic support, may help improve the vital prognosis of patients with COVID-19.

In most published prognostic prediction models, older age, comorbidities, and increases in lactate dehydrogenase, lymphocyte, and C-reactive protein levels were the risk factors for poor prognosis [25]. For other indicators, such as heart rate; respiratory rate; oxygen saturation; levels of procalcitonin, direct bilirubin, albumin, and D-dimer; activated partial thromboplastin time; glomerular filtration rate; and chest radiography abnormality, the conclusions remain controversial [35, 36]. Our study also demonstrated that patients with COVID-19 infection who were older (especially > 65 years) had a worse prognosis than younger patients. In our study, fever (training cohort, 63.0%; validation cohort, 66.0%), cough (training cohort, 59.8%; validation cohort, 65.1%), and fatigue (training cohort, 37.1%; validation cohort, 38.3%) were the most common symptoms. However, among all the symptoms, only fever was an independent risk factor for prognosis, which differs from other studies. The inconsistencies among these models could be attributed to the risk of bias caused by the sample size and geographical differences of each model.

Our study has some limitations. First, the design was retrospective. Second, although the study was multi-center, the results cannot be generalized to other populations because the data are confined to just 2 places: Sichuan and Wuhan. Third, the sample size was limited; future studies with larger sample sizes are warranted to validate our results. Fourth, some cases had incomplete data on symptoms, laboratory tests, and imaging examinations, given the variation in the structure of electronic databases across participating hospitals and an urgent data extraction schedule. Fifth, severe patients were older than non-severe patients, and this difference in age may be a confounding factor. Sixth, we did not collect treatment-related data, which may be critical to the patient’s outcome. However, all patients received treatment in accordance with the guidelines issued by the National Health Commission of China.

Conclusions

We established an early prediction model incorporating clinical characteristics that could be quickly obtained on hospital admission, even in community health centers. This model can be conveniently used to predict the individual risk for ICU admission of patients with COVID-19 and optimize the use of limited resources.