A new Coronavirus, subsequently named severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2) widely spread on Wuhan City, Hubei province, since December 2019, causing a diffusive type of interstitial pneumonia. In the following months, it diffused all over the world, becoming a global issue and forcing the World Health Organization to declare Coronavirus Disease 2019 (COVID-19) as a pandemic in March 2020.

An early assessment of illness severity for stratification of patients with SARS-CoV-2 infection is a relevant clinical need. Recently, Liang et al. [1] developed an interesting, specific score risk for COVID-19 that may help predict critical illness (COVID-GRAM). Briefly, authors, analysing a wide Chinese nationwide cohort, through a least absolute shrinkage and selection operator (LASSO) and logistic regression, constructed a predictive risk score (COVID-GRAM).

During this emergency period, in our clinical practice, we evaluated the predictive value of national early warning score 2 (NEWS2) to identify high-risk patients for intensive care unit (ICU) admission [2]. The NEWS2 was primarily validated for death and intensive care need in septic patients. It is a modified version of NEWS, which is commonly used in British hospitals for identification of patients with a high risk of deterioration [3].

This study aimed to compare the diagnostic accuracy of COVID-GRAM score proposed by Liang et al. and NEWS2 for predicting critical COVID-19 illness in a cohort of patients hospitalised for SARS-CoV-2 in Umbria region.


This study is a retrospective, multi-centre accuracy study. Consecutive adult COVID-19 patients admitted to the Clinic of Infectious Diseases of Perugia Hospital, Umbria region, Italy, and Clinic of Infectious Diseases of Terni Hospital, Umbria region, Italy, from March 01, 2020 to June 15, 2020, were recruited. All subjects provided informed oral consent to clinical data collection. Data were collected using the same database in both centres to guarantee the homogeneity of the cohort.

All adult patients (age ≥ 18 years old) with an established diagnosis of SARS-CoV-2 infection were included. [4] The infection was established for patients with at least a sample (nasopharyngeal swab, bronchoalveolar lavage, and sputum) positive for the SARS-CoV-2 (molecular diagnosis with real-time polymerase chain reaction, PCR). The study was approved by the Ethics Committee of the Umbria Region (protocol number 18344/20/OV).

Data on demographics, comorbidities and clinical presentation were obtained for each patient. COVID-GRAM and NEWS2 at hospital admission were calculated using medical records.

Exclusion criteria were missing information that did not permit to calculate NEWS2 or COVID-GRAM. The primary outcome was critical COVID-19 illness as defined by Liang et al. [1]: admission to the ICU, invasive ventilation, or death.

Data were summarised as mean with the respective standard deviation (SD) or median with the respective interquartile range (IQR).

Differences between groups were determined using the Mann–Whitney test, the Student’s t test, or the Χ2 as appropriate.

NEWS2 is based on aggregate scoring of the following parameters: respiratory rate, hypercapnic respiratory failure, supplemental oxygen, body temperature, systolic blood pressure, pulse rate and level of consciousness. The combination of these values provides a score between 0 and 20 [3].

COVID-GRAM is a composite score calculated using the following formula: COVID-GRAM risk score = (X-ray abnormality × 27.1464) + (age × 0.6139) + (hemoptysis × 33.6210) + (dyspnoea × 14.0569) + (unconsciousness × 34.4617) + (number of comorbidities × 10.3826) + (cancer history × 31.2211) + (neutrophil/lymphocyte ratio, N/L × 1.25) + (lactate dehydrogenase, LDH × 0.0534) + (direct bilirubin × 3.0605). Age, number of comorbidities, N/L, LDH and direct bilirubin are continuous variables, and the others are categorial variables (expressed with 1 if positive and 0 if negative) [1].

Performances of NEWS2 and COVID-GRAM were evaluated by receiver-operating characteristic (ROC) curve analysis, describing areas under curves (AUROC) with their 95% confidence interval (CI) and comparing them to the null hypothesis (area = 0.5) [5,6,7]. The probability value (p) of the ROC curve was assessed using the Mann–Whitney U test. The Hanley–McNeil test was performed to establish if differences between ROC curves area were significant: areas under the ROC curve with critical Z ratio ≥ 1.96 were considered different [7].

For each score, the optimal threshold was chosen basing on the Youden index [8]. In addition, the performance of the score at the optimal threshold was assessed by the corresponding sensitivity and specificity, positive and negative likelihood ratio, positive and negative predictive value, and accuracy. Each value is reported with 95% confidence intervals (CI).

In clinical practice, as defined in previous studies, two thresholds (5–7 and 56.6–138.4 for NEWS2 and COVID-GRAM, respectively) could better identify patients with low, medium and high risk of progression to critical COVID-19 [1, 2]. Indeed, the same values described above were calculated for the optimal clinical thresholds defined in the literature.

Statistical analyses were performed using Prism Graphpad 8 software. A p value < 0.05 was considered significant.


During the study period (as shown in Fig. 1), 152 patients were hospitalised for SARS-CoV-2 in Clinics of Infectious Diseases of Perugia and Terni. All the included patients were diagnosed with COVID-19 for the first time. Thirty-one were excluded due to insufficient data to calculate one or both scores at hospital admission. Therefore, 121 patients were included in the analysis. Among these, 50 (41.3%) patients presented critical illness. Analysing the subgroup of excluded patients, we found that 11/31 (35.5%) developed critical COVID-19, and among these, 6 patients died. Indeed, including these patients, the prevalence of severe COVID-19 would remain high (40.1%).

Fig. 1
figure 1

Patient selection flow chart

The two scores were calculated using data obtained at hospital admission. This timing corresponds to a median of 8 days (interquartile range 5–11 days) after symptoms onset.

Baseline characteristics of patients are summarised in Table 1. The mean age was 64.5, standard deviation (SD) ± 13.4 years; critical patients had a mean age slightly higher than non-critical patients (67.4, SD 10.7 years versus 62.5, SD 14.8 years, p = 0.04). Percentage of male is higher in critical patients compared to non-critical ones (78% and 56%, respectively, p = 0.01).

Table 1 Demographics, comorbidities and clinical presentation

Among the total cohort, 54.5% referred dyspnoea, and it was more observed in the critical group (72.0% versus 42.3% of non-critical patients, p = 0.001).

We observed an increased neutrophil–lymphocyte ratio in total population (median 4.4, IQR 2.7–8.2) and it is significantly higher in critical patients (median 7.6, IQR 4.1 versus median 3.8, IQR 2.1–5.3, p < 0.0001).

Lactate dehydrogenase was increased in the total population (median 265 U/L, IQR 190–373 U/L) and higher in patients with severe COVID-19 compared to non-severe disease (median 373.0 U/L, IQR 271.0–516.5 U/L versus 221.0 U/L, IQR 166.0–283.0 U/L, p < 0.0001).

NEWS and COVID-GRAM data had not a Gaussian distribution. For both the tools, Mann–Whitney U test confirmed the differences of scores between patients with severe and not severe COVID-19 (p < 0.0001).

The ROC curves of NEWS2 and the new COVID-GRAM for predicting critical illness are depicted in Fig. 2. The NEWS2 AUROC curve measured 0.87 (standard error, SE 0.03; 95% CI 0.80–0.93; p < 0.0001). The new specific COVID-GRAM AUROC curve measured 0.77 (SE 0.04; 95% CI 0.68–0.85; p < 0.0001). Hanley–McNeil test showed that NEWS2, better predicted severe COVID-19 (Z = 2.03).

Fig. 2
figure 2

Receiver-operating characteristic (ROC) curves for prediction of critical COVID-19 illness: ROC curve for critical COVID-19 illness using NEWS2 (■ dashed line) and COVID-GRAM (• continuous line) of COVID-19 patients at hospital admission. NEWS2 showed an area under the ROC curve (AUROC) curve of 0.87 (standard error, SE 0.03; 95% CI 0.80–0.93; p < 0.0001). The COVID-GRAM AUROC curve measured 0.77 (SE 0.04; 95% CI 0.68–0.85; p < 0.0001)

According to the Youden index (J), the optimal thresholds were 4 and 123.2 for NEWS2 and COVID-19 score, respectively. As shown in Table 2, at the selected thresholds, sensitivity was 86 and 88 for NEWS2 and Liang’s COVID-GRAM, respectively. Specificity was higher for NEWS2 (70.4%) compared to COVID-GRAM (62%). NEWS2 at 4 had a positive likelihood ratio of 2.9 while for COVID-GRAM of 2.3. Similarly, positive predictive value was higher for NEWS2 (67.2%) respect to COVID-GRAM (62.9%). The negative likelihood ratios were 0.2 for both the scores. Similar negative predictive value was found for the scores (87.7% and 88.2% for NEWS2 and COVID-GRAM, respectively). Accuracy was 76.9 and 72.7 for NEWS2 and Liang’s score, respectively.

Table 2 Prognostic accuracy of NEWS2 and Liang’s COVID-19 score for severe COVID-19 using the optimal threshold values individuated by Youden index (N = 121)

Data of sensitivity, specificity, positive and negative likelihood ratio, positive and negative predictive values and accuracy of optimal clinical thresholds are shown in Supplementary Table 1.


Liang et al. generated a specific COVID-19 score based on characteristics of patients of a nationwide Chinese cohort. Before this study, other authors underlined the importance of an early severity of illness assessment. For example, Bradley et al. [9] in a cohort of 830 patients, compared rapid scores as CURB-65, NEWS2 and qSOFA and observed that NEWS2 ≥ 5 had a negative predictive value of 98.0% for early mortality. However, they underlined the necessity of a COVID-19 score that focus on respiratory failure rather than circulatory collapse.

In a previous study, we demonstrated that, in COVID-19 patients, NEWS2 is useful to quickly identify patients at risk of rapid deterioration and to predict ICU admission. Furthermore, according to this study, the use of the score could be implemented basing on two different thresholds: 5 and 7 [2]. Furthermore, another study performed analysing an UK cohort of 296 patients supported the use of NEWS2 to identify deterioration of hospitalised COVID-19 patients [10].

In this study, through the Youden Index, we individuated for simplicity a threshold with a good balance of sensitivity and specificity. It was calculated to compare NEWS2 and COVID-GRAM, but it does not represent the best clinical threshold. In clinical practice, two thresholds (5–7 and 56.6–138.4 for NEWS2 and COVID-GRAM, respectively) could better fit the necessity of rapid individuation of patients at risk of deterioration and exclusion of clinically stable patients. Indeed, for both scores, the lower threshold could better individuate patients with low risk of critical progression with a sensitivity of 100% and 86.0% for COVID-GRAM and NEWS2, respectively. At the same time, the higher threshold better predicted the risk of critical course (specificity of 78.9% and 98.6% for COVID-GRAM and NEWS2, respectively) (Supplementary Table 1).

However, 79 patients had a COVID-GRAM score between the two thresholds, with a medium risk of progression; among these, 31.6% had a severe COVID-19. Concerning NEWS2, only 25 patients had a score between the two thresholds and, among these, 52.0% developed severe COVID-19 (data not shown).

We observed that NEWS2 applied for identification of patients with a high risk of deterioration in different clinical setting [5] could identify patients at risk of severe COVID-19 better than the COVID specific Liang’s Score. Of note, although COVID-19 is a disease involving mainly the lungs, among the ten variables required by COVID-GRAM, respiratory rate, oxygen saturation, and supplemental oxygen are not considered differently from NEWS2.

Multiple factors need to be discussed. First, the two cohorts of patients are different: the mean age of our patients is higher than in Liang ones (49 versus 65 years). Interestingly, in our cohort, age is similar in critical and non-critical patients. In Liang’s cohort, the mean age was higher in critical than in non-critical patients. Moreover, 51.2% of our patients have at least one coexisting condition, as opposed to 25.1% in Liang’s cohort; a higher percentage of our patients, comparing to the Chinese cohort, developed a severe disease (41.3% versus 8.2%) and had abnormal chest radiological findings (86.0% versus 71.0%).

Unfortunately, with published data, we are not able to evaluate NEWS2 in Liang’s cohort. The main strengths of this study are the homogeneity of the cohort and the setting as a double-centre study. The NEWS2 is a more practical tool that have been largely validated in other clinical setting for identification of patients with a high risk of deterioration and shows a good applicability to identify the critical COVID-19. The main limitations of our study are the retrospective study design and the small number of patients.

In conclusion, both scores provide useful information to manage patients with COVID-19, but NEWS2 seemed to be simpler and to better predict than COVID-GRAM the progression to critical COVID-19.