Introduction

In December 2019, SARS-CoV2, a member of the coronavirus family and the virus responsible for the global COVID-19 pandemic, was isolated. SARS-CoV2 is a highly contagious respiratory virus that causes disease across a broad spectrum of severity, from asymptomatic infection to acute respiratory distress syndrome (ARDS) and multiorgan failure (MOF) [1].

The disease caused by SARS-CoV2 can evolve rapidly, within hours or days, even in paucisymptomatic patients. It is therefore essential to identify, from the moment of arrival at the Emergency Department (ED), those patients at risk of developing a severe form of the disease and a pattern of ARDS [2].

Viral and bacterial pneumonia are a common cause of ED presentation and carry an adverse outcome, especially in older patients and in those with previous medical conditions [3]. In past years, several scores have been developed for the prognostic stratification of patients with community-acquired pneumonia, and they have demonstrated good discriminative ability. These tools, combined with the clinical evaluation, allow the physician to predict the disease evolution and to make appropriate therapeutic and care choices [4].

Pneumonia caused by SARS-CoV2 has distinctive features in several respects, including presenting symptoms, arterial blood gas parameters and bio-humoral findings [5]. A systematic review published in November 2020 identified a long list of clinical parameters with an established prognostic value in patients with COVID-19 [6]. Thereafter, clinicians involved in the management of these patients developed several scores specifically targeted at this disease. Most of them have been validated in small populations with a specific ethnic characterization and do not appear to generalize effectively across different contexts [7,8,9]. The 4C score is the only specific score validated in a large European population [10].

We selected two established scores for patients with pneumonia, CURB-65 [11] and MuLBSTA [7], and compared them with those newly designed for patients with COVID-19, including the 4C mortality score [10], the CALL score [8] and the Quick score [9]. We added the Sequential Organ Failure Assessment (SOFA) score [12], as it is the most widespread tool for evaluating the degree of organ dysfunction in critically ill patients and has a good prognostic stratification ability.

Our aim was to compare the prognostic performance of generic and specifically designed scores in predicting in-hospital mortality and the need for invasive ventilation among patients hospitalized for COVID-19.

Methods

Patient selection

This retrospective study included all patients hospitalized for pneumonia caused by SARS-CoV2 from the Emergency Department of the Careggi University Hospital between March 2020 and February 2021. We considered the period from March 2020 to July 2020 as the first wave of the pandemic and the following period as the second wave. The Area Vasta Centro Local Ethical Committee approved this study (No. 17104). Inclusion criteria were: age of 18 years or above, a positive nasopharyngeal swab for SARS-CoV2 and the need for hospitalization. All these patients were admitted primarily for COVID-19 and all presented radiological findings diagnostic for pneumonia. Patients with a positive swab for SARS-CoV2 who were discharged from the ED were not included.

A database including all patients with a confirmed diagnosis of COVID-19 in the ED of this University Hospital was compiled. For each patient, basic demographic data, previous medical conditions, vital signs and laboratory parameters were collected from medical records using a standardized collection template. From the database, we extracted data regarding the ED stay and the in-hospital outcome of the included patients. The primary end-point was in-hospital mortality, while the need for endotracheal intubation (ETI) and mechanical ventilation (MV) was the secondary end-point. Patients not considered eligible for ETI and MV because of age and/or comorbidities were excluded from the analysis of the secondary end-point.

Score calculation

We calculated the following scores: 4C mortality score [10], CALL score [8], MuLBSTA score [7], Quick Severity Index [9], CURB-65 [11] and SOFA score [12]. The scores were calculated as shown in Table 1. The Quick score was calculated only for patients who required low-flow oxygen therapy in the ED (6 L/min or less). We followed the calculation method of the original scores, except for the 4C mortality score, for which we substituted creatinine for BUN: BUN was available in a minority of patients, while creatinine was measured in all subjects. The points assigned to different creatinine values are listed in Table 1. All scores were evaluated both as continuous values and after categorization into quartiles (Table 1). The SOFA score was calculated according to the original score and was divided into quartiles as follows: first quartile, 3 or lower; second quartile, 4; third quartile, 5; fourth quartile, 6 or higher.
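As an illustration of how such scores are derived from admission data, the following minimal sketch computes CURB-65 and assigns the SOFA quartile. The SOFA cut-offs are those stated above; the CURB-65 thresholds are the commonly published ones and are an assumption here, since Table 1 is not reproduced in this text. The function and parameter names are hypothetical.

```python
def curb65(confusion: bool, urea_mmol_l: float, resp_rate: int,
           sbp: int, dbp: int, age: int) -> int:
    """CURB-65 with commonly published thresholds (assumed, not taken
    from Table 1): one point for each criterion that is met."""
    criteria = [
        confusion,                 # new-onset confusion
        urea_mmol_l > 7.0,         # urea above 7 mmol/L
        resp_rate >= 30,           # respiratory rate of 30/min or more
        sbp < 90 or dbp <= 60,     # low systolic or diastolic pressure
        age >= 65,                 # age of 65 years or more
    ]
    return sum(criteria)


def sofa_quartile(sofa: int) -> int:
    """Quartile of the SOFA score using the cut-offs reported in the text:
    3 or lower, 4, 5, 6 or higher."""
    if sofa <= 3:
        return 1
    if sofa == 4:
        return 2
    if sofa == 5:
        return 3
    return 4
```

For example, a hypothetical 78-year-old patient without confusion, with urea 8.2 mmol/L, respiratory rate 32/min and blood pressure 85/50 mmHg meets four of the five CURB-65 criteria.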

Table 1 Prognostic scores

Statistical analysis

Categorical data were reported as proportions and counts. Continuous parameters were reported as mean ± standard deviation or median and interquartile range, based on their distribution. Continuous variables were compared between groups using Student's t test for independent groups; Fisher's exact test was used to compare counts in cross tables. The discrimination ability of each prognostic scoring system was evaluated by receiver-operating-characteristic (ROC) curve analysis. A p value < 0.05 was accepted as statistically significant. The analyses were computed using Statistical Package for Social Sciences 27.0.1.0 (SPSS Statistics, IBM Corporation, Chicago, Ill, USA).
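The area under the ROC curve (AUC) can be read as the probability that a randomly chosen patient who experienced the event has a higher score than a randomly chosen patient who did not. The sketch below computes the AUC directly from this definition; it is a plain illustration and not the SPSS procedure used for the analyses. The function name and the example data are hypothetical.

```python
def roc_auc(scores, outcomes):
    """AUC from pairwise comparisons: the share of (event, non-event)
    pairs in which the event patient has the higher score.
    Tied pairs count one half. Outcomes are coded 1 (event) or 0."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcome groups are required")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Made-up scores for eight patients, three of whom died (outcome 1).
example_scores = [2, 5, 7, 9, 4, 12, 3, 10]
example_outcomes = [0, 0, 1, 1, 0, 1, 0, 0]
example_auc = roc_auc(example_scores, example_outcomes)
```

An AUC of 0.5 corresponds to no discrimination and an AUC of 1.0 to perfect separation of the two groups.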

Results

We included 1208 patients; mean age was 69 ± 17 years and 57% were male. As reported in Table 2, the most common previous medical conditions were hypertension, diabetes mellitus, coronary artery disease and atrial fibrillation. The most common presenting symptom was dyspnea, followed by cough and weakness.

Table 2 Characteristics of the study population as a whole and based on in-hospital mortality

Most of the patients (n = 960, 80%) were initially transferred to ordinary wards, while 87 (7%) were admitted to the High-Dependency Unit (HDU) and 161 (13%) to the Intensive Care Unit (ICU).

One hundred and thirteen patients (9%) underwent ETI and MV, with a mean latency of 7 ± 8 days from ED admission; among them, the mortality rate was 50% (n = 57). The overall mortality rate was 21% (n = 248 patients), with a mean latency of 14 ± 14 days. One hundred and sixty-seven patients were not considered eligible for ETI and MV and were excluded from the analysis of the secondary end-point. We compared the prevalence of the two end-points between the first and the second wave: the mortality rate decreased from 25 to 18% (p = 0.006) and the need for MV from 14 to 7% (p < 0.001). Among survivors, mean length of hospital stay was 15 ± 14 days.
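The reported percentages follow directly from the counts given above, as the short check below shows. The variable names are hypothetical; the counts are those reported in the text.

```python
total_patients = 1208
ventilated = 113
deaths_among_ventilated = 57
deaths_overall = 248
not_eligible_for_mv = 167


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


ventilation_rate = pct(ventilated, total_patients)              # 9
mortality_ventilated = pct(deaths_among_ventilated, ventilated)  # 50
mortality_overall = pct(deaths_overall, total_patients)          # 21

# Patients remaining in the analysis of the secondary end-point.
eligible_for_secondary = total_patients - not_eligible_for_mv    # 1041
```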

Upon admission to the ED, non-survivors were older and were more likely to have previous medical conditions (Table 2). They showed lower peripheral oxygen saturation, PaO2/FiO2 ratio, and systolic and diastolic blood pressure, and higher heart rate (HR) and respiratory rate (RR). Among laboratory parameters, they had higher white blood cell (WBC) counts, neutrophil/lymphocyte ratio, C-reactive protein, procalcitonin, creatinine, IL-6 and glucose levels (Table 3).

Table 3 Vital signs, arterial blood gas and laboratory values in the whole population and based on in-hospital mortality

Table 4 reports the values of the scores in the whole study population and according to in-hospital mortality and need for MV. All the scores were significantly higher in non-survivors than in survivors and in those who needed mechanical ventilation. The prognostic stratification ability was further evaluated by ROC curve analysis. The 4C score showed the best predictive ability for in-hospital mortality, followed by the CALL score (AUC 0.837 and 0.743, respectively). SOFA, QUICK and MuLBSTA showed a fair discriminative ability (Fig. 1). We evaluated the discrimination ability of the scores separately in the first and the second wave and did not find significant differences (SOFA score: AUC 0.652 vs 0.691; MuLBSTA: AUC 0.664 vs 0.671; QUICK: AUC 0.663 vs 0.701; 4C score: AUC 0.816 vs 0.846; CALL score: AUC 0.738 vs 0.751, all p < 0.05).

Table 4 Score values in the whole population and based on the primary outcomes

Fig. 1 Prognostic stratification ability of the examined scores evaluated by ROC curve analysis

When we considered the secondary end-point, the discriminative ability of all the scores was fair (SOFA: AUC 0.60, 95% CI 0.54–0.66, p = 0.030; CURB-65: AUC 0.60, 95% CI 0.53–0.67, p = 0.005; QUICK: AUC 0.66, 95% CI 0.59–0.74, p = 0.040; CALL: AUC 0.68, 95% CI 0.62–0.74, p = 0.032; MuLBSTA: AUC 0.64, 95% CI 0.58–0.70, p = 0.031; 4C score: AUC 0.65, 95% CI 0.60–0.70, p = 0.024).
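To show how a confidence interval of this kind relates to the AUC and to the size of the two outcome groups, the sketch below uses the Hanley and McNeil (1982) approximation of the standard error. This is one textbook method and is not necessarily the one applied by the statistical software used here. The function name is hypothetical, and the group sizes in the usage example are an assumption: they suppose that all 113 ventilated patients belong to the 1041 patients eligible for the secondary end-point.

```python
import math


def hanley_mcneil_ci(auc: float, n_events: int, n_non_events: int,
                     z: float = 1.96):
    """Approximate confidence interval for an AUC using the
    Hanley-McNeil standard error (z = 1.96 gives a 95% interval)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_events - 1) * (q1 - auc ** 2)
        + (n_non_events - 1) * (q2 - auc ** 2)
    ) / (n_events * n_non_events)
    se = math.sqrt(variance)
    return auc - z * se, auc + z * se


# CALL score for the secondary end-point, with assumed group sizes
# (113 ventilated, 1041 - 113 = 928 not ventilated).
call_lower, call_upper = hanley_mcneil_ci(0.68, n_events=113, n_non_events=928)
```

The interval narrows as either group grows, which is why the relatively small number of ventilated patients dominates the width of the intervals for the secondary end-point.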

Finally, we evaluated the mortality rate in the different quartiles of all the scores (Fig. 2). The prevalence of in-hospital mortality significantly increased from the first to the fourth quartile of the 4C mortality score, being virtually absent in the first quartile and disproportionately high in the fourth. The same behavior was observed for the CALL score and the CURB-65 score. With the SOFA score, only patients in the fourth quartile showed significantly higher mortality than those in the lower quartiles. For homogeneity, we adopted the subdivision into quartiles for all scores. However, in the original validation paper for the 4C score, the authors proposed a division into four risk classes: low risk (0–3 points), intermediate risk (4–8 points), high risk (9–14 points) and very high risk (≥ 15 points). We tested this classification in our study population and confirmed the results obtained with the division into quartiles (mortality in the four subgroups, respectively, 1, 6, 27 and 69%, all p < 0.001, except for low- versus intermediate-risk subgroups, p = 0.05).
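The four risk classes of the 4C score described above can be expressed as a simple lookup. In this minimal sketch the cut-offs and the observed mortality per class are those reported in the text, while the function and variable names are hypothetical.

```python
def four_c_risk_class(score: int) -> str:
    """Risk class of the 4C mortality score using the cut-offs
    reported in the text: 0-3, 4-8, 9-14, 15 or higher."""
    if score < 0:
        raise ValueError("the 4C mortality score cannot be negative")
    if score <= 3:
        return "low"
    if score <= 8:
        return "intermediate"
    if score <= 14:
        return "high"
    return "very high"


# In-hospital mortality (%) observed in each class in the present study.
observed_mortality_pct = {
    "low": 1,
    "intermediate": 6,
    "high": 27,
    "very high": 69,
}

# Example: a patient with a score of 10 falls in the high-risk class.
example_class = four_c_risk_class(10)
example_mortality = observed_mortality_pct[example_class]
```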

Fig. 2 Mortality rate in the quartiles of the examined scores

Discussion

In a large population of patients admitted to the hospital with SARS-CoV2 infection, the prognostic scores specifically designed for COVID-19 patients demonstrated a good prognostic stratification ability for in-hospital mortality. Among the scores usually applied to patients with bacterial or viral pneumonia, only the CURB-65 score showed a good prognostic performance. The SOFA score did not allow a useful prognostic stratification in our study population. These data were confirmed for both the first and the second wave of the pandemic. An additional strength of the CURB-65, CALL and 4C scores was that the mortality rate increased significantly with increasing quartiles. The novelty of the present study lies in the evaluation of all the considered scores using parameters collected at the moment of ED admission, allowing an early prognostic stratification. The increased mortality in higher quartiles can give the clinician a definite reference for prognostic stratification. This is of utmost relevance because, during a pandemic and in the presence of a shortage of resources, the early identification of patients at risk of an adverse prognosis allows an appropriate utilization of hospital facilities [13]. As relevant changes in admission policies and treatments occurred between the first and the second wave of the pandemic, we could not take for granted a similar prognostic accuracy over the whole study period. However, our analysis did not show significant differences and, to the best of our knowledge, a similar comparison has not been reported by previous papers on this topic.

We tested the SOFA score as it is one of the most widespread prognostic scores in the medical community and plays a pivotal role in the diagnosis and prognostic stratification of septic patients [14]. However, in the present population, as in other populations of patients with COVID-19 pneumonia, its prognostic value was fair [9]. A possible explanation is that these patients do not have significant extrapulmonary organ damage at the beginning of the disease, when they usually come to the ED. Multiorgan failure is a late complication in those who develop severe respiratory failure [15,16,17]. Therefore, the SOFA score evaluated upon ED admission cannot capture this possible evolution.

Among scores specifically designed for patients with COVID-19, we selected the CALL score for its ease of use and for the inclusion of parameters routinely evaluated in the ED. It was validated in a small Chinese population [8] and, when we applied it to our study group, its performance was good in predicting mortality but less satisfactory in predicting respiratory deterioration. To the best of our knowledge, this study represents the first attempt at external validation of the score.

On the other hand, the prognostic stratification ability of the 4C score, validated in a large European population, has already been confirmed among patients evaluated in the Emergency Department [18, 19] and among those admitted to the Intensive Care Unit [20] or to the general ward [21], in different geographic areas. In our study group, which is one of the largest among the previously mentioned papers, the 4C score outperformed all the other tools. The need to substitute creatinine for BUN, as BUN was not available in the vast majority of our patients, did not reduce its prognostic stratification ability. Both the 4C and CALL scores include different laboratory parameters, which were identified by previous studies as prognosticators in patients with a severe form of COVID-19. In the same way, an increased creatinine value was associated with a more severe form of the disease. This could be the reason why the CURB-65 score, which includes demographics and parameters of renal function, demonstrated a good prognostic performance. Conversely, the MuLBSTA score did not allow a useful prognostic stratification, as some of the included parameters are not discriminative in patients with COVID-19 pneumonia. In fact, bacterial coinfection is a late event, while multilobar infiltration is a ubiquitous finding in these patients [22].

When we considered the need for mechanical ventilation, the prognostic performance of all the scores was fair, consistent with findings in similar study populations [21]. The 4C score was developed to predict mortality, while the CALL score was tested to predict both mortality and disease progression, but its ability to identify early those patients at risk of developing severe respiratory failure was fair. A possible explanation for this finding is the considerable latency between hospital admission and intubation. Patients with good and bad prognoses probably present similar characteristics upon ED admission, and the response to early treatment plays a pivotal role in determining the prognosis. Therefore, scores calculated at the very beginning of the illness do not capture the difference.

The QUICK score deserves a separate mention; it was tested in the present study because it was conceived for patients who required low-flow oxygen supplementation in the ED. It is based on simple parameters, immediately available at the bedside, and could be especially useful in the ED, where patients with these characteristics represent a significant proportion. However, its prognostic ability was fair, limiting its applicability. Therefore, we were not able to confirm the results reported by Rodriguez-Nava et al. [23], who enrolled a significantly smaller population than the present one. A relevant difference between the QUICK score and the other scores specifically designed for patients with COVID-19 is that it does not take comorbidities into account. Our patients were all admitted for COVID-19 pneumonia, but we are aware that most of them had previous medical conditions, which definitely affected their outcome. Therefore, taking comorbidities into account allowed the 4C and CALL scores to achieve a better prognostic stratification than the QUICK score.

Limitations

The retrospective and single-center design represents a significant limitation of the present study. In fact, these results may not be generalizable in the presence of different local admission and management policies. However, the use of a standardized template for data collection limited the possibility of a subjective interpretation of data.

We decided to test several scores, including both those specifically designed for patients with COVID-19 and those commonly used in patients with bacterial or viral pneumonia. The choice was suggested by the need to confirm the discrimination ability of the former as well as to ascertain whether familiar scores could be applied to these patients. We confirmed that specifically designed scores demonstrated a good discrimination ability in this large European population, including in the following waves. On the other hand, scores commonly used for patients with bacterial or viral pneumonia did not find a relevant role among these patients.

Conclusions

In conclusion, scores specifically designed for patients with pneumonia induced by SARS-CoV2 allowed a good prognostic stratification in the ED, especially for the risk of in-hospital mortality. They maintained their discrimination ability despite the changes in the clinical features of the pandemic in the following waves. The prevalence of an adverse prognosis significantly increased in higher quartiles of the scores. This finding may give clinicians a definite reference to define the risk, with the possibility of early identification of patients at high risk of an adverse prognosis.