Introduction

The novel coronavirus SARS-CoV-2, named by the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses [1], causes the newly identified disease ‘COVID-19’ in symptomatic patients. COVID-19 was declared a pandemic on 11 March 2020 and has challenged healthcare systems worldwide.

COVID-19 patients admitted to hospital can develop severe disease with life-threatening respiratory and/or multi-organ failure [2, 3], with a high risk of mortality, in part due to the lack of effective treatment for the underlying disease in the early phase of the pandemic. Appropriate early assessment and management of patients with COVID-19 is important in ensuring high-quality care, including isolation and escalation to critical care or palliative care. Early assessment of the risk of COVID-19 is crucial to this process. Presently this involves clinical judgment based on the patient’s presenting history, signs and symptoms, together with viral nucleic acid testing, which can have a 24-hour turnaround time [4].

In England, the patient’s vital signs are monitored and summarised into a National Early Warning Score (NEWS) [5]. NEWS has gained widespread interest across the world, including in Europe, India and the USA (including the US Navy) [6]. NEWS offers a standardised approach to assessing acute illness and is derived from seven physiological variables or vital signs – respiration rate, oxygen saturation, any supplemental oxygen, temperature, systolic blood pressure, heart rate and level of consciousness (Alert (A), Voice (V), Pain (P), Unresponsive (U)) – which are routinely collected by nursing staff as an integral part of the process of care.

In December 2017, an update to NEWS (NEWS2) was published [6] that extends the level of consciousness from AVPU to ACVPU, where C represents new confusion or delirium and is allocated 3 points (the maximum for a single variable). NEWS2 also offers two scales for oxygen saturation (scale 1 and scale 2), the second of which accommodates patients with hypercapnic respiratory failure who have a clinically recommended oxygen saturation of 88–92 %.

We posit that NEWS and NEWS2, and their subcomponents, may be useful in predicting COVID-19 risk. We therefore investigate the performance of multiple enhanced NEWS and NEWS2 models, in terms of discrimination and calibration, in predicting the risk of COVID-19. We use the first electronically recorded NEWS and NEWS2 datasets, which are available within 24 h of admission. This means that the models we investigate require no additional data collection from staff and can be readily automated in electronic health records.

Methods

Setting and data

Our cohorts of unplanned medical admissions are from two acute hospitals approximately 65 km apart in the Yorkshire and the Humber region of England – Scarborough Hospital (SH) (n ~ 300 beds) and York Hospital (YH) (n ~ 700 beds), both managed by York Teaching Hospitals NHS Foundation Trust. We selected these hospitals because they had electronically recorded NEWS2 scores, which are collected as an integral part of the patient’s process of care, and because they were agreeable to the study. Because NEWS is a subset of NEWS2 and is still in widespread use, we developed both NEWS-based and NEWS2-based models.

We included all consecutive adult (age ≥ 18 years) unplanned medical admissions discharged during 3 months (11 March 2020 to 13 June 2020), with electronic NEWS2 data. For each admission, we obtained a pseudonymised patient identifier, the patient’s age (years), sex (male/female), discharge status (alive/dead), admission and discharge date and time, diagnosis codes based on the 10th revision of the International Statistical Classification of Diseases (ICD-10), and NEWS2 (including its subcomponents: respiratory rate, temperature, systolic pressure, pulse rate, oxygen saturation, oxygen supplementation, oxygen scale 2 (yes/no), and alertness including confusion). The diastolic blood pressure was recorded at the same time as systolic blood pressure. Historically, diastolic blood pressure has always been a routinely collected physiological variable on vital sign charts and is still collected where electronic observations are in place. Since NEWS is a subset of NEWS2, we derived NEWS from NEWS2. NEWS and NEWS2 produce integer values that range from 0 (indicating the lowest severity of illness) to 20 (the maximum NEWS2 value possible) (see Supplemental Digital Content - Tables S1 and S2). The index NEWS/NEWS2 was defined as the first electronically recorded NEWS/NEWS2 within ± 24 h of the admission time. We excluded records where the index NEWS/NEWS2 was not within ± 24 h or was missing/not recorded at all (see Supplemental Digital Content - Table S3). We searched primary and secondary ICD-10 codes for ‘U071’ to identify records with COVID-19. This code is in 95 % agreement with polymerase chain reaction (PCR) swab test results.
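The index-score rule above can be illustrated with a minimal sketch. It assumes each admission carries a list of (timestamp, score) observations; the function name `index_observation` and the example timestamps are hypothetical and are not taken from the study data or code.

```python
from datetime import datetime, timedelta


def index_observation(admitted_at, observations, window_hours=24):
    """Return the earliest (time, score) pair recorded within +/- window_hours
    of admission, or None when no scored observation falls inside the window."""
    window = timedelta(hours=window_hours)
    in_window = [
        (taken_at, score)
        for taken_at, score in observations
        if score is not None and abs(taken_at - admitted_at) <= window
    ]
    # "First electronically recorded" is read here as the earliest timestamp.
    return min(in_window, key=lambda obs: obs[0]) if in_window else None


# Hypothetical admission with three recorded NEWS2 values.
admitted = datetime(2020, 4, 1, 10, 0)
observations = [
    (datetime(2020, 4, 3, 9, 0), 6),    # more than 24 h after admission: ignored
    (datetime(2020, 4, 1, 14, 30), 4),
    (datetime(2020, 4, 1, 11, 15), 3),  # earliest record inside the window
]
index = index_observation(admitted, observations)
```

An admission for which the function returns None corresponds to a record that would be excluded under the rule described above.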

Statistical analyses

We began with exploratory analyses, including box plots that showed the relationship between covariates and risk of COVID-19 and line plots that showed the relationship between age, vital signs, NEWS2 and risk of COVID-19. We developed three logistic regression models for each of NEWS and NEWS2 for predicting the risk of COVID-19. The NEWS2-based models (M0’, M1’, M2’) use the index or first electronically recorded NEWS2 dataset within ± 24 h of admission. Model M0’ uses NEWS2 alone; Model M1’ extends M0’ with age and sex; and Model M2’ extends M1’ with all the subcomponents of NEWS2 plus diastolic blood pressure. Equivalent models (M0, M1, M2) using NEWS were also developed, but model M2 excluded two parameters that are in NEWS2 but not in NEWS: oxygen flow rate and scale 2 (yes/no). A log-transformation was used for variables with right-skewed distributions, namely respiratory rate, pulse rate, systolic blood pressure and diastolic blood pressure.
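The nesting of the NEWS2-based models can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the field names, the 0/1 coding of sex, supplemental oxygen and scale 2, and the use of Alert as the reference category for alertness are all hypothetical, and the coefficients in the example are made up purely to show the inverse-logit step.

```python
import math

# Variables the text describes as log-transformed (right-skewed distributions).
LOG_VARS = ("respiratory_rate", "pulse_rate", "systolic_bp", "diastolic_bp")
# Alertness levels other than Alert; Alert is the assumed reference category.
NON_ALERT = ("C", "V", "P", "U")


def predictors(record, model):
    """Build the predictor values for one admission and one NEWS2-based model."""
    row = {"news2": record["news2"]}                 # M0': NEWS2 alone
    if model in ("M1'", "M2'"):                      # M1': adds age and sex
        row["age"] = record["age"]
        row["male"] = 1 if record["sex"] == "male" else 0
    if model == "M2'":                               # M2': adds subcomponents
        for name in LOG_VARS:
            row["log_" + name] = math.log(record[name])
        row["temperature"] = record["temperature"]
        row["oxygen_saturation"] = record["oxygen_saturation"]
        row["supplemental_oxygen"] = 1 if record["supplemental_oxygen"] else 0
        row["scale2"] = 1 if record["scale2"] else 0
        for level in NON_ALERT:
            row["alertness_" + level] = 1 if record["alertness"] == level else 0
    return row


def predicted_risk(intercept, coefficients, row):
    """Inverse logit of the linear predictor of a logistic regression model."""
    linear_predictor = intercept + sum(coefficients[k] * v for k, v in row.items())
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical admission record.
admission = {
    "news2": 5, "age": 70, "sex": "male",
    "respiratory_rate": 20, "pulse_rate": 95, "systolic_bp": 120,
    "diastolic_bp": 75, "temperature": 37.8, "oxygen_saturation": 94,
    "supplemental_oxygen": False, "scale2": False, "alertness": "A",
}
row_m0 = predictors(admission, "M0'")
# Illustrative coefficients only; these are NOT the fitted values from the study.
risk_m0 = predicted_risk(-3.0, {"news2": 0.2}, row_m0)
```

The NEWS-based models would follow the same pattern with the NEWS score in place of NEWS2 and without the two NEWS2-only parameters.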

We developed all models using YH data (as development dataset) and externally validated their performance on SH data (as validation dataset).

We report discrimination and calibration statistics as performance measures for these models [7].

Discrimination relates to how well a model can separate (or discriminate) between patients with and without COVID-19 and is given by the area under the Receiver Operating Characteristic (ROC) curve (AUC) or c-statistic. The ROC curve is a plot of the sensitivity (true positive rate) versus 1 − specificity (false positive rate) for consecutive predicted risks. A c-statistic of 0.5 is no better than tossing a coin, whilst a perfect model has a c-statistic of 1. In general, values less than 0.7 are considered to show poor discrimination, values of 0.7 to 0.8 can be described as reasonable, and values above 0.8 suggest good discrimination [8]. The 95 % confidence interval for the c-statistic was derived using DeLong’s method as implemented in the pROC library [9] in R [10]. Calibration is the relationship between the observed and predicted risk of COVID-19 (24) and can be readily seen on a scatter plot (y-axis observed risk, x-axis predicted risk). Perfect predictions should lie on the 45° line.
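The c-statistic can be read as the probability that a randomly chosen patient with COVID-19 receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. The sketch below computes it directly from that pairwise definition. It is an illustration on made-up data, not the pROC implementation, and it does not compute DeLong confidence intervals.

```python
def c_statistic(outcomes, risks):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    cases = [r for y, r in zip(outcomes, risks) if y == 1]
    controls = [r for y, r in zip(outcomes, risks) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0      # case ranked above non-case
            elif case_risk == control_risk:
                concordant += 0.5      # tie counts as half
    return concordant / (len(cases) * len(controls))


# Made-up outcomes (1 = COVID-19) and predicted risks for five admissions.
outcomes = [1, 0, 1, 0, 0]
risks = [0.30, 0.10, 0.08, 0.05, 0.20]
auc = c_statistic(outcomes, risks)  # 4 of the 6 case/non-case pairs are concordant
```

The pairwise loop is quadratic in the number of admissions, which is acceptable for illustration; rank-based formulas give the same value more efficiently.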

The performance of a predictive model is usually overestimated if the same data are used for both developing and testing the model. Several internal validation methods aim to provide a more accurate estimate of predictive model performance. We used bootstrapping as an internal validation approach to assess the discrimination and calibration for all the models [11, 12]. The overall statistical performance was assessed using the scaled Brier score, which simultaneously incorporates discrimination and calibration [7]. The Brier score is the mean squared difference between actual outcomes and predicted risk of COVID-19; it is scaled by the maximum Brier score such that the scaled Brier score ranges from 0 to 100 %. Higher scaled Brier scores indicate superior models. We further assessed discrimination, calibration-in-the-large and calibration slopes in the validation data.
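As a worked illustration of the scaled Brier score, the sketch below assumes the common convention in which the maximum Brier score is that of a non-informative model predicting the observed prevalence for every patient, so that the scaled score is 1 − Brier / Brier_max. The data are made up, and the function names are hypothetical.

```python
def brier_score(outcomes, risks):
    """Mean squared difference between observed outcomes (0/1) and predicted risks."""
    return sum((r - y) ** 2 for y, r in zip(outcomes, risks)) / len(outcomes)


def scaled_brier_score(outcomes, risks):
    """1 - Brier / Brier_max, where Brier_max is the score of a model that
    predicts the observed prevalence for everyone (assumed convention)."""
    prevalence = sum(outcomes) / len(outcomes)
    brier_max = prevalence * (1.0 - prevalence)
    return 1.0 - brier_score(outcomes, risks) / brier_max


# Made-up example: one case among four admissions.
outcomes = [1, 0, 0, 0]
risks = [0.7, 0.1, 0.2, 0.1]
scaled = scaled_brier_score(outcomes, risks)       # Brier 0.0375, maximum 0.1875
baseline = scaled_brier_score(outcomes, [0.25] * 4)  # prevalence-only model
```

Under this convention a model that only predicts the prevalence scores 0, and a perfect model scores 1 (100 %), so larger scaled values indicate better overall performance.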

The clinical cut-off of NEWS and NEWS2 is 5+ (Supplemental Digital Content - Figure S1). This is the recommended threshold for detecting deteriorating patients and sepsis [13, 14]. Therefore, we assessed the sensitivity, specificity, positive and negative predictive values and likelihood ratios for these models at NEWS/NEWS2 thresholds of 5+ [15]. We further compared the net benefit for all models, which may inform the utility of the models in routine clinical practice [16]. The net benefit is calculated at a particular threshold probability \({p}_{t}\) with total sample size \(N\) as follows:

$$\text{Net benefit} = \frac{\text{True positives}}{N} - \frac{\text{False positives}}{N} \times \frac{{p}_{t}}{1-{p}_{t}}$$

The model with the highest net benefit metric has the highest clinical value.
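The net benefit formula can be sketched in a few lines. The example assumes that a patient is classified as positive when the predicted risk is at or above the threshold probability (the text does not state whether the comparison is inclusive), and the data are made up.

```python
def net_benefit(outcomes, risks, threshold):
    """True positives per patient minus false positives per patient,
    the latter weighted by the odds of the threshold probability."""
    n = len(outcomes)
    flagged = [(y, r >= threshold) for y, r in zip(outcomes, risks)]
    true_positives = sum(1 for y, positive in flagged if positive and y == 1)
    false_positives = sum(1 for y, positive in flagged if positive and y == 0)
    weight = threshold / (1.0 - threshold)
    return true_positives / n - (false_positives / n) * weight


# Made-up example: ten admissions, two with COVID-19, threshold probability 0.2.
outcomes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
risks = [0.40, 0.05, 0.30, 0.02, 0.03, 0.04, 0.05, 0.06, 0.10, 0.15]
nb = net_benefit(outcomes, risks, 0.2)  # 1/10 - (1/10) * (0.2 / 0.8) = 0.075
```

Evaluating the function over a range of thresholds for each model gives the curves of a decision-curve plot.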

We calculated the minimum sample size using the R package pmsampsize [17]. The minimum required sample size was 930 (93 events), given 21 predictors, R2 = 0.182, prevalence = 0.10, shrinkage > 0.9 and mean absolute prediction error (MAPE) = 0.05 [18]. We followed the TRIPOD guidelines for reporting model development and validation [19]. We have deployed our best performing models, M2’ and M2, as a calculator for predicting the risk of COVID-19 (https://covidcalc.shinyapps.io/calc/). We used Stata [20] for data cleaning and R [10] for statistical analysis.

Results

Cohort characteristics

The number of unplanned discharges was 6444 over 3 months. We excluded 36 (0.6 %) records because the index NEWS2 was not recorded within ± 24 h of the admission time or was not recorded at all (see Supplemental Digital Content - Table S3).

The characteristics of the admissions included in our study are shown in Table 1. Emergency admissions in the validation dataset were older than those in the development dataset (69.6 years vs. 67.4 years), less likely to be male (49.5 % vs. 51.2 %), had higher index NEWS (2.8 vs. 2.5) and NEWS2 (3.2 vs. 2.8) scores, higher prevalence of COVID-19 (11.0 % vs. 8.7 %) but similar in-hospital mortality (8.4 % vs. 8.2 %). See accompanying scatter and boxplots in Supplemental Digital Content - Figure S2, S3, S4 and S5.

Table 1 Characteristics of emergency medical admissions discharged during 3 months in development and validation datasets from YH and SH hospitals

We assessed the performance of all models in predicting the risk of COVID-19 in emergency medical admissions (see Table 2 and Fig. 1). The c-statistic for predicting COVID-19 for Model M2’ was the best in class in the development dataset (M0’: 0.71; M1’: 0.72; M2’: 0.78) (see Table S4 in supplementary material) and the validation dataset (M0’: 0.65; M1’: 0.67; M2’: 0.72) (see Table 2). The c-statistics for predicting COVID-19 for the M0’, M1’ and M2’ models were similar to those of the M0, M1 and M2 models in the development and validation datasets. Furthermore, all models showed statistically significant improvement using likelihood ratio tests (see Table S5 in supplementary material).

Table 2 Performance of NEWS (M0, M1, M2) and NEWS2 (M0’,M1’,M2’) models for predicting the risk of COVID on admission for validation dataset
Fig. 1

Receiver Operating Characteristic curve for NEWS models (M0,M1,M2) and NEWS2 models (M0’,M1’,M2’) in predicting the risk of COVID-19 following admission in the validation dataset after correcting for calibration-in-the-large. Note: predicted probability at NEWS or NEWS2 threshold ≥ 5 (sensitivity, specificity) is shown for all models

Table 3 includes the sensitivity, specificity, positive and negative predictive values for all models for predicting COVID-19 in the validation dataset. NEWS2 models (M0’, M1’, M2’) had higher sensitivity but lower specificity than NEWS models (M0, M1, M2) because the predicted probability at NEWS2 ≥ 5 (0.116) is lower than at NEWS ≥ 5 (0.130) in the development dataset. The corresponding performance for the development dataset is shown in Table S6.

Table 3 Sensitivity analysis of NEWS (M0, M1, M2) and NEWS2 models (M0’, M1’, M2’) for predicting the risk of COVID at threshold NEWS/NEWS2 ≥ 5 (predicted probability of model M0 = 0.130 and M0’ = 0.116 using development dataset) for validation dataset after correcting the calibration-in-the-large

Internal validation of these models is shown in Supplemental Digital Content - Figure S6.

The calibration slope was similar and less than one across all NEWS2 (M0’,M1’,M2’) and NEWS (M0,M1,M2) models, which shows overfitting (see Table 2; Fig. 1; Supplemental Digital Content - Table S4 & Figure S7).

However, model M2’ had the highest sensitivity (M2’: 57 % (95 %CI 51-63 %) vs. M0’ 44 % (95 %CI 38-50 %) and M1’ 53 % (95 %CI 47-59 %)) and the highest specificity (M2’:76 % (95 %CI 74-78 %) vs. M0’:75 % (95 %CI 73-77 %) and M1’:72 % (95 %CI 70-74 %)) for the validation dataset at NEWS2 ≥ 5 (see Table 3).

Likewise, model M2 had the highest sensitivity (M2: 52 % (95 %CI 46-58 %) vs. M0: 33 % (95 %CI 27-38 %) and M1: 44 % (95 %CI 38-50 %)) but lowest specificity (M2: 79 % (95 %CI 77-81 %) vs. M0: 83 % (95 %CI 81-85 %) and M1: 79 % (95 %CI 77-80 %)) for the validation dataset at NEWS ≥ 5 (see Table 3).
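The accuracy measures reported at the NEWS/NEWS2 ≥ 5 threshold can be computed from a 2 × 2 classification table, as in the sketch below. It assumes patients are classified as positive when their predicted risk is at or above the probability threshold (0.116 is the value reported for M0’ in the development dataset); the outcomes and risks are made up, and confidence intervals are not computed.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def diagnostic_accuracy(outcomes, risks, threshold):
    """Sensitivity, specificity, predictive values and likelihood ratios."""
    tp = fp = fn = tn = 0
    for y, r in zip(outcomes, risks):
        positive = r >= threshold
        if positive and y == 1:
            tp += 1
        elif positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    defined = sensitivity is not None and specificity is not None
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "lr_positive": ratio(sensitivity, 1.0 - specificity) if defined else None,
        "lr_negative": ratio(1.0 - sensitivity, specificity) if defined else None,
    }


# Made-up example: ten admissions, three with COVID-19.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
risks = [0.30, 0.20, 0.05, 0.15, 0.02, 0.03, 0.04, 0.05, 0.06, 0.10]
accuracy = diagnostic_accuracy(outcomes, risks, 0.116)
```

In this example two of the three cases and one of the seven non-cases are flagged, giving a sensitivity of 2/3 and a specificity of 6/7.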

Figures 2 and 3 show model calibration improved across the models and that models M2’ and M2 are well-calibrated after correcting for the baseline difference.
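One common way to correct for a baseline difference (calibration-in-the-large) is to keep the model's linear predictor fixed and re-estimate only the intercept in the validation data. The sketch below does this by bisection: it finds the offset on the logit scale at which the sum of the corrected risks equals the observed number of cases, which is the maximum-likelihood condition for an intercept-only update. This is an illustration on made-up data and may differ from the procedure used in the study; it assumes every predicted risk lies strictly between 0 and 1 and that the outcomes include at least one case and one non-case.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def intercept_offset(outcomes, risks, tol=1e-10):
    """Offset on the logit scale that makes mean predicted risk match the
    observed event rate (intercept-only recalibration, slope fixed at 1)."""
    linear_predictors = [logit(r) for r in risks]
    observed_events = sum(outcomes)
    low, high = -20.0, 20.0
    while high - low > tol:
        mid = (low + high) / 2.0
        expected_events = sum(expit(lp + mid) for lp in linear_predictors)
        if expected_events < observed_events:
            low = mid    # risks still too low: increase the offset
        else:
            high = mid   # risks too high: decrease the offset
    return (low + high) / 2.0


def recalibrate(risks, offset):
    """Apply the intercept offset to every predicted risk."""
    return [expit(logit(r) + offset) for r in risks]


# Made-up validation data in which the model under-predicts on average.
outcomes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
risks = [0.20, 0.12, 0.10, 0.05, 0.04, 0.06, 0.08, 0.03, 0.07, 0.05]
offset = intercept_offset(outcomes, risks)
corrected = recalibrate(risks, offset)
```

A positive offset indicates that the model under-predicted risk in the validation data and a negative offset that it over-predicted; the ranking of patients, and therefore the c-statistic, is unchanged by this correction.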

Fig. 2

External validation of NEWS models (M0, M1, M2) for predicting the risk of COVID-19. NB: We limit the risk of COVID-19 to 0.30 for visualisation purposes because beyond this point, we have few patients. The grey solid line shows ideal calibration. The black solid line shows the observed calibration along with 95 % confidence intervals in black dashed lines

Fig. 3

External validation of NEWS2 models (M0’, M1’, M2’) for predicting the risk of COVID-19. NB: We limit the risk of COVID-19 to 0.30 for visualisation purposes because beyond this point, we have few patients. The grey solid line shows ideal calibration. The black solid line shows the observed calibration along with 95 % confidence intervals in black dashed lines

Figure 4 shows that models M2’/M2 had the highest net benefit (M2’/M2: 0.04 vs. M1’/M1: 0.03 and M0’/M0: 0.02). As the unit of net benefit is true positives, this is equivalent to models M2’/M2 identifying a net 4 true-positive COVID-19 cases per 100 admissions, compared with M1’/M1 (3 per 100) and M0’/M0 (2 per 100) (see Supplemental Digital Content - Figure S8 for development dataset).

Fig. 4

Net Benefit for NEWS models (M0, M1, M2) and NEWS2 models (M0’,M1’,M2’) in predicting the risk of COVID-19 in the validation dataset

Nevertheless, NEWS/NEWS2 ≥ 5 is the worst performing choice compared to all models (NEWS2: M0’,M1’,M2’ & NEWS: M0,M1,M2).

Discussion

In this study, we investigated the performance of multiple enhanced NEWS2 models in terms of discrimination and calibration in predicting the risk of COVID-19. Model M0’ uses NEWS2 alone; Model M1’ extends M0’ with age and sex and Model M2’ extends M1’ with all the subcomponents of NEWS2 plus diastolic blood pressure. Equivalent models (M0, M1, M2) using NEWS were also developed but model M2 excluded two parameters that are in NEWS2 but not in NEWS - oxygen flow rate and scale 2 (yes/no).

NEWS2 models (M0’, M1’, M2’) were more sensitive but less specific than NEWS models (M0, M1, M2). Models M2 and M2’ were the best in class, with the highest c-statistics (0.77 and 0.72 respectively). The high negative predictive value suggests that models M2 and M2’ may be particularly useful in ruling out COVID-19 early in the patient’s unplanned admission. This is clinically useful because viral nucleic acid testing for COVID-19 is more time-consuming than measuring and recording the patient’s vital signs as defined by NEWS/NEWS2.

A recent systematic review identified five models to detect COVID-19 infection in symptomatic individuals with c-statistics that ranged from 0.87 to 1 [21]. However, despite these high c-statistics, the review authors cautioned against the use of these models in clinical practice because of the high risk of bias and poor reporting of studies, which are likely to have led to optimistic results [21]. For example, the majority of studies had small sample sizes, lacked external validation and rarely assessed calibration [21]. Our study addresses these shortcomings. While most of the studies reported an insufficient sample size [21], our study was sufficiently large for developing and validating the models in predicting the risk of COVID-19 [18]. The models were developed using data from one hospital and validated using data from another hospital. We rigorously assessed the internal calibration using a bootstrapping approach [22]. Furthermore, calibration slope and calibration-in-the-large were assessed and corrected.

The main advantages of our NEWS/NEWS2 models are that they are designed to incorporate data that are already available in the patient’s electronic health record, so they place no additional data collection or computational burden on clinicians and can be readily automated. Nonetheless, we emphasize that our NEWS/NEWS2 models are not designed to replace clinical judgement. They are intended and designed to support, not subvert, the clinical decision-making process and can always be overridden by clinical concern [5, 23]. The working hypothesis for our models is that their use may enhance situational awareness of COVID-19 by processing information already available without impeding the workflow of clinical staff, especially as our approach offers a faster and less expensive assessment of COVID-19 risk than current laboratory tests. This approach may be more practical in low-resource settings or where large numbers of people are to be assessed.

There are limitations in relation to our study. We identified COVID-19 based on ICD-10 code ‘U071’ which was determined by clinical judgment and/or swab test results and so our findings are constrained by the accuracy of these methods [24, 25]. Moreover, we do not have the timing of diagnosis in our data and so we are unable to determine if patients arrived with COVID-19. Our two hospitals are part of the same NHS Trust and this may undermine the generalisability of our findings, and so further external validation may be worthwhile. Another issue related to generalisability is to determine the extent to which mass vaccinations for COVID-19 impact on the accuracy of our models. Finally, an important next phase of this work is to field-test our models by carefully engineering them into routine clinical practice [26, 27] to see if they do support the earlier detection and care of COVID-19 in emergency medical patients without unintended adverse consequences.

Conclusions

NEWS model M2 and NEWS2 model M2’ appear to provide reasonably accurate predictions of the risk of COVID-19 using routinely collected on-admission NEWS/NEWS2 datasets. The extent to which these models are clinically useful as an early warning system for COVID-19 at the time of admission should be studied.