Introduction

In 1997, Morgan et al. proposed the first aggregate weighted early warning score (EWS) based on the APACHE II [1]. These systems allocate weighted points according to the level of derangement of each vital sign or other parameter included in the score and are based on the rationale that the risk of death increases with the number of abnormal vital signs [2]. Therefore, combining all vital signs into a single score simplifies the process of working with multiple physiological changes, with the total points increasing in line with the patient’s severity of illness and corresponding risk of death, cardiac arrest and admission to critical care [1].

The APACHE scores collect data over 24 h and are used to benchmark intensive care unit performance, but not to predict individual patient outcomes [3]. In contrast, an EWS is used to track a patient’s clinical progress and trigger interventions. Morgan et al.’s EWS was never intended to be a predictive score but was designed solely to secure the timely presence of skilled clinical help by the bedside of those patients exhibiting physiological signs compatible with established or impending critical illness. Therefore, EWS performance:

“should not be based on its prediction of outcomes such as death, admission to critical care, ‘do not attempt resuscitation’ or cardiopulmonary resuscitation, but on the number of patients whose clinical course was positively influenced at ward level and who, as a result of EWS use, were not admitted to critical care and did not suffer cardiac arrest or death.” (Morgan and Wright, 2007) [4].

EWS pose several questions. What is their intended purpose? How efficiently do they identify the most unwell patients? What should be the optimal response to them? The aim of this review is to critique the benefits and drawbacks of implementing an EWS model, with particular reference to the United Kingdom’s National Early Warning Score (NEWS). Potential developments for the future are also considered, as well as the role for NEWS in an emergency department (ED).

Development of NEWS

Most EWS have been developed empirically from expert opinion [5]. However, in 2011, Prytherch et al. published a mathematically derived EWS from 198,755 electronically collected vital sign observation sets using the VitalPAC™ system in 35,585 consecutive acute medical patients. This VitalPAC early warning score (ViEWS) had a high discrimination for 24-h in-hospital mortality and outperformed 33 other EWS at prediction windows ranging from 12 to 120 h. ViEWS was also the most efficient trigger for setting intervention thresholds, i.e., it identified the most patients predicted to have a poor outcome within a group of patients above a chosen score, a property quantified by the number needed to evaluate (NNE) [6].

In 2012, the Royal Colleges of Physicians in the UK adopted a slightly modified version of ViEWS as NEWS [7], which they linked to an escalation protocol of interventions based on expert opinion (Fig. 1). NEWS was updated in 2017 (NEWS2) to incorporate adjustments for patients with chronic lung disease (Fig. 2) [8]. These changes increased the complexity of the score, reduced its sensitivity [9], reduced its overall predictive performance, and provided no benefit for patients with type 2 respiratory failure [10]. Although it was intended for NEWS2 to be rolled out across the UK, in 2021 only 64.5% of hospitals had adopted it [11], the remaining hospitals preferring to continue using different EWS. Nevertheless, NEWS is by far the best-validated EWS in clinical use [12]. Evaluations of the performance of NEWS have been reported from independent research groups across the world, in undifferentiated patients and in defined illnesses, in a variety of pre-hospital and hospital settings, and for the management of sepsis and other conditions [13]. The discrimination of NEWS appears to be lower in COPD patients [14], while medication may also change NEWS performance; the area under the receiver operating characteristic curve (AUROC) of NEWS for in-hospital mortality was lower in patients with suspected sepsis on antihypertensive medication than in those not on medication [15]. Despite these and other possible confounders, NEWS remains the best performing and/or the most clinically practical score when assessing patients for the risk of a poor outcome within 24 h [13, 14, 16].

Fig. 1 The original NEWS scoring system [7]. This version of NEWS has been superseded by NEWS2 and is no longer recommended by the UK Royal College of Physicians

Fig. 2 The current NEWS2 scoring system [8]. This is the version currently recommended by the UK Royal College of Physicians for use in clinical practice

Intended use

The reason NEWS was designed to identify patients at risk of death within 24 h was to ensure that patients who need to be seen and reviewed urgently by an appropriately skilled clinician are efficiently identified. NEWS is an unreliable predictor of mortality beyond 24 h [13]. However, accurate and absolute prediction of outcome was not its intended purpose, and perhaps counterintuitively, if NEWS prompts an effective intervention, an adverse outcome will be prevented, thus reducing the score’s predictive performance [17]. On the other hand, if NEWS is measured too late to prompt a preventive intervention, or if no such intervention exists, its predictive performance will be improved. Therefore, NEWS should be considered analogous to a fire alarm and should be judged by the number of fires it helps prevent or put out, not by the number of buildings accurately predicted to burn to the ground.

Although it was not intended to detect or diagnose specific conditions, a NEWS value ≥ 5 has been shown to detect sepsis better than systemic inflammatory response syndrome (SIRS) criteria or the quick sequential organ failure assessment (qSOFA) [18]. The introduction of NEWS encouraged the measurement of complete sets of vital signs, rather than single occasional observations of one or two. It provides a common language and metric for illness severity in all care settings, “empowers nurses to more easily seek senior medical assistance” and prevents conflict when referring patients for review [19]. The use of one standard across an entire healthcare system has obvious advantages. If clinicians communicate using the same score, the severity of illness, prioritization, transportation, and placement of patients become clearer. Tracking NEWS from an established baseline shows whether patients are improving or getting worse, the latter necessitating a prompt clinical review and escalation of care [20]. However, tracking NEWS values in post-operative patients is questionable, as it correlates poorly with the patient’s clinical status within the first 24 h following surgery and cannot be used as a replacement for nursing acumen [19]. It should not be used alone for risk stratification as its ability to predict mortality beyond 24 h is not reliable because longer term mortality is greatly influenced by other factors, such as age, comorbidity, and the patient’s functional and physiologic reserve [21].

Response to NEWS—UK escalation protocol

Recording NEWS, no matter how accurate its predictive ability, will not improve outcomes unless a remedial intervention takes place within an effective time frame. Although NEWS and similar early warning scores reliably identify patients at risk of imminent death [13], they do not provide insight into what may be wrong with the patient and what to do about it [22]. Therefore, when a doctor is called to the bedside of a patient with an elevated NEWS, he or she must deconstruct the score to try and work out why the score is elevated. In the UK, NEWS is linked to an escalation policy that reflects the postgraduate training hierarchy; patients with slight elevations are seen by those with the least training and experience, and those with the highest scores are seen by the most experienced (Table 1). This escalation was based on expert opinion and did not consider its workload implications or provide any explanation of its rationale. For example, what is the evidence that explaining and managing mild vital sign changes requires less clinical skill than major vital sign changes? The initial assumption that those with a major derangement of one vital sign would require more attention than those with minor changes in several signs has been disproven; a major change in one sign, such as a rapid heart rate, is often caused by pain and/or anxiety and easily managed [23]. The skills required to resuscitate a severely ill patient may not always be possessed by one physician, no matter how experienced, although if the patient’s desired ceiling of care has not been discussed and documented, a senior decision maker may need to confirm how much further critical care is appropriate.

Table 1 Royal College of Physicians NEWS escalation protocol [8]

While the escalation protocol proposed by the Royal College of Physicians is empiric, arbitrary and not based on evidence, two recent systematic reviews found that when implemented it did improve the recording of vital signs [24, 25]. Moreover, Haegdorens et al. [26] found that even though the protocol was complied with less than 50% of the recommended time, more observations were made in clinically unstable patients and fewer in stable patients.

Response to NEWS—rapid response system alternatives

In 2004, the Institute for Healthcare Improvement (IHI) 100,000 Lives Campaign recommended the deployment of rapid response teams to be called to a patient before a cardiac arrest occurred [27]. This advocacy was based on the trials of Medical Emergency Teams (MET) published in the UK and Australia. In 2006, the MET concept was refined to a rapid response system (RRS) that includes two main components, one to recognize that help is needed (afferent limb) and one that calls a response team (efferent limb) to provide the required assistance. In addition, the RRS should provide post hoc process improvement activities (quality improvement limb) and an administration infrastructure (administrative limb) to support the entire system [28].

Only two randomized controlled trials [29, 30] of RRS have been performed and both were flawed with equivocal results. However, because RRS makes intuitive sense, it is unlikely that ethical approval would ever be given for a definitive randomized trial. A systematic review of RRS was published in 2007 [31] with a second in 2015 [32]. Both concluded that RRS reduced cardiorespiratory arrests by around 40% in children and adults, and in-hospital mortality by 12–18%.

RRS efficacy does not appear to be greatly influenced by its staff composition or structure [33]. However, compared to a nurse-led RRS, early intubation, central line placement, and activation of massive transfusion protocols by an intensivist-led RRS in the emergency department (ED) and acute care wards may reduce subsequent cardiac arrests both inside and outside critical care. Therefore, avoiding delays in these and other critical care interventions while waiting for a critical care bed may be key to improving outcomes. Each hour of delay in admission to critical care has been associated with a 1.5% increased risk of death in critical care [34]. Therefore, any system that improves prompt access to critical care is likely to be beneficial.

In the UK, enhanced care has been proposed by The Faculty of Intensive Care Medicine as a potential solution to deliver care for sick patients [35]. Patients who might be suitable include those on non-invasive ventilation or high flow nasal oxygen. Enhanced care provides a bridge between standard ward care and high-dependency care, with better communication between the ward and critical care teams. It is described as a “pragmatic” solution for patients who require more than basic ward care but fall short of admission to critical care [35]. At face value, this approach has obvious appeal; however, more evidence of its efficacy is required.

Although multiple vital sign changes captured by an EWS, such as NEWS, provide more effective calling criteria for RRS than a single vital sign abnormality [33], several other parameters have also been suggested as RRS triggers. For example, changes in breathing, circulation, mentation, mobility and pain are included in the Dutch Early Nurse Worry Indicator Score [36]. These factors may also identify clinical deterioration and the need for intervention before significant changes in vital signs occur. Therefore, there remains a debate on what should trigger an RRS call and, if NEWS is used, what value is the most efficient and effective.

What is the optimal NEWS cutoff?

While the risk of death increases as the NEWS value increases, the optimal cutoff point for intervention is unclear. In the absence of unequivocal evidence of benefit from EWS and RRS, the selection of the best NEWS cutoff must still be arbitrary and based on measures of its predictive performance, such as the highest Youden statistic (i.e., sensitivity + specificity − 1), which identifies the point on the receiver-operating characteristic (ROC) curve that lies farthest above the diagonal line of no discrimination. Lower NEWS values will have a higher sensitivity and lower specificity and, hence, a higher false alarm rate, whereas higher values will have a lower sensitivity and higher specificity yielding a lower false alarm rate. Unfortunately, the published papers of 24-h mortality suggest considerable variation in the Youden statistic of NEWS at different cutoff values, with little difference in the average values for cutoffs between 3 and 5 points [13]. The available literature shows that if a cutoff of ≥ 7 points is selected, only 4% of patients would trigger an intervention and 44% of patients who die would be missed [13]. Moreover, many of the interventions that are triggered may be futile as some patients will have become unsalvageable. Alternatively, a cutoff of ≥ 1 point will trigger an assessment and/or an intervention in 83% of patients, which would probably not benefit most of them. NEWS ≥ 5 points is the most adopted cutoff and 91% of patients have a NEWS below it; the overall 24-h mortality of these patients is only 0.06%, but a quarter of all deaths within 24 h and more than 40% of all in-hospital deaths occur in patients with a NEWS < 5 [13]. Although NEWS ≥ 5 has been recommended as a flag for sepsis [18], it might not be the optimal score to commence antibiotics [37] or other time-sensitive interventions. Many life-saving interventions, such as anti-coagulation for pulmonary embolus, thrombolysis for stroke, emergency surgery, and rehydration to prevent acute kidney injury, should be given as soon as possible, and well before NEWS reaches 5. Therefore, using a lower cutoff might be associated with better outcomes.
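The Youden statistic mentioned above can be computed for each candidate cutoff and the results compared. The following is a minimal sketch in Python. The sensitivity and specificity pairs are hypothetical placeholders chosen only to show the arithmetic; they are not values reported in [13], and the function names are illustrative.

```python
def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def best_cutoff(performance: dict[int, tuple[float, float]]) -> int:
    """Return the cutoff whose (sensitivity, specificity) pair maximises J."""
    return max(performance, key=lambda cutoff: youden_j(*performance[cutoff]))


# HYPOTHETICAL (sensitivity, specificity) pairs per NEWS cutoff,
# for illustration only; not taken from the literature.
example = {
    3: (0.90, 0.70),
    5: (0.75, 0.90),
    7: (0.55, 0.96),
}

for cutoff, (sens, spec) in example.items():
    print(f"NEWS >= {cutoff}: J = {youden_j(sens, spec):.2f}")
print("Highest J at cutoff:", best_cutoff(example))
```

A cutoff chosen this way weights missed deaths and false alarms equally, which may not reflect clinical priorities; this is one reason the choice remains a matter of judgment.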

Patients with a NEWS < 3 points will have normal mental status and, on average, only have a 0.07% chance of dying within 24 h; no study has reported them to have a mortality above 0.35% within 10 days, and in most studies, their risk of in-hospital death remained below 1%. Therefore, it has been suggested that measuring a complete set of vital signs in patients with a NEWS < 3 more frequently than once a day is not required [38]. However, these low-risk patients still need some form of ongoing monitoring as they accounted for 9% of all deaths within 24 h and 16% of all in-hospital deaths in absolute terms [13].

Response to NEWS—what are the workload implications?

Ease of use, predictive discrimination and accuracy can be misleading metrics for an EWS, as in clinical practice EWS performance depends on the trade-off between early detection of outcomes and the number of false-positive alerts. When the prevalence of an event is low, even an EWS that has a high sensitivity and specificity will generate a high proportion of false-positive alerts, i.e., it will have a low positive predictive value [39]. Most predictive scores are far better at predicting survival than death. Although patients with a NEWS < 3 points may be highly unlikely to die [13], many patients with a high score will also survive. This is because our physiology strives to keep us alive, so when trying to predict the time of death, the sickest patient will often confound the best score and surprise the smartest doctor. Although the number of false alarms depends on the patient population and their likely mortality, the chance of death within 24 h in any patient population is very low, so false alarms are inevitable.
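The effect of low prevalence on false alarms follows from Bayes' rule and can be checked numerically. The sketch below is illustrative only: the 90% sensitivity and specificity and the three prevalence values are hypothetical assumptions, not figures from the cited studies.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """Fraction of positive alerts that are true positives (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# HYPOTHETICAL score with 90% sensitivity and 90% specificity,
# evaluated at progressively lower event prevalence.
for prevalence in (0.20, 0.05, 0.01):
    ppv = positive_predictive_value(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.1%}, "
          f"false alerts = {1 - ppv:.1%} of all alerts")
```

Under these assumed values, at 1% prevalence roughly eleven of every twelve alerts are false even though the score itself is unchanged.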

It has been argued that successful rapid response systems must consistently deliver a high response “dose” (> 25 calls per 1000 admissions) [40], as an increase in the response dose is associated with a progressive reduction in cardiac arrest rates, and mature systems should have at least 40 calls per 1000 admissions [41]. The NNE may be the most useful measure of clinical utility and cost-efficiency as it provides the number of patients that need to be evaluated further to detect one adverse outcome. The NNE is the reciprocal of the positive predictive value [42]; although reports in the literature of positive predictive value for 24 h mortality vary considerably, overall for patients with a NEWS ≥ 3 the NNE is 1/0.018 or 55.6, and for patients with NEWS ≥ 7, it is 1/0.059 or 16.9 [13]. Prytherch et al. [6] have proposed an efficiency curve, which plots the number of triggers that would be generated by different values of NEWS. As an example, based on analysis of all results reported in the literature [13], a NEWS of ≥ 3 would generate a trigger in 27% of observations, which would detect 88% of all deaths within 24 h. In contrast, a NEWS of ≥ 5 would generate a trigger in 9% of observations, which would detect 73% of all deaths within 24 h and a NEWS of ≥ 7 would generate a trigger in 4% of observations, which would detect 56% of all deaths within 24 h (Fig. 3). The cutoff value selected would depend on the management required for the conditions likely to be present at each score, the patient population and clinical setting, and the resources available. However, it is probable that deaths in patients with higher scores are less likely to be preventable and, therefore, selecting a lower score as a cutoff may save more lives.
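The NNE arithmetic and the trigger-versus-detection trade-off quoted above can be reproduced in a few lines. This is a sketch only: the positive predictive values and percentages are those given in the text [13], while the function and variable names are illustrative assumptions.

```python
def number_needed_to_evaluate(ppv: float) -> float:
    """NNE is the reciprocal of the positive predictive value."""
    if not 0.0 < ppv <= 1.0:
        raise ValueError("PPV must be in the interval (0, 1]")
    return 1.0 / ppv


# Positive predictive values for 24-h mortality quoted in the text [13].
reported_ppv = {3: 0.018, 7: 0.059}

# (fraction of observations triggering, fraction of 24-h deaths detected)
# for each NEWS cutoff, as quoted in the text [13].
efficiency = {3: (0.27, 0.88), 5: (0.09, 0.73), 7: (0.04, 0.56)}

for cutoff, ppv in reported_ppv.items():
    print(f"NEWS >= {cutoff}: NNE = {number_needed_to_evaluate(ppv):.1f}")

for cutoff, (triggered, detected) in efficiency.items():
    print(f"NEWS >= {cutoff}: {triggered:.0%} of observations trigger, "
          f"{detected:.0%} of deaths detected, {1 - detected:.0%} missed")
```

Raising the cutoff from 3 to 7 cuts the number of evaluations per detected death by roughly two-thirds, but the proportion of deaths missed rises from 12% to 44%.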

Fig. 3 The proportion of all deaths within 24 h of patients below NEWS cutoff thresholds ranging from 1 to 7 points, and the proportion of all patients equal or above each cutoff [13]

Other NEWS shortcomings

Although NEWS measurement requires trained professionals, is time consuming [43] and prone to calculation error [44], it has been shown to improve vital sign documentation and communication between clinicians [19, 45], both on paper and electronically [44, 46]. Nevertheless, there is concern that EWS may deskill practitioners by removing the need to know their patients, thereby inhibiting the development of professional judgment [47]. There is also concern that EWS use has encouraged the delegation of vital sign monitoring to unqualified support staff, and undermined holistic care and clinical judgment [48]. NEWS is a one-size-fits-all score that may not be appropriate for all conditions. It requires the measurement of five vital signs and a calculation, making it time consuming and prone to error. It may trigger too many alarms, and therefore, alarms should be titrated to specific patients and conditions. For example, slight changes in temperature, blood pressure, heart rate and respiratory rate, which could total an increase in NEWS of three or four points, may be an entirely appropriate physiological response to an illness and harmful to correct by overenthusiastic management [49, 50, 51, 52].

Since NEWS does not consider urine output, it may miss acute kidney injury [53], and by not including diastolic blood pressure might miss early distributive shock. It may not detect stroke or raised intracranial pressure (i.e., Cushing’s triad of an irregular breathing pattern, bradycardia, and hypertension). It does not consider the patient’s usual blood pressure, heart rate, respiratory rate or oxygen saturation and may, therefore, cause undertreatment of relative hypotension in a hypertensive patient [54], or overuse of supplemental oxygen in chronic lung disease [55]. The amendments from NEWS to NEWS2 were introduced to try to address supplemental oxygen in respiratory patients [8].

The measurement of NEWS is greatly influenced by the accurate recording of oxygen saturation and breathing rate, and the clinical judgment required to determine the need for supplemental oxygen, which may be based on poorly defined subjective opinion [56]. This presents inherent dangers; for example, if oxygen is removed erroneously from patients with a NEWS of 4, their score would drop to 2 but their risk of death would increase. Respiratory rate may be the vital sign that most accurately predicts outcome [57], yet manual values are often inaccurate [58] and correlate poorly with machine-measured values [59]. Although machine-measured values should be the best predictors of deterioration and mortality [59, 60], others have found that manual recordings may be better because they are biased by a nurse’s more accurate intuitive judgment of how sick the patient is [61]. The assessment of mental status required by NEWS may also be inadequate and more thorough screening for delirium has been suggested [62].
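The oxygen example above can be made concrete by treating an aggregate score as the sum of its per-parameter points. In this sketch the 2-point weight for supplemental oxygen is the one implied by the example in the text (a score of 4 falling to 2). The other per-parameter points describe a hypothetical patient and are assumptions for illustration, not values from the NEWS chart.

```python
def aggregate_score(points: dict[str, int]) -> int:
    """An aggregate weighted score is the sum of the per-parameter points."""
    return sum(points.values())


# HYPOTHETICAL patient: the individual parameter points are assumed for
# illustration; only the 2 points for supplemental oxygen follow the text.
on_oxygen = {
    "respiratory_rate": 1,
    "heart_rate": 1,
    "supplemental_oxygen": 2,
}

# Same recorded observations, but oxygen has been removed in error.
off_oxygen = {**on_oxygen, "supplemental_oxygen": 0}

print("Score on oxygen: ", aggregate_score(on_oxygen))
print("Score off oxygen:", aggregate_score(off_oxygen))
```

The total falls although nothing about the patient has improved, which is why a change in score needs to be read alongside the component that produced it.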

In practice, patients with a raised NEWS triggering a clinical review may have one or more vital sign scores adjusted to prevent them triggering continuously. This has the advantage of reducing alerts while waiting for a management plan to work, or where an intervention is deemed unlikely to correct the physiological parameter. The disadvantage of this approach is missing a further deterioration.

If and how should NEWS be adopted?

If and how NEWS should be adopted into clinical practice depends on the clinical setting and the resources and expertise available. NEWS efficiently discriminates patients who are sick, and in need of immediate assessment and treatment, from those who might get sick if preventative measures are not taken. A NEWS of 3 points indicates that a pathophysiologic process is already in play; at this level, patient evaluation may not be urgent but identifying the precise pathophysiology may require considerable skill and expertise. However, the need for intervention becomes urgent once NEWS is ≥ 5 points, although these patients may not require as much expertise. Although it is unlikely that a patient with a NEWS < 3 is in immediate danger, these patients need to be assessed for potentially preventable adverse events that may happen to them.

The UK escalation protocol, which mirrors the nation’s medical training hierarchy, would not be appropriate for many non-academic hospitals staffed by fully trained clinicians. Almost certainly, no physician working in such a hospital would appreciate not being called until their patient’s NEWS was ≥ 7 points. On the other hand, they would not want to be called every time a patient with COPD reached a score ≥ 3, for example. Therefore, agreed local protocols of how to respond to the score, and when to override it, must be developed and subsequently modified in the light of experience. If a hospital already has an RRS, the response to NEWS should probably be graded, according to the resources and expertise available. Unlike a NEWS ≥ 7, the ability to immediately intubate and ventilate a patient will not usually be needed if the calling criteria are as low as a score ≥ 3.

Should NEWS be used in the ED?

The accurate prediction of imminent death, ICU admission and cardiac arrest may not be the most appropriate way to trigger a necessary intervention in all clinical situations. In the ED, many patients without life-threatening conditions, such as those in pain, may need immediate interventions. Simple triage scores, which are quicker and easier to use than NEWS and less prone to calculation error, may be more appropriate determinants of acuity [63].

The discrimination of NEWS appears to be lower if it is measured before treatment is started (i.e., in the pre-hospital or ED setting) [13]. Later measurements, made after everything possible to save the patient has been done, are much more likely to accurately predict the outcome than those made before treatment has started. This does not mean that the use of NEWS in the ED is inappropriate as the purpose of NEWS is to trigger interventions, not to make accurate predictions.

The contemporary, routine ED evaluation of acutely ill patients will always include an ECG, full blood count, urea, electrolytes, and probably liver function tests. Other biomarkers such as lactate, troponin, C-reactive protein, and D-dimers are also available for both risk stratification and diagnosis [64, 65]. Nevertheless, the routine measurement of a complete set of vital signs on all patients should be considered a basic standard of care, and NEWS converts them into a single metric that can be followed throughout a patient’s hospitalization. Unfortunately, little is known about the changes and trends of vital signs and NEWS during the entire course of an acute illness in hospital [66]. What little evidence there is suggests that the trajectories of patients admitted with a low score are different from those admitted with a high score, and that patients should be observed for 12 to 24 h before their outcome can be reliably predicted [67]. Other things, apart from vital signs, need to be considered when monitoring patients’ progress both in the ED and after hospital admission, such as their mobility [68], how they feel, if their skin is hot or cold and clammy, as well as breathlessness, weakness or fatigue, and the presence of bleeding [36].

Are there alternatives?

Although NEWS is the most widely validated risk score for death within 24 h, this should not stop the search for a better alternative. Any further improvement in discrimination for 24-h mortality is unlikely to be of clinical benefit. Comparisons of NEWS with other EWS should now concentrate on demonstrating enhancements of clinical value, such as ease of use and/or automation, ease of implementation, resource and other cost savings, and/or outcome improvement [69, 70, 71]. A machine learning-derived algorithm (eCART-lite) using age, heart rate, and respiratory rate and their 24-h trends predicts outcomes slightly better than NEWS [72]. Manipulation of oxygen saturation, inspired oxygen concentration and respiratory rate may predict imminent outcome [73]. Recently, the ROX index, which only requires the measurement of respiratory rate, oxygen saturation and the percentage of inspired oxygen, has been reported to predict the deterioration of COVID-19 patients, measured by composite outcomes, earlier than NEWS [74] and with a higher calibration [75].
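For readers unfamiliar with the ROX index, the sketch below shows how it is usually calculated. The formula used, the ratio of oxygen saturation to the inspired oxygen fraction, divided by the respiratory rate, is the commonly published definition and is not stated in the text above; the function name and the example values are hypothetical, and no interpretive thresholds are implied.

```python
def rox_index(spo2_percent: float,
              fio2_percent: float,
              respiratory_rate: float) -> float:
    """ROX index = (SpO2 / FiO2) / respiratory rate.

    SpO2 is given in percent, FiO2 in percent (converted here to a
    fraction), and respiratory rate in breaths per minute.
    """
    if not 21.0 <= fio2_percent <= 100.0:
        raise ValueError("FiO2 must be between 21% and 100%")
    if respiratory_rate <= 0:
        raise ValueError("respiratory rate must be positive")
    fio2_fraction = fio2_percent / 100.0
    return (spo2_percent / fio2_fraction) / respiratory_rate


# HYPOTHETICAL observations, for illustration only.
print(f"{rox_index(spo2_percent=92, fio2_percent=40, respiratory_rate=28):.2f}")
print(f"{rox_index(spo2_percent=98, fio2_percent=21, respiratory_rate=16):.2f}")
```

Because the index falls as the oxygen requirement or the respiratory rate rises, a lower value indicates a sicker patient, the opposite direction to NEWS.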

While most deaths occur in patients with abnormal vital signs and are probably the result of a pathophysiological process that has already started, deaths that occur in patients with near normal vital signs and normal mental status (i.e., NEWS < 3 points) are more likely to result from an event or process that is yet to happen (e.g., ruptures or blockages, mechanical accidents, cardiac arrhythmias, or iatrogenic misadventures). Therefore, assessing comorbidity, exercise capacity and other measures of physiologic reserve are more likely to anticipate these events and predict a patient’s ability to withstand them than their vital signs [21].

Continuous automatic monitoring of vital sign trends and machine learning hold the promise of more accurate predictions and fewer false alarms [76], and machine learning algorithms based on trends of electronically recorded vital signs may rapidly identify patients who have recovered and are safe to discharge [77]. To date, there are no high quality, large, well-controlled studies of continuous vital sign monitoring that show it is of clinical benefit or cost-effective [78]. However, the clinical performance and value of this technology will be influenced by the intensity and frequency of monitoring. For example, data collected every 5 s are likely to have a different prediction window and different clinical use from data collected intermittently every 12–24 h. Complex scores with algorithms derived from logistic regression or machine learning from large data sets are beginning to come into clinical practice. These newer EWS operate in the background, analyzing electronic medical record variables with proprietary analytics for the early detection of patient deterioration. Unfortunately, so far, their clinical performance remains unproven or disappointing [79].

Conclusion

NEWS is the most used EWS and its ability to predict death within 24 h has been well validated in multiple clinical settings throughout the world. It provides a common language for the assessment of clinical severity, which can be used to trigger clinical interventions and assess the response to them. It should not be used as the only metric for risk stratification as its ability to predict mortality beyond 24 h is not reliable and greatly influenced by other factors. A universal escalation protocol for all patients anywhere based on NEWS is not possible and a more flexible and tailored approach is required for different clinical settings, depending on the expertise and resources available [80].

Much of the criticism of NEWS in the literature is directed not so much at NEWS itself as at how to respond to it. The main drawbacks of NEWS are that measuring it requires trained professionals, is time consuming and prone to calculation error.