Background

Serious clinical adverse events are related to physiological abnormalities, and changes in physiological parameters such as blood pressure, pulse rate, temperature, respiratory rate and level of consciousness often precede the deterioration of patients [1,2,3,4]. Early intervention may improve patient outcomes, and failure to recognise acute deterioration in patients may lead to increased morbidity and mortality [5, 6]. Early warning systems and track and trigger systems (TTS) use routine physiological measurements to generate a score with pre-specified alert thresholds. Their aim is to identify patients at risk of deterioration early and trigger appropriate and timely responses, known as escalation of care.

Early warning systems are used increasingly in acute care settings and several countries have developed National Early Warning Scores (NEWS). In Ireland, the National Clinical Guideline on the use of NEWS for adult patients came into effect in 2013 [7]. In the UK, the Royal College of Physicians (RCoP) published a National Early Warning Score in 2012 [8], and the National Institute for Health and Care Excellence (NICE) recommends the use of a TTS to monitor hospital patients [9]. In Australia, the Early Recognition of Deteriorating Patient Program introduced a TTS [10]. Similarly, in the USA, Rapid Response Systems with fixed “Calling Criteria” are recommended to trigger an adequate medical response [11].

Many acutely ill patients first present to the emergency department (ED). The ED is a complex environment, distinctly different from other hospital departments. Visits are unscheduled and patients attend with undiagnosed, undifferentiated conditions of varying acuity. Medical staff must care for several patients simultaneously, deal with constantly shifting priorities and respond to multiple demands due to the unpredictable nature of the ED environment [12, 13]. Initial triage determines the priority of patients’ treatments, but following triage, continuous monitoring and prompt recognition of deteriorating patients are crucial to escalate care appropriately. Early warning systems are sometimes used as an adjunct to triage for early identification of deterioration in the ED, particularly in situations of crowding [14]. Common early warning systems such as the Modified Early Warning Score (MEWS) [15] are used frequently and have been validated in specific subgroups of patients (e.g. acute renal failure, myocardial infarction), but they may not be directly transferable to an ED setting [14], where patients present with a variety of unspecified conditions. There is therefore a need to evaluate the use of early warning systems and TTS in the ED.

The review addressed five objectives:

  1. To describe the use, including the extent of use, the variety of systems in use, and compliance with systems used, of physiologically based early warning systems or TTS for the detection of deterioration in adult patients presenting to the ED;

  2. To evaluate the clinical effectiveness of physiologically based early warning systems or TTS in adult patients presenting to the ED;

  3. To describe the development and validation of such systems;

  4. To evaluate the cost effectiveness, cost impact and resources involved in such systems;

  5. To describe the education programmes, including the evaluation of such programmes, established to train staff in the delivery of such systems.

Methods

Study design & scope

We conducted a systematic review, which we report according to the PRISMA guidelines [16]. The scope is presented in Table 1 using the PICOS (Population, Intervention, Comparison, Outcomes, types of Studies) format.

Table 1 Study selection criteria

Search strategy

Search strategies using keywords and subject terms were developed for four electronic databases: the Cochrane Library (all databases therein up to 4 March 2016), Ovid Medline (up to 4 March 2016), Embase (up to 22 February 2016) and CINAHL (up to 4 March 2016). Additional grey literature resources that were searched included cost-effectiveness resources (n = 4; up to 11 March 2016), guidance resources (n = 6; up to 13 March 2016), professional bodies’ resources (n = 22; up to 11 March 2016), grey literature resources (n = 3; up to 13 March 2016) and clinical trial registries (n = 4; up to 13 March 2016). The searches were not restricted by language; however, only data in English were included. Full details of the search strategies are provided in Additional file 1. Details of the search results are presented in Fig. 1 [16].

Fig. 1
figure 1

Search and selection flow diagram. We searched electronic databases, cost-effectiveness resources, professional bodies’ websites, clinical trial registries and grey literature resources. Experts in the field were also contacted. We conducted dual independent study selection based on title/abstract and full text.

Study selection & extraction

Two reviewers (FW, and PM or SD) independently screened the titles/abstracts. For additional resources, the information specialist (AC) sifted through the search results for potentially eligible studies. Full text reports from databases and additional resources were assessed for inclusion by two reviewers independently (FW, PM) and discrepancies were resolved by discussion or by involving a third person (DD).

Data extraction forms were designed for each of the six types of studies. Data extraction was completed by two reviewers (FW, PM). Each reviewer extracted data from half of the included reports and 50% of entries were checked by a second reviewer for accuracy. The data elements that were extracted are available in Additional file 2. Two reviewers (FW, and VS or DD) independently assessed the Risk of Bias (ROB)/methodological quality of the included reports, using the instruments listed in Table 2.

Table 2 Instruments used to assess risk of bias and quality of reports

Data analysis

Data were summarised in evidence tables and synthesised narratively for use of warning systems, compliance, effects of systems on patient outcomes, development and validation of systems, and cost-effectiveness studies. For the effects of systems on patient outcomes, a meta-analysis was planned but not performed because only one eligible study was identified. For validation studies, we report the AUROC (area under the receiver operating characteristic curve) [17], which equals 1 for a perfect test and 0.5 for a completely uninformative test. For health economics studies, we planned to examine cost effectiveness, but no such studies were identified. The GRADE (Grading of Recommendations, Assessment, Development and Evaluation) approach was used to assess the certainty of the body of evidence for the effects of systems on patient outcomes.
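As a minimal illustration of the statistic used throughout the validation results (not part of the review’s own analysis), the AUROC can be computed from its rank interpretation: it equals the probability that a randomly chosen patient who experienced the outcome received a higher score than a randomly chosen patient who did not. The scores and outcome flags below are invented for the example.

```python
def auroc(scores, outcomes):
    """Area under the ROC curve via the Mann-Whitney interpretation.

    outcomes are 1 (event, e.g. in-hospital death) or 0 (no event).
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count score pairs where the event case outranks the non-event case;
    # ties count as half a concordant pair.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical early-warning scores and outcome flags (illustrative only)
scores = [1, 2, 3, 5, 6, 7, 2, 4]
died   = [0, 0, 1, 1, 1, 1, 0, 0]
print(auroc(scores, died))
```

A score that perfectly separates events from non-events yields 1.0, and a constant (uninformative) score yields 0.5, matching the interpretation given above.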

Results

A total of 6397 records were identified. After removal of duplicates, 1147 database records were screened by title/abstract. Full texts of 83 records were assessed, of which 43 studies (44 records) were included. The most common reason for exclusion was ‘non-ED setting’ (n = 24). One study published in Chinese was identified; its abstract was in English and presented relevant data, so it was included [18]. Five studies identified through the 56 additional resources screened were included. The results of the search/selection are presented in Fig. 1.

Risk of bias and quality of reports

Three of the four descriptive studies assessing the extent of early warning system use in EDs were judged to be of fair quality [19,20,21] and one of poor quality [22]. The five descriptive studies assessing compliance with early warning systems were assessed as being of good [23,24,25] or fair quality [26, 27]. The single effectiveness study was rated as having high ROB [28]. Eight studies that developed and validated a system (in the same sample) were rated as having low (n = 6) [29,30,31,32,33,34] or unclear (n = 2) [35, 36] ROB. The 28 studies that validated an existing system in a new cohort were judged as having overall low (n = 16) [37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52], unclear (n = 9) [18, 53,54,55,56,57,58,59,60] or high (n = 3) [61,62,63] ROB. The domains of selection bias and factor measurement were most commonly rated as unclear ROB because studies did not specify their sampling methods (n = 10) [18, 36,37,38, 47, 48, 54, 58,59,60] or did not state the cut-off values used (n = 12) [31, 33,34,35, 42, 46, 49, 57, 58, 60, 61, 63]. One study also did not clearly pre-specify its outcomes [59]. One scoping review of the predictive ability of early warning systems was rated as being of good quality [64]. Full details of the ROB and quality of reports are provided in Additional file 3.

Extent of use and compliance with early warning systems and track and trigger systems (1)

Four studies described the use of early warning systems within the ED and five studies examined compliance. The studies examining the extent of use collected data from medical records [19], a survey [20], a web-survey [21] and participatory action research [22]. Considine et al. [19] described a pilot study of a 4-parameter system in the ED of a hospital in Australia and found that nurses made 93.1% of activations, the most common reasons being respiratory (25%) and cardiac (22.5%); the median time between documenting physiological abnormalities and ED early warning system activation was 5 min (range 0–20). A survey in 2012 of 145 (57% response rate) clinical leads of EDs in the UK showed that 71% used an early warning system, most commonly the MEWS (80%) [20]. A survey in seven jurisdictions in Australia found that 20 of 220 hospitals had a formal rapid response system in the ED, but the prevalence of early warning systems in EDs was not reported [21]. Coughlan et al. [22] reported insufficient information in a conference abstract. The findings of these four studies demonstrate that multiple early warning systems are available and that the extent of their use in the ED may vary geographically, but the limited data preclude comparisons between countries.

Three retrospective studies [23,24,25], one prospective study [27] and one audit (before and after early warning system implementation) [26] examined compliance with recording early warning system parameters. Compliance varied widely, ranging from 7% to 66%, and factors such as patients’ triage category, age, gender, number of medications, length of hospital stay and the level of crowding in the ED affected compliance with early warning systems [24]. Christensen et al. [23] reported that 7% (22/300) of clinical notes contained a calculated score; however, 16% of records included all five vital signs. Heart rate (HR), shortness of breath (SOB) and loss of consciousness (LOC) were reported in 90–95% of records. Compliance with escalation of care varied: all nine patients who met the trauma call activation criteria triggered a trauma call, but only 24 of the 48 emergency call activation criteria were responded to. Austen et al. [25] found higher compliance, with 66% of records containing an aggregate score, although only 72.6% were accurate. In an audit, the pre-implementation rate (30%) of abnormal vital sign identification was significantly lower than the post-implementation rate (53.5%) (p = 0.007), but no details of the implementation strategy were described [26]. Wilson et al. [27] compared the TTS scores recorded in charts with scores calculated retrospectively and found that 60.6% of charts contained at least one calculated TTS score, but 20.6% (n = 211) of the scores were incorrect, mainly because of incorrect assignment of the score to an individual vital sign, which led to underscoring and reduced escalation activation. Hudson et al. [26] found that using a standardised emergency activation chart resulted in a higher percentage of abnormal vital signs being recorded (p = 0.007).

Effects of early warning systems and track and trigger systems (2)

One non-randomised controlled study compared the effect of the MEWS (n = 269), recorded by emergency nurses every four hours, with clinical judgement (n = 275) in patients who were waiting for in-patient beds in the ED of a large hospital in Hong Kong [28]. It found that the MEWS might increase the rate of activating a critical pathway (1 in 10 patients with a MEWS >4 versus 1 in 20 patients based on clinical judgement) but might make little or no difference to the detection of deterioration or adverse events (0.4% in both groups). We assessed the overall body of evidence as very low quality (GRADE) due to serious imprecision and high ROB (Additional file 3).

Development & Validation studies of early warning systems and track and trigger systems (3)

A scoping review by Challen et al. [64] identified 119 tools related to outcome prediction in the ED; however, the majority were condition-specific tools (n = 94). They found the APACHE II score to have the highest reported AUROC (0.984), in patients with peritonitis.

Of the 36 primary development and/or validation studies, 13 were retrospective, 22 were prospective and one was a secondary analysis of a randomised controlled trial (RCT) [48]. Eight studies developed and validated (in the same sample) an early warning system, while 28 validated an existing system in a different sample. Three studies included a random sample [30, 39, 43]; in the remaining studies, participants were recruited consecutively or the sampling strategy was not clearly stated.

A total of 28 early warning systems were developed and/or validated. Churpek et al. [65] classified early warning systems into single-parameter systems, multiple-parameter systems and aggregate weighted scores. The early warning systems examined in the included studies were primarily aggregate weighted scores (Table 3).
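To make the Churpek et al. classification concrete, an aggregate weighted score sums a per-parameter weight derived from threshold bands and compares the total against a trigger threshold. The sketch below is illustrative only: the bands, parameters and trigger threshold are simplified examples, not the published values of the MEWS or any specific system reviewed here.

```python
def band_score(value, bands):
    """Return the weight of the first (low, high, weight) band containing value."""
    for low, high, weight in bands:
        if low <= value <= high:
            return weight
    raise ValueError(f"value {value} outside all bands")

# (low, high, weight) bands per physiological parameter.
# Simplified, hypothetical bands for illustration -- real systems differ.
BANDS = {
    "respiratory_rate": [(0, 8, 2), (9, 14, 0), (15, 20, 1),
                         (21, 29, 2), (30, 99, 3)],
    "heart_rate":       [(0, 40, 2), (41, 50, 1), (51, 100, 0),
                         (101, 110, 1), (111, 129, 2), (130, 300, 3)],
    "systolic_bp":      [(0, 70, 3), (71, 80, 2), (81, 100, 1),
                         (101, 199, 0), (200, 400, 2)],
}

def aggregate_score(observations):
    """Sum the per-parameter weights into a single aggregate score."""
    return sum(band_score(observations[p], BANDS[p]) for p in BANDS)

obs = {"respiratory_rate": 24, "heart_rate": 115, "systolic_bp": 95}
total = aggregate_score(obs)
# A hypothetical trigger threshold of 4 decides whether care is escalated.
print(total, "-> escalate" if total >= 4 else "-> routine monitoring")
```

A single-parameter system, by contrast, would trigger as soon as any one observation falls into an extreme band, without summing weights across parameters.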

Table 3 Types of scores developed and/or validated in the included studies

The most common outcomes examined were in-hospital mortality (n = 21), admission to ICU (n = 12), mortality (location or follow-up time frame not specified, possibly beyond hospital discharge) (n = 11), hospital admission (n = 7) and length of hospital stay (n = 5). Only one study measured the number of patients identified as critically ill as an outcome [50]. Overall, the APACHE II, PEDS, VIEWS-L and THERM scores appeared relatively better at predicting mortality and ICU admission. The MEWS was the most commonly assessed tool, and the cut-off value used was 4 or 5, with the exception of Dundar et al. [41], who found an optimal cut-off of 3 for predicting hospitalisation. To synthesise the findings, studies were categorised into three groups according to the degree of differentiation of the ED patient group: patients in a specific triage category(ies), patients with a certain (suspected) condition, or an undifferentiated patient group. Findings are presented in Tables 4, 5 and 6 and full details are provided in Additional file 4.

Table 4 Evidence table: Development and validation studies – Patient groups differentiated by triage category
Table 5 Evidence table: Development and validation studies – Patient groups differentiated by (suspected) condition
Table 6 Evidence table: Development and validation studies – Undifferentiated patient groups

Twelve of the 36 validation studies only included participants in specific triage categories (Table 4). Triage systems varied but included categories of patients who were critically ill (e.g. Manchester triage system I–III, Patient acuity category scale 1 or 2) or were admitted to the resuscitation room. For predicting mortality, the AUROC was 0.63–0.75 for the MEWS [36, 37, 44, 57], 0.70–0.77 for REMS [31, 37], 0.77–0.87 for NEWS [53], 0.90 for PEDS, 0.83 for APACHE II and 0.77 for RTS [31]. For predicting ICU admission, the AUROC was 0.54 [37] and 0.49 [44] for MEWS and 0.59 for REMS [37], while for predicting hospital admission the AUROC for NEWS was 0.66–0.70 [53]. Cattermole et al. [31] and Cattermole et al. [35] used a combined outcome of death and ICU admission and found AUROCs of 0.76 and 0.73 for MEWS, 0.90 and 0.75 for PEDS, 0.73 for APACHE II, 0.75 for RTS, 0.70 and 0.70 for REMS, 0.75 for MEES, 0.71 for NEWS, 0.70 for SCS and 0.84 for THERM. One study assessed the prediction of septic shock by NEWS (AUROC 0.89) [49].

Eleven other studies (12 records; Table 5) included a differentiated patient group with a specific (suspected) condition. Five studies only included patients with (suspected) sepsis [29, 32, 38, 40, 51, 59]. Other study populations were restricted to patients with trauma [46], suspected infection [45, 52], pneumonia [47] or signs of shock [48]. For predicting mortality, MEWS had an AUROC of 0.61 [38] and 0.72 [51], CCI of 0.65 [38], mREMS of 0.80 [45], NEWS of 0.70 [47], NEWS-L of 0.73 [47], VIEWS-L of 0.83 [46], SAPS II of 0.72 [48] and 0.90 [52], MPM0 II of 0.69 [48], LODS of 0.60 [48], PIRO of 0.71 [59], APACHE II of 0.71 [59] and 0.90 [52], and SOFA of 0.86 [52].

The remaining 13 studies assessed early warning systems in an undifferentiated ED population (Table 6). The AUROC to predict mortality was 0.71 [42], 0.73 [43], and 0.89 [41] for MEWS, 0.76 for MEWS plus [43], 0.91 [33] and 0.85 [34] for REMS, 0.87 [33] and 0.65 [34] for RAPS and 0.90 for APACHE II [33].

We did not identify studies that examined the cost effectiveness of early warning systems or TTS in EDs, nor did we find any studies evaluating related educational programmes (objectives (4) and (5)).

Discussion

Multiple early warning systems were identified, but the extent to which they are used in the ED appears to vary across the countries for which data were available in the nine included descriptive studies. Moreover, incorrect score calculation was common. Compliance with recording aggregate scores was relatively low, although the vital signs HR and BP were usually recorded. This finding emphasises the importance of effective implementation strategies. However, we did not identify any studies examining educational programmes for early warning systems. Existing guidelines regarding the use of early warning systems to monitor acute patients in hospital do include educational tools but are not specific to the ED [7, 8]. Using early warning systems in the ED would likely require contextual adaptation to the ED environment, for example broadening the ranges of physiological parameters to reflect acutely unwell patients’ physiology. In implementing an early warning system in the ED, staff training could consist of a joint core package applicable to any service, supplemented by an ED-specific component. The performance of early warning systems in the ED will also depend on the time patients spend in the ED, which varies substantially between countries.

Evidence from 36 development and validation studies demonstrated that early warning systems used in ED settings seem able to predict adverse outcomes, based on the AUROC, but there is variability between studies. All but two early warning systems were aggregate scores, which limited the ability to compare single-parameter, multiple-parameter and aggregate scores. The APACHE II, PEDS, VIEWS-L and THERM scores were relatively best at predicting mortality and ICU admission, providing excellent discrimination (AUROC >0.8) [66]. The MEWS was the most commonly assessed system, but findings suggest a relatively lower ability to predict mortality and ICU admission compared with the four scores mentioned above, with only some studies indicating acceptable discriminatory ability (AUROC >0.7) and others indicating a lack of discriminatory ability (AUROC <0.7) [66], especially for the outcome of ICU admission. The exception was one low-ROB study that found excellent discriminatory ability of the MEWS for in-hospital mortality (AUROC 0.89) [41]; this study examined the MEWS in an undifferentiated sample, which could contribute to the observed difference. However, the ability of early warning systems to predict adverse outcomes does not mean that they are effective at preventing adverse outcomes through early detection of deterioration. Only one study addressed this question, and it found that the introduction of an early warning system may make little or no difference to the detection of deterioration or adverse events; however, the evidence was of very low quality, making it impossible to draw strong conclusions. The effectiveness of early warning systems also depends highly on an appropriate response to their triggers.
If effective, the role of early warning systems in the ED could primarily be to assist with patient and resource management in the post-triage phase, when the time for patients to see a treating clinician is prolonged (crowding). They could also provide additional information to help determine whom to refer for critical care admission or to guide discharge from the ED, although this is currently not generally their purpose where they have been implemented in the ED. Recent studies also show that additional laboratory data (e.g. D-dimer, lactate) might enhance the performance of early warning systems in predicting adverse outcomes [67, 68].

The cost effectiveness of early warning systems remains unclear. Implementing them requires a healthcare resource investment, but the degree to which they may or may not generate cost savings is unknown, particularly since their effectiveness in the ED is uncertain. The limited evidence base suggests that early warning systems might be effective in, for example, identifying deteriorating patients. This could improve patient outcomes and, should these effects exist, the potential healthcare cost savings could offset, at least in part, the cost of implementation. While this theory is open to question, it highlights the need for primary research studies that directly evaluate cost effectiveness. Such studies should monitor resource use, costs and patient outcomes to determine whether early warning systems are likely to deliver good value for money.

Limitations

We did not translate reports, although only one non-English study was identified. We could not pool the findings of the validation studies due to clinical heterogeneity; however, AUROCs were provided to indicate the accuracy of the models. Strengths of the review lie in its thorough search strategy, its scope, its inclusion of different study designs to best address the objectives, and its rigorous methodology with dual independent screening and quality assessment.

Conclusions

There is a lack of high-quality RCTs examining the effects of using early warning systems in the ED on patient outcomes. The cost effectiveness of such interventions, compliance, the effectiveness of related educational programmes, and barriers and facilitators to implementation also need to be examined and reported, as there is currently a clear lack of such evidence.