Intensive Care Medicine, Volume 33, Issue 4, pp 667–679

Systematic review and evaluation of physiological track and trigger warning systems for identifying at-risk patients on the ward

  • Haiyan Gao
  • Ann McDonnell
  • David A. Harrison
  • Tracey Moore
  • Sheila Adam
  • Kathleen Daly
  • Lisa Esmonde
  • David R. Goldhill
  • Gareth J. Parry
  • Arash Rashidian
  • Christian P Subbe
  • Sheila Harvey
Systematic Review

DOI: 10.1007/s00134-007-0532-3

Cite this article as:
Gao, H., McDonnell, A., Harrison, D.A. et al. Intensive Care Med (2007) 33: 667. doi:10.1007/s00134-007-0532-3

Abstract

Objective

Physiological track and trigger warning systems (TTs) are used to identify patients outside critical care areas at risk of deterioration and to alert a senior clinician, Critical Care Outreach Service, or equivalent. The aims of this work were: to describe published TTs and the extent to which each has been developed according to established procedures; to review the published evidence and available data on the reliability, validity and utility of existing systems; and to identify the best TT for timely recognition of critically ill patients.

Design and setting

Systematic review of studies identified from electronic, citation and hand searching, and expert informants. Cohort study of data from 31 acute hospitals in England and Wales.

Measurements and results

Thirty-six papers were identified describing 25 distinct TTs. Thirty-one papers described the use of a TT, and five were studies examining the development or testing of TTs. None of the studies met all methodological quality standards. For the cohort study, outcome was measured by a composite of death, admission to critical care, ‘do not attempt resuscitation’ or cardiopulmonary resuscitation. Fifteen datasets met pre-defined quality criteria. Sensitivities and positive predictive values were low, with median (quartiles) values of 43.3% (25.4–69.2%) and 36.7% (29.3–43.8%), respectively.

Conclusion

A wide variety of TTs were in use, with little evidence of reliability, validity and utility. Sensitivity was poor, which might be due in part to the nature of the physiology monitored or to the choice of trigger threshold. Available data were insufficient to identify the best TT.

Keywords

Systematic review Critical care Critical illness Scoring systems 

Introduction

The use of physiological track and trigger warning systems (TTs) outside critical care areas seeks to ensure timely recognition of all patients with potential or established critical illness and to ensure timely attendance by appropriately skilled staff [1].

TTs use periodic observation of selected basic vital signs (the ‘tracking’) with predetermined criteria (the ‘trigger’) for requesting the attendance of more experienced staff. In most cases TTs draw on the routine observations of vital signs carried out by ward staff, so that a large number of patients can be monitored without major additional workload. A variety of physiological scoring systems exist to detect patients whose condition is deteriorating.

TTs have predominantly evolved as a means to alert the Critical Care Outreach Service (CCOS) in the UK or the Medical Emergency Team (MET) in Australia, but the concept is rapidly gaining momentum worldwide. In the USA, Rapid Response Teams are a key component of the Institute for Healthcare Improvement 100,000 Lives Campaign [2], and the International Partnership for Acute Care Safety (IPACS) initiative, endorsed by the World Health Organisation, is shortly to commence a global study to investigate antecedents to cardiac arrest, death and emergency intensive care admission. A wide variety of TTs are in use [1] but, as yet, there is no clear evidence to indicate which is best. Furthermore, the extent to which existing systems are reliable and valid tools for detecting patients with impending critical illness is not known.

We performed a systematic review of published papers and an analysis of available data from UK hospitals to identify and describe the range of published TTs, as used by a CCOS or equivalent, to explore the extent to which each system has been developed according to established procedures, to review all aspects of the reliability, validity and utility of existing systems, e.g. their sensitivity, specificity and predictive validity, and if possible to identify the best TT for timely recognition of potential or established critical illness.

Initial results from the analysis of available data were presented at the 26th International Symposium on Intensive Care and Emergency Medicine, Brussels, March 2006 [3].

Methods

Systematic review

Inclusion criteria

Papers were included if they were published in full and in English, and described the use of a TT or were concerned with the testing or development of TTs, based on a population of adult inpatients outside of critical care areas.

Search strategy and data sources

The following electronic databases were searched from 1990 to 2004: MEDLINE, MEDLINE in Progress, EMBASE, CINAHL, PsycInfo, Cochrane Library and Web of Science. A broad search plan was employed with free text searching using keywords in title, abstract or full text where available. Search terms were also included to describe the variety of forms of critical care outreach service. Full details of all database searches can be found in the electronic supplementary material (ESM). Citation searches were performed on Web of Science for two of the original key articles on MET [4] and early warning scores [5]. In addition, the following journals, known to the researchers to have previously published articles on TTs, were hand-searched from 1999 to 2004: Anaesthesia, British Journal of Anaesthesia, Critical Care Medicine, New England Journal of Medicine. Reference lists of key reports [1, 6, 7, 8, 9] were also reviewed, as were the reference lists of all review articles retrieved.

Abstracts of all papers identified through any of the search strategies described above were reviewed against the inclusion criteria, and a list of all the potentially relevant papers was sent to relevant professional bodies and known experts in intensive care (see ESM) with a request to review the list for completeness.

The full text of all papers on the final list was obtained and reviewed according to the inclusion criteria. All papers were reviewed independently by two members of the study team.

Data extraction

Two data extraction forms were developed, one for papers which described the use of a TT and the other for studies concerned with the testing or development of TTs. The design of these forms was informed by published methodological standards and checklists [10, 11]. All data extraction was checked by a second reviewer.

Data synthesis and quality assessment

A broad overview of both types of included papers was conducted. Key elements considered were hospital setting, characteristics of patients, type, purpose and origin of TT, physiological parameters included, scoring system/trigger thresholds, frequency of completion and nature of response. For the studies concerned with the development/testing of TT, additional elements were taken into account. These included the study design, methodological quality, number of patients, outcomes measured, completeness of follow-up and estimates of diagnostic accuracy.

TTs were classified as:
  • single-parameter systems: periodic observation of selected vital signs, which are compared with a simple set of criteria with predefined thresholds, a response algorithm being activated when any criterion is met;
  • multiple-parameter systems: the response algorithm requires more than one criterion to be met, or differs according to the number of criteria met;
  • aggregate weighted scoring systems: weighted scores are assigned to physiological values and summed, and the total is compared with predefined trigger thresholds;
  • combination systems: single- or multiple-parameter systems used in combination with aggregate weighted scoring systems.
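The four classes differ mainly in how the observations are combined before a response is activated. The Python sketch below contrasts them on the same set of observations. It is illustrative only: the parameter set, thresholds, weighting bands, trigger score and response labels are hypothetical assumptions and do not reproduce any published TT.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    heart_rate: float   # min-1
    resp_rate: float    # min-1
    systolic_bp: float  # mmHg
    avpu: str           # 'A', 'V', 'P' or 'U'


# Hypothetical calling criteria (single- and multiple-parameter sketches).
CRITERIA = {
    "heart_rate": lambda o: o.heart_rate < 40 or o.heart_rate > 130,
    "resp_rate": lambda o: o.resp_rate < 8 or o.resp_rate > 30,
    "systolic_bp": lambda o: o.systolic_bp < 90,
    "avpu": lambda o: o.avpu != "A",
}

# Hypothetical weighting bands: (lower inclusive, upper exclusive, points).
BANDS = {
    "heart_rate": [(0, 40, 2), (40, 51, 1), (51, 101, 0), (101, 111, 1), (111, 130, 2), (130, 1000, 3)],
    "resp_rate": [(0, 9, 2), (9, 15, 0), (15, 21, 1), (21, 30, 2), (30, 1000, 3)],
    "systolic_bp": [(0, 71, 3), (71, 81, 2), (81, 101, 1), (101, 200, 0), (200, 1000, 2)],
}
AVPU_POINTS = {"A": 0, "V": 1, "P": 2, "U": 3}
HIGHEST_POINTS = 3


def criteria_met(o: Observations) -> int:
    """Number of calling criteria breached."""
    return sum(rule(o) for rule in CRITERIA.values())


def single_parameter_trigger(o: Observations) -> bool:
    """Any one criterion met activates the response."""
    return criteria_met(o) >= 1


def multiple_parameter_response(o: Observations) -> str:
    """Graded response depending on how many criteria are met (hypothetical labels)."""
    n = criteria_met(o)
    if n >= 3:
        return "call senior clinician"
    if n == 2:
        return "inform doctor"
    if n == 1:
        return "increase observation frequency"
    return "routine observations"


def parameter_points(o: Observations) -> dict:
    """Weighted points per parameter; raises StopIteration if a value is outside all bands."""
    points = {"avpu": AVPU_POINTS[o.avpu]}
    for name in BANDS:
        value = getattr(o, name)
        points[name] = next(p for lo, hi, p in BANDS[name] if lo <= value < hi)
    return points


def aggregate_trigger(o: Observations, trigger_score: int = 4) -> bool:
    """Total weighted score compared with a predefined trigger threshold."""
    return sum(parameter_points(o).values()) >= trigger_score


def combination_trigger(o: Observations, trigger_score: int = 4) -> bool:
    """Aggregate score, or any single parameter scored at the highest level."""
    points = parameter_points(o)
    return sum(points.values()) >= trigger_score or max(points.values()) == HIGHEST_POINTS
```

With these made-up values, an isolated heart rate of 135 min−1 (`Observations(135, 12, 120, "A")`) triggers the single-parameter and combination rules but scores only 3 points and so does not reach the aggregate trigger, whereas moderate derangement of several parameters (`Observations(105, 22, 95, "A")`, 4 points) triggers the aggregate score without breaching any single criterion.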

Research studies were assessed against the methodological quality standards described by Laupacis et al. [10] and validity criteria for clinical decision rules defined by McGinn et al. [12].

Evaluation of available data

Data sources and quality

Primary collection of TT data did not fall within the scope and resources available to this study. We therefore sought to take advantage of data available to us from existing sources. All acute National Health Service (NHS) hospitals in England with critical care facilities were contacted by post or e-mail, and a follow-up letter was sent to non-responders. Data were also sought through study members, their contacts, and authors of published studies.

Criteria for assessing the coverage and accuracy of the TT datasets were developed based on those used by the Directory of Clinical Databases (DoCDat) [13, 14] and the QUADAS tool for evaluation of studies of diagnostic accuracy [15] (see ESM). All the datasets received were assessed according to the criteria.

Inclusion criteria

Datasets were excluded from the analysis if: there was no clear definition of the criteria used for inclusion and exclusion; the dataset did not include the minimum outcomes of admission to critical care or death; or fewer than half of the variables were at least 95% complete.

Logic, range and consistency checks were applied to each variable used in the analysis. Illogical values, values outside the maximum possible range and inconsistent data were removed. Patients with missing summary scores were treated as not triggered if there were no raw physiological values or scores recorded in the dataset.

Individual patients were excluded from the analyses if: they were aged under 12 years; both the anonymous unique patient identifier and the date of admission to hospital were missing; or the composite outcome could not be identified.
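The dataset- and patient-level rules above can be sketched in Python as follows. This is an illustration under stated assumptions: records are dicts, the field names (`age`, `patient_id`, `admission_date`, `composite_outcome`, `summary_score`, `parameter_scores` and the physiological fields) and the plausible ranges are hypothetical, and only the range check is shown from the logic, range and consistency checks. The text does not say how a missing summary score was handled when raw values were present, so that case is returned as undetermined rather than guessed.

```python
from typing import Optional

# Hypothetical plausible ranges used for the range check.
PLAUSIBLE_RANGES = {"heart_rate": (20, 300), "resp_rate": (2, 80), "systolic_bp": (40, 300)}


def dataset_included(records: list, variables: list,
                     has_clear_criteria: bool, has_minimum_outcomes: bool) -> bool:
    """Dataset-level exclusion rules; completeness is computed from the records."""
    if not (has_clear_criteria and has_minimum_outcomes) or not records:
        return False
    well_completed = sum(
        1 for v in variables
        if sum(r.get(v) is not None for r in records) / len(records) >= 0.95
    )
    # Excluded if fewer than half of the variables are at least 95% complete.
    return well_completed >= len(variables) / 2


def range_check(record: dict) -> dict:
    """Remove values outside the (hypothetical) possible range."""
    cleaned = dict(record)
    for name, (lo, hi) in PLAUSIBLE_RANGES.items():
        value = cleaned.get(name)
        if value is not None and not lo <= value <= hi:
            cleaned[name] = None
    return cleaned


def triggered_status(record: dict, trigger_score: int) -> Optional[bool]:
    """Trigger status; a missing score with no raw values or scores counts as not triggered."""
    score = record.get("summary_score")
    if score is not None:
        return score >= trigger_score
    raw_fields = list(PLAUSIBLE_RANGES) + ["parameter_scores"]
    if all(record.get(f) is None for f in raw_fields):
        return False
    return None  # handling of this case is not specified in the text


def patient_excluded(record: dict) -> bool:
    """Patient-level exclusion criteria."""
    age = record.get("age")
    if age is not None and age < 12:
        return True
    if record.get("patient_id") is None and record.get("admission_date") is None:
        return True
    return record.get("composite_outcome") is None
```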

Methodology

The main outcome of the cohort study was the presence of established critical illness, defined as the composite of death, admission to critical care, ‘do not attempt resuscitation’ (DNAR) or cardiopulmonary resuscitation (CPR). For TTs with graded responses, a trigger event was defined as any response involving informing a more experienced member of staff. Responses resulting in, for example, increasing the frequency of observations were not included as trigger events.
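The outcome and trigger definitions can be expressed as two small predicates. In this Python sketch the outcome fields follow the four components named in the text, while the response labels are hypothetical: which responses count as "informing a more experienced member of staff" would depend on each hospital's own algorithm.

```python
from dataclasses import dataclass


@dataclass
class PatientOutcome:
    died: bool = False
    critical_care_admission: bool = False
    dnar: bool = False
    cpr: bool = False


def established_critical_illness(o: PatientOutcome) -> bool:
    """Composite outcome: any one of the four components."""
    return o.died or o.critical_care_admission or o.dnar or o.cpr


# Hypothetical labels for responses that involve informing more experienced staff.
ESCALATING_RESPONSES = {"inform_doctor", "call_outreach", "call_senior_doctor"}


def trigger_event(responses: list) -> bool:
    """True if any graded response escalated to more experienced staff.

    Responses such as 'increase_observations' do not count as trigger events.
    """
    return any(r in ESCALATING_RESPONSES for r in responses)
```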

For datasets of patients seen by CCOS, two groups of patients – critical care follow-up patients and referrals from the ward (any patients causing concern or who triggered) – were analysed separately if they could be identified from among all patients seen by the CCOS, as sensitivities, specificities and negative predictive values can be calculated for follow-up patients whereas only positive predictive values can be calculated for referrals. If patients had more than one outreach episode during their hospital stay, only the data from the first episode were analysed.
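A minimal Python sketch of the two data-handling steps in this paragraph, assuming each outreach episode is a dict with hypothetical keys `patient_id` and `start` (a sortable date, such as an ISO string). The comments give the presumed reason why referral datasets support only the positive predictive value: patients who were never referred are absent from them.

```python
def first_episodes(episodes: list) -> list:
    """Keep each patient's earliest outreach episode."""
    first = {}
    for ep in sorted(episodes, key=lambda e: e["start"]):
        first.setdefault(ep["patient_id"], ep)
    return list(first.values())


def computable_measures(group: str) -> set:
    """Accuracy measures that can be estimated for each patient group."""
    if group == "follow_up":
        # Every followed-up patient is observed, whether or not they triggered.
        return {"sensitivity", "specificity", "ppv", "npv"}
    if group == "referral":
        # Only referred patients appear, so false and true negatives cannot be counted.
        return {"ppv"}
    raise ValueError(f"unknown group: {group!r}")
```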

Statistical analysis

For each TT, primary assessment was by sensitivity (proportion of patients with established critical illness who triggered) and positive predictive value (proportion of triggered patients with established critical illness), secondary assessment by specificity (proportion of patients without established critical illness who did not trigger) and negative predictive value (proportion of not-triggered patients without established critical illness). Where possible, receiver operating characteristic (ROC) curves were plotted.
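The four measures all come from the same 2 × 2 cross-tabulation of trigger status against the composite outcome. A short Python sketch (the counts in the usage note are made up for illustration):

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # triggered, with established critical illness
    fp: int  # triggered, without established critical illness
    fn: int  # not triggered, with established critical illness
    tn: int  # not triggered, without established critical illness

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def tabulate(triggered: list, outcome: list) -> TwoByTwo:
    """Cross-tabulate per-patient trigger status against the composite outcome."""
    pairs = list(zip(triggered, outcome))
    return TwoByTwo(
        tp=sum(bool(t and y) for t, y in pairs),
        fp=sum(bool(t and not y) for t, y in pairs),
        fn=sum(bool(not t and y) for t, y in pairs),
        tn=sum(bool(not t and not y) for t, y in pairs),
    )
```

For example, `TwoByTwo(tp=40, fp=60, fn=50, tn=850)` gives a sensitivity of about 0.44 and a PPV of 0.40, yet a specificity of about 0.93 and an NPV of about 0.94: when most patients neither trigger nor have the outcome, high specificity and NPV can coexist with low sensitivity and PPV.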

Important confounding variables were taken into account by repeating the analyses in subgroups defined by age (12–17, 18–49, 50–69, 70–79, 80+ years), ward (surgical and medical) and specialty (trauma and orthopaedics, vascular surgery, general surgery, medicine, obstetrics and gynaecology, and neurosurgery).

Heterogeneity among the datasets was evaluated with the Q-statistic for the log diagnostic odds ratio [16] and quantified with the H-statistic [17]. A random-effects meta-regression was used to explore the degree to which the heterogeneity could be explained by the physiological parameters included in each TT, the outcome variables recorded in each dataset, and the inclusion of critical care follow-up versus all ward or medical admissions unit (MAU) patients.
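A dependency-free Python sketch of the heterogeneity statistics, assuming the usual inverse-variance (fixed-effect) weights for Cochran's Q, the Woolf variance of the log diagnostic odds ratio, a 0.5 continuity correction when any cell is zero, and H defined as the square root of Q over its degrees of freedom (Higgins and Thompson). These choices are assumptions; the text does not specify them, and the random-effects meta-regression is not reproduced here.

```python
import math


def log_dor(tp: int, fp: int, fn: int, tn: int, correction: float = 0.5):
    """Log diagnostic odds ratio and its (Woolf) variance for one dataset."""
    if 0 in (tp, fp, fn, tn):
        # Assumed continuity correction for empty cells.
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    y = math.log((tp * tn) / (fp * fn))
    var = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return y, var


def cochran_q(effects: list, variances: list) -> float:
    """Cochran's Q using inverse-variance (fixed-effect) weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def h_statistic(q: float, k: int) -> float:
    """H = sqrt(Q / (k - 1)) for k datasets."""
    return math.sqrt(q / (k - 1))


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form is valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive, even df")
    half, term, total = x / 2, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

For k datasets, Q is referred to a chi-square distribution on k − 1 degrees of freedom, and H near 1 indicates little heterogeneity beyond chance. The closed-form tail probability only keeps the sketch free of dependencies; a general implementation would use a library distribution such as `scipy.stats.chi2.sf`.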

The datasets were randomly assigned letters of the alphabet (hospital A, hospital B, etc.) for anonymous presentation of the results. Statistical analyses were performed using Stata 8.2 (StataCorp LP, College Station, TX, USA).

Results

Systematic review

The literature searches identified 36 papers (Fig. 1), of which five were research studies concerning the development or testing of TTs [18, 19, 20, 21, 22]. In four of these studies, a description of how the TT was used was also provided [18, 19, 20, 21]. Therefore, in total, detailed descriptions of the use of a TT were available for 35 of the 36 papers, providing details of 25 distinct TTs (Table 1).
Fig. 1

Summary of systematic review profile

Table 1

Overview of papers describing physiological track and trigger early warning systems

System | Papers | Country | Setting | Number of parameters | Other parameters
Single-parameter systems
MET calling criteria (1) | Bellomo et al. 2003 [23], 2004 [24] | Australia | All wards/surgical wards | 7 | –
MET calling criteria (2) | Crispin & Daffurn [25], Hillman et al. 1996 [26], 2001 [27], 2003 [28], Hourihan et al. [29], Lee et al. 1998 [30], Parr et al. [31], Bristow et al. [32] | Australia | All wards/critical care areas and recovery/entire hospital | 9 | Airway threatened; cardiac arrest; pulmonary arrest; repeated/prolonged seizures
MET calling criteria (3) | Lee et al. 1995 [4] | Australia | All wards, critical care areas and ED | 32 | Base excess; blood sugar; pH; potassium; sodium; plus 21 specific events
MET calling criteria (4) | Buist et al. 2004 [18] | Australia | Selected general wards | 6 | Seizures
MET calling criteria (5) | Buist et al. 2002 [33] | Australia | Entire hospital | 14 | Agitation/delirium; airway threatened; difficulty speaking; failure to respond to treatment; repeated/prolonged seizures; respiratory distress; unable to get prompt assistance; uncontrolled pain
MET calling criteria (6) | Cioffi et al. [34] | Australia | Not reported | 5 | –
MET calling criteria (7) | Daly et al. [35] | Australia | Entire hospital (except theatre, recovery and ED) | 6 | Active seizures; cardiac chest pain; cardiopulmonary arrest; severe respiratory distress
MET calling criteria (8) | DeVita et al. [36] | US | Not reported | 12 | Colour change; pain; respiratory difficulty; suicide attempt; uncontrolled bleeding; unexplained agitation
MET calling criteria (9) | Salamonson et al. [37] | Australia | All wards, critical care areas, ED and theatres | 9 | Airway threatened; repeated/prolonged seizures; respiratory arrest
Medical crisis response team Condition C calling criteria | Foraida et al. [38] | US | Entire hospital | 19 | Bleeding into airway; breathing difficulty; colour change; lethargy/difficulty walking; naloxone use without response; pain; seizure; sudden collapse; sudden loss of movement; suicide attempt; trauma/chest pain/stroke; uncontrolled bleeding; unexplained agitation
PERT calling criteria | Hartin et al. [39] | England | Not reported | 8 | Repeated hypoglycaemia
Trauma team calling criteria (1) | Sugrue et al. [40] | Australia | ED | 20 | Seventeen trauma-specific criteria
Trauma team calling criteria (2) | Dodek et al. [41] | Canada | ED | 15 | Eleven trauma-specific criteria
Multiple-parameter systems
PART calling criteria | Goldhill et al. [19], Goldhill [42] | England | All wards | 7 | Not fully alert and oriented
Aggregate scoring systems
MEWS (1) | Subbe et al. 2001 [21], 2003 [44] | Wales | Medical admissions unit | 5 | –
MEWS (2) | Odell et al. [45] | England | Surgical wards | 5 | –
MEWS (3) | Carberry [46] | Scotland | Selected surgical wards | 6 | –
Derby MEWS | Day [47] | England | Selected surgical wards and surgical day unit | 6 | –
Modified MEWS | Pittard [48] | England | Selected surgical wards and surgical HDU | 7 | Respiratory support/oxygen therapy
PARS (1) | Fox & Rivers [49] | England | Surgical and orthopaedic wards | 6 | –
PARS (2) | Priestley et al. [50] | England | Selected wards | 5 | –
Lewisham PAR-T | Sterling & Barrera Groba [51] | England | Selected wards | 8 | Pain
Lewisham EWS | Welch [52] | England | Not reported | 8 | Pain
MET activation criteria | Hodgetts et al. [20] | England | Not reported | 21 | Base excess; creatinine; Hb; PaCO2; PaO2; pH; potassium; sodium; urea; AAA pain; chest pain; shortness of breath
Combination systems
EWSS | Sharpley [54] | England | Selected wards | 6 | –

MET, Medical Emergency Team; PERT, Patient Emergency Response Team; PARS, Patient At Risk Score; PART, Patient At Risk Team; MEWS, Modified Early Warning Score; PAR-T, Patient At Risk Trigger; EWS, Early Warning Score; EWSS, Early Warning Scoring System; ED, emergency department; HDU, high dependency unit; Hb, haemoglobin; AAA, abdominal aortic aneurysm; –, none

Twenty-one papers described 13 single-parameter systems [4, 18, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41]. Nine of these were variations of the MET calling criteria, all in Australian settings except one (USA). The designated response to a trigger was to call the MET, which in many Australian hospitals had replaced the cardiac arrest/resuscitation team. The other TTs were described as medical crisis response team ‘Condition C’ calling criteria (US), Patient Emergency Response Team (PERT) calling criteria (UK), and two versions of trauma team calling criteria (Australia/Canada). Most of these calling criteria were in use in a wide range of clinical areas. The single-parameter systems incorporated between 4 and 11 physiological parameters and often included additional calling criteria relating to specific events, e.g. cardiopulmonary arrest and seizures. A number of the systems also included explicit instructions to put out a call for any patients ‘causing concern’. All systems included some measure of blood pressure and of consciousness, and most included heart rate and respiratory rate. Most physiological parameters were triggered at specific thresholds, which varied considerably between systems, e.g. from 110 min−1 to 160 min−1 for tachycardia. However, one system used exclusively subjective triggers, e.g. ‘rapidly deteriorating blood pressure’ [35]. Limited information was given on the origins of the single-parameter systems.

Two papers described one multiple-parameter system, the Patient At Risk Team (PART) calling criteria, developed and used in a UK hospital setting [19, 42]. This included a graded response depending on the number of criteria triggered. Thresholds for triggering a response varied accordingly. The criteria in the multiple-parameter system were based on values from a research study [43].

Eleven papers described 10 aggregate scoring systems [20, 21, 44, 45, 46, 47, 48, 49, 50, 51, 52]. All of the aggregate scoring systems were used in UK hospital settings. All systems included heart rate, respiratory rate, systolic blood pressure and a measure of consciousness, usually AVPU (alert/voice/pain/unresponsive). All but one included urine output, and all but three included temperature. Four systems included oxygen saturation and one of these additionally allocated points for respiratory support and oxygen therapy. Three systems allocated points for pain. One of the systems was extremely complex, incorporating 17 physiological parameters, many of which are not routinely recorded on the ward [20]. This system was also the only aggregate scoring system to explicitly allocate points to patients ‘causing concern’. The nature of the response triggered when the score passed a predefined threshold varied and did not necessarily result in a call to the CCOS. This variation was due in part to the available resources within individual hospitals. Few of these TTs were in widespread use across all hospital areas. Most aggregate scoring systems appear to be based on local modifications of either the original Early Warning Score (EWS), developed by Morgan et al. [5], or a later modification of this by Stenhouse et al. [53].

One paper described a combination system, the Early Warning Scoring System (EWSS), used in a UK hospital setting [54]. This system includes an aggregate score, but also triggers a response if any individual parameter is scored at the highest level.

Of the five papers concerning the development or testing of TTs, one derived and validated a scoring system by a stratified case–cohort design [20], and four tested or validated previously derived TTs by cohort study designs [18, 21, 22] or by a case–control study [19]. None of the studies met all methodological quality standards [10], although the outcomes in all five studies were clearly defined. The reporting of diagnostic accuracy was variable (Table 2), and no studies provided a measure of variability around estimates of diagnostic accuracy or described the reproducibility of the individual predictor variables or of the TTs themselves.
Table 2

Overview of papers reporting diagnostic accuracy of physiological track and trigger early warning systems

System | Papers | Patients | Outcomes measured | Diagnostic accuracy
Single-parameter systems
MET calling criteria (4) | Buist et al. 2004 [18] | 6303 | Hospital mortality | PPV was 16.2%, but 88.2% with four or more abnormal observations. Sens/Spec, NPV and ROC curve not reported.
Multiple-parameter systems
PART calling criteria | Goldhill et al. [19] | 63 | ICU admission | Sens/Spec were 97%/18%, 80%/41% and 27%/67% for patients with at least one, two and three abnormal observations, respectively. PPV, NPV and ROC curve not reported.
PART calling criteria | Goldhill & McNarry [22] | 548 | Thirty-day mortality | Sens/Spec were 7.7%/99.8%; PPV 66.7%. NPV and ROC curve not reported.
Aggregate scoring systems
MEWS (1) | Subbe et al. 2001 [21] | 709 | ICU and HDU admission; CPR; 60-day mortality; composite of above | ROC curve on composite endpoint only.
MET activation criteria | Hodgetts et al. [20] | 250 | CPR | Sens/Spec were 100%/17%, 98%/36%, 94%/61%, 89%/77%, 86%/89%, 84%/96% and 52%/99% for scores of 1, 2, 3, 4, 5, 7 and 8, respectively. ROC curve reported. PPV and NPV not reported.

MET, Medical Emergency Team; PART, Patient At Risk Team; MEWS, Modified Early Warning Score; Sens/Spec, sensitivity/specificity; PPV, positive predictive value; NPV, negative predictive value; ROC, receiver operating characteristic; ICU, intensive care unit; HDU, high dependency unit; CPR, cardiopulmonary resuscitation

None of the TTs achieved the requirements of a level 1 clinical decision rule – a rule that has been validated for use in a wide variety of settings with confidence that it can change clinical behaviour and improve patient outcomes [12]. In particular, the PART calling criteria [19, 22] were found to be poor predictors of mortality or admission to critical care and are likely to result in inappropriate activation of the CCOS.

Evaluation of available data

Twenty-seven datasets were received, representing 30 hospitals in England and one in Wales (Fig. 2). Of these, 12 did not meet the quality criteria and were excluded from the study.
Fig. 2

Flow chart of data received for the evaluation of available data

All TTs in the 15 datasets included in the study were different, having been modified according to local needs (Table 3). The TTs were broadly similar to those identified from UK centres in the systematic review, but only one of them corresponded to a TT identified by the systematic review. There were ten aggregate scoring systems, one single-parameter system and four combination systems. All TTs included heart rate, respiratory rate, systolic blood pressure and level of consciousness, but they varied in the choice of other physiological parameters, the assignment of scores to physiological values, and trigger thresholds. Response algorithms also varied considerably. Many of the systems used a graded response, with different responses at different thresholds: typically increasing the frequency of observations at a relatively low threshold, informing the nurse in charge or junior doctor at an intermediate threshold, and informing the CCOS or a senior doctor at a higher threshold. Only five of the response algorithms, as reported, explicitly stated that further help should be sought for any patient causing concern. The datasets varied widely in sample size, period of data collection, patient characteristics and variables recorded, and also in the physiological measurements and outcomes recorded (Table 4). As would be expected, datasets including all patients on a ward or MAU showed less extreme physiology and considerably lower rates of unfavourable outcomes than those consisting of patients attended by a CCOS.
Table 3

Summary of available datasets and physiological track and trigger systems

Hospital | System | Data collection period | Setting | Patients | Number of parameters | Other parameters
A | Combination system | 2001–2002 | CCOS referrals and follow-up | 946 | 7 | –
B | Aggregate scoring system | Jan–Aug 2004 | CCOS referrals and follow-up | 471 | 5 | –
C | Aggregate scoring system | Apr–Sep 2004 | CCOS referrals | 405 | 6 | –
D | Combination system | 2002–2004 | CCOS referrals and follow-up | 2371 | 6 | –
E | Aggregate scoring system | 2003–2004 | CCOS referrals and follow-up | 3266 | 6 | –
F | Aggregate scoring system | Jan–Nov 2004 | CCOS referrals and follow-up | 330 | 6 | –
G | Aggregate scoring system | Aug–Oct 2003 | Medical admissions unit | 750 | 5 | –
H | Aggregate scoring system | 2002–2003 | CCOS referrals | 1051 | 8 | Respiratory support
I | Aggregate scoring system | 2001–2004 | CCOS referrals | 2463 | 8 | Level of oxygen
J | Combination system | 2003–2004 | CCOS referrals and follow-up | 1964 | 6 | –
K | Aggregate scoring system | Jan–Nov 2004 | CCOS referrals and follow-up | 380 | 7 | Respiratory support
L | Aggregate scoring system | 2002–2004 | CCOS referrals and follow-up | 339 | 6 | –
M | Aggregate scoring system | Mar 2000, Feb–Mar 2001 | All ward patients | 2321 | 5 | –
N | Combination system | 2001–2004 | CCOS referrals and follow-up | 2548 | 6 | –
O | Single-parameter system | 2002–2004 | CCOS referrals and follow-up | 592 | 8 | Bicarbonate; level of oxygen; PaO2; pH

CCOS, Critical Care Outreach Service; –, none

Table 4

Summary of core physiological parameters and outcomes in the available data

Physiological measurements are mean (SD); outcomes are n (%).

Hospital | Heart rate (min−1) | Resp. rate (min−1) | Systolic BP (mmHg) | Temperature (°C) | Outcome complete | CPR | DNAR | Critical care | Death | Composite
A | 90.1 (20.0) | 21.9 (7.4) | 130.8 (24.4) | 36.9 (0.8) | 946 (100) | – | – | 118 (12.5) | 61 (6.5)* | 179 (18.9)
B | – | – | – | – | 471 (100) | – | 45 (9.6) | 45 (9.6) | 14 (3.0) | 104 (22.1)
C | – | – | – | – | 405 (100) | 48 (11.9) | 23 (5.7) | 36 (8.9) | – | 107 (26.4)
D | – | – | – | – | 2371 (100) | 187 (7.9) | – | 218 (9.2) | 73 (3.1) | 478 (20.2)
E | 88.2 (19.4) | 20.7 (6.5) | 129.0 (25.2) | 36.7 (0.8) | 3000 (91.9) | – | 229 (7.6) | 235 (7.8) | 52 (1.7) | 516 (17.2)
F | – | – | – | – | 328 (94.0) | – | 17 (5.2) | 55 (16.8) | 9 (2.7) | 81 (24.7)
G | 85.2 (19.5) | 19.3 (4.9) | 141.4 (32.2) | 36.7 (0.8) | 750 (100) | 4 (0.5) | – | 4 (0.5) | 35 (4.7) | 43 (5.7)
H | – | – | – | – | 960 (95.0) | – | 72 (7.5) | 230 (24.0) | 50 (5.2) | 352 (36.7)
I | 99.7 (25.9) | 26.2 (8.4) | 119.9 (31.0) | 37.0 (1.0) | 2460 (99.9) | 145 (5.9) | 57 (2.3) | 1385 (56.57) | 1 (0.04) | 1592 (64.7)
J | 89.5 (19.8) | 19.5 (8.0) | 121.4 (39.6) | 36.8 (0.8) | 1929 (98.2) | 26 (1.4) | 128 (6.6) | 147 (7.6) | 47 (2.4) | 348 (18.0)
K | – | – | – | – | 377 (99.2) | – | 10 (2.7) | 29 (7.7) | 10 (2.7) | 49 (13.0)
L | 104.5 (21.9) | 24.7 (7.3) | 116.4 (29.9) | 36.8 (0.9) | 333 (98.2) | – | 44 (13.2) | 106 (31.8) | 8 (2.4) | 158 (47.5)
M | 86.1 (20.5) | 20.1 (5.5) | 139.1 (27.0) | 36.6 (0.8) | 2321 (100) | 42 (1.8) | 37 (1.6) | 87 (3.8) | 120 (5.2) | 286 (12.3)
N | – | – | – | – | 2515 (98.8) | – | 472 (18.8) | 241 (9.6) | 47 (1.9) | 761 (30.2)
O | 103.3 (25.3) | 25.1 (7.9) | 115.7 (31.7) | 36.9 (1.1) | 582 (98.3) | – | 108 (18.6) | 189 (32.5) | 9 (1.6) | 306 (52.6)

*“Treatment limit” including decision that critical care was not appropriate and death while under review by outreach; SD, standard deviation; BP, blood pressure; CPR, cardiopulmonary resuscitation; DNAR, do not attempt resuscitation; –, not available

The diagnostic accuracy of the TTs varied widely (see ESM). Sensitivities and positive predictive values were low with median (quartiles) values of 43.3 (25.4–69.2) and 36.7 (29.3–43.8), respectively. Specificities and negative predictive values were generally acceptable, with median (quartiles) values of 89.5 (64.2–95.7) and 94.3 (89.5–97.0), respectively, although these were considered to be of secondary importance. The area under the ROC curve varied from 0.61 to 0.84 (see ESM for individual ROC plots). Within hospitals, there were some differences in the discrimination of TTs in different age groups, wards and specialties, but these were not consistent across hospitals.
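For a TT that produces a score, a ROC curve is obtained by sweeping the trigger threshold and recording sensitivity against 1 − specificity at each value. The Python sketch below does this and computes the area with the trapezoidal rule. It assumes the rule "trigger when score ≥ threshold", uses made-up data, and also reports the proportion of patients triggered as a crude proxy for workload.

```python
def roc_points(scores: list, outcomes: list) -> list:
    """(1 - specificity, sensitivity, proportion triggered) for each threshold.

    A patient triggers when score >= threshold; thresholds run from the
    highest observed score down to the lowest.
    """
    n_pos = sum(1 for y in outcomes if y)
    n_neg = len(outcomes) - n_pos
    points = [(0.0, 0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, outcomes) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, outcomes) if s >= t and not y)
        points.append((fp / n_neg, tp / n_pos, (tp + fp) / len(scores)))
    return points


def auc_trapezoid(points: list) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1, _), (x2, y2, _) in zip(points, points[1:])
    )
```

Each step down in threshold moves the operating point up and to the right: sensitivity rises, specificity falls, and a larger proportion of patients triggers. With the made-up scores `[0, 1, 2, 3, 4, 5]` and outcomes `[0, 0, 1, 0, 1, 1]`, the area is 8/9.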

Twelve datasets including critical care follow-up or all ward/MAU patients were identified, of which 11 datasets were included in the meta-regression and one dataset was dropped since all patients experiencing the composite outcome triggered. There was strong evidence of heterogeneity across datasets in the diagnostic accuracy (Q = 38.3 on 10 degrees of freedom, p < 0.001; H = 2.0, 95% confidence interval 1.5–2.6). Differences in diagnostic accuracy among the datasets were not explained by the physiological parameters included in the TT, the outcome variables recorded in the dataset, or the inclusion of critical care follow-up versus all ward/MAU patients. Fig. 3 shows the summary ROC curve. See the ESM for forest plot and full results of the meta-regression.
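As a quick arithmetic check, assuming the usual (Higgins–Thompson) definition of H as the square root of Q divided by its degrees of freedom, with k = 11 datasets, the reported Q and H are mutually consistent:

```latex
H = \sqrt{\frac{Q}{k-1}} = \sqrt{\frac{38.3}{10}} = \sqrt{3.83} \approx 1.96 \approx 2.0
```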
Fig. 3

Summary receiver operating characteristic (ROC) curve for the composite outcome in critical care follow-up and Medical Admissions Unit patients. Each circle represents one dataset; the area of each circle is inversely proportional to the variance of the log diagnostic odds ratio. The fitted line shows the summary ROC curve. The area under the summary ROC curve is 0.73, representing acceptable discrimination; however, most datasets lie towards the low end of the curve, indicating unacceptably low sensitivity and suggesting that trigger thresholds are too high.
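The text does not state which summary ROC model was fitted. As one commonly used simplification, assumed here, a symmetric summary ROC curve with a constant diagnostic odds ratio (DOR) satisfies logit(sensitivity) = ln DOR + logit(1 − specificity). Its area then has a closed form, which the Python sketch below checks against numerical integration; the DOR values in the test are arbitrary.

```python
import math


def sroc_sensitivity(fpr: float, dor: float) -> float:
    """Symmetric SROC: logit(TPR) = ln(DOR) + logit(FPR)."""
    return dor * fpr / (1 - fpr + dor * fpr)


def sroc_auc(dor: float) -> float:
    """Closed-form area under the symmetric SROC curve."""
    if dor == 1:
        return 0.5  # no discrimination
    a = dor - 1
    return dor * (a - math.log(dor)) / a ** 2


def sroc_auc_numeric(dor: float, n: int = 100_000) -> float:
    """Midpoint-rule integral of the SROC curve over FPR in [0, 1]."""
    h = 1 / n
    return sum(sroc_sensitivity((i + 0.5) * h, dor) for i in range(n)) * h
```

Under this model the area depends only on the DOR, so datasets lying on the same curve share the same discrimination and differ only in where their trigger threshold places them along it.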

Discussion

The review has shown that a variety of published TTs are in use, with little rigorous evidence of their validity. Although this review considered only published systems, many more TTs are in use in a variety of hospital settings without having been published, as the evaluation of available data demonstrated. The evaluation found that sensitivities and positive predictive values were unacceptably low, although specificities and negative predictive values were generally acceptable. The low sensitivity may be due in part to patients who deteriorate rapidly, especially with acute myocardial events, and to infrequent and non-standardised measurement of physiology. The summary ROC curve from the evaluation of available data indicates that the differences between TTs may largely reflect differing trigger thresholds. Lowering the trigger threshold could potentially improve sensitivity, at the cost of decreased specificity and consequently increased workload.

At present, no TTs meet the requirements for a level 1 clinical decision rule. To meet these requirements, existing TTs would require further validation in different populations and settings. Alternatively, it may be that the current situation of multiple systems developed to meet local needs is the ideal; a level 1 clinical decision rule may not exist.

The potential benefits of using any TT can only be realised if physiological parameters are accurately measured and recorded. No assumptions about the quality of routine observations or chart design, or the effect of introducing a TT on these, should be made on the basis of this study. In addition, this study was not designed to directly assess the impact of introducing a TT on patient outcomes. TTs are usually introduced in combination with a CCOS, MET or similar.

Systematic review

This is the first systematic review of the literature on TTs used by CCOS or equivalent. It confirms that most published work on TTs has been associated with either MET in Australia or CCOS in the UK, with a small body of work from North America. However, similar teams are now emerging in a number of European countries, including Sweden [55, 56], the Netherlands [57], Portugal [58] and Italy [59]. As these teams become more widespread around the world, it is essential to consider the best evidence on systems to identify at-risk patients on the ward.

The best methods to develop TTs should combine clinical judgement and statistical analysis [12]. Only one published study derived a TT using recognised statistical techniques to select the most powerful predictors of outcome, followed by further analysis to determine which predictors could be omitted from the TT without loss of predictive power [20]. Other TTs appear to have been derived by less rigorous methods, without the application of these statistical techniques: from clinical judgement alone, from single-centre audits of antecedents to intensive care admission, or from observational studies of the antecedents of cardiac arrest.
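As a much-simplified illustration of the second step, checking whether a predictor can be omitted without loss of predictive power, the sketch below compares the c-statistic (area under the ROC curve, computed as the Mann–Whitney probability) of an additive score with and without each parameter. It is not the method of reference [20], which the text does not detail; the parameter names, sub-scores and outcomes are hypothetical.

```python
PARAMS = ("resp_rate", "heart_rate", "systolic_bp")

# Hypothetical per-parameter sub-scores and outcomes, for illustration only.
DATA = [
    ((2, 0, 1), True),
    ((1, 1, 0), True),
    ((2, 1, 0), True),
    ((0, 1, 0), False),
    ((0, 0, 1), False),
    ((1, 0, 0), False),
]


def c_statistic(pos_scores, neg_scores):
    """P(score of a patient with the outcome > score of one without), ties counted as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def auc_of_score(data, keep):
    """c-statistic of the additive score built from the parameter indices in `keep`."""
    totals = [(sum(sub[i] for i in keep), outcome) for sub, outcome in data]
    pos = [t for t, o in totals if o]
    neg = [t for t, o in totals if not o]
    return c_statistic(pos, neg)


full = range(len(PARAMS))
print("all parameters:", auc_of_score(DATA, full))
for dropped in full:
    keep = [i for i in full if i != dropped]
    print(f"without {PARAMS[dropped]}:", round(auc_of_score(DATA, keep), 3))
```

In these toy data, systolic_bp could be omitted without loss of discrimination, whereas dropping resp_rate or heart_rate lowers the c-statistic.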

Analysis of available data

All acute NHS hospitals in England were invited to contribute data for the cohort study. The datasets received represented a wide variety of TTs. However, data were received from only 31 of the 92 hospitals that indicated they collected data, which may limit the representativeness of the results. As the study was based on existing datasets, there was no direct control over data quality. We addressed this by establishing a set of quality criteria and excluding datasets that did not meet key criteria.

The ideal outcome for this analysis would be whether a patient could benefit from an intervention beyond what is usually available on the ward, but this cannot be measured. We therefore used a surrogate composite outcome, chosen to reflect the presence of established critical illness. Not all components of the composite outcome were recorded in every dataset, which may introduce some bias. Response time (the time from the trigger event to the response, e.g. arrival of the CCOS) and lead time (the time from the response to when treatment would otherwise have occurred) are also important determinants of how well a TT performs, but the data were insufficient to evaluate them. Because of the wide variation in patient characteristics, response algorithms and data collection, we were unable to compare the different TTs directly to establish the best existing TT, or to develop a new high-quality TT for timely recognition of critically ill patients.
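A minimal Python sketch of two measurement issues raised in this paragraph, assuming each dataset records the four composite components as yes/no values with missing values allowed. Treating an unrecorded component as "no" silently turns possible events into non-events, which is one way incomplete recording can bias the composite; the sketch instead returns an undetermined result. Lead time is not computed, because the time at which treatment would otherwise have occurred is never observed. All names are hypothetical.

```python
from datetime import datetime
from typing import Optional


def composite_outcome(death: Optional[bool], critical_care_admission: Optional[bool],
                      dnar: Optional[bool], cpr: Optional[bool]) -> Optional[bool]:
    """Composite of death, critical care admission, DNAR order or CPR.

    None means the component was not recorded in the dataset. Returns True if any
    recorded component occurred, False only if every component was recorded as
    absent, and None when the result cannot be determined from what was recorded.
    """
    components = (death, critical_care_admission, dnar, cpr)
    if any(c is True for c in components):
        return True
    if all(c is False for c in components):
        return False
    return None


def response_time_minutes(trigger: datetime, response: datetime) -> float:
    """Time from the trigger event to the response (e.g. arrival of the CCOS), in minutes."""
    return (response - trigger).total_seconds() / 60.0


print(composite_outcome(False, False, None, None))  # None: DNAR and CPR were not recorded
print(response_time_minutes(datetime(2005, 3, 1, 10, 0), datetime(2005, 3, 1, 10, 25)))  # 25.0
```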

Implications for practice

Despite the lack of rigorous testing of the published TTs and the poor sensitivity found in the evaluation of available data, this study does not provide sufficient evidence that the use of existing TTs should be discontinued. Hospitals considering the introduction of a TT should do so in the light of the most up-to-date evidence on the validity and reliability of existing instruments, which continues to emerge. In the absence of evidence for a level 1 clinical decision rule, hospitals with a poorly performing TT and those considering introducing one may do well to seek a system suited to their local needs.

Suggestions for future research

If the goal is to develop a level 1 clinical decision rule, then larger prospective studies are required in a variety of clinical settings. These may involve regular timed recordings of routine physiological data from all subjects, and timed recordings of all important outcomes, including mortality, cardiac arrest, admission to critical care, and DNAR, as well as underlying diagnoses and an evaluation of the potential reversibility of a condition.

In addition, further work is needed to validate the TTs in use in their current settings. Hospitals seeking a system suited to their local needs should consider not only measures of diagnostic accuracy, but also reproducibility (inter- and intra-rater reliability) and ease of use in practice, including time to complete and acceptability to patients and staff.
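Inter-rater reliability of a TT is often summarised with Cohen's kappa: the agreement between two raters, corrected for the agreement expected by chance. Intra-rater reliability can be assessed the same way, with one rater scoring the same charts on two occasions. The sketch below assumes binary trigger/no-trigger decisions and uses invented ratings; the text does not specify which reliability statistic should be used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two sets of categorical ratings of the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # both raters used one identical category: kappa undefined
    return (observed - expected) / (1.0 - expected)


# Hypothetical trigger decisions (1 = triggered) by two nurses scoring the same 8 charts.
nurse_1 = [1, 1, 0, 0, 1, 0, 1, 0]
nurse_2 = [1, 0, 0, 0, 1, 0, 1, 1]
print(round(cohens_kappa(nurse_1, nurse_2), 2))
```

Here the nurses agree on 6 of 8 charts (0.75), chance agreement is 0.5, and kappa is 0.5.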

Conclusions

It appears, on the basis of existing publications and available data, that many hospitals have developed their own TTs. Evidence of the reliability, validity and utility of these systems is lacking. Because of the low sensitivity of existing TTs, a high number of patients requiring intervention are likely to be missed if ward staff rely solely on these systems to identify deteriorating patients; TTs should therefore be used as an adjunct to clinical judgement. It may be possible to increase sensitivity, at the cost of increased workload, by reducing trigger thresholds.

Acknowledgements

This study was funded by the UK National Health Service Research & Development Service Delivery & Organisation programme (SDO/74/2004). We thank the 31 hospitals that provided data for this analysis: Alex Larkin, Royal Oldham Hospital; Carol Tune, Royal Shrewsbury Hospital; Chris Subbe, Wrexham Maelor Hospital; Clare Bamforth, Dewsbury & District Hospital; David Goldhill, Royal London Hospital; Elizabeth Hogbin, Norfolk & Norwich University Hospital; Jackie Hogan/Stephen Murray, North Manchester Hospital; Jane Chandler/Erin Povey, Wexham Park Hospital; Jane Saunders, Bradford Royal Infirmary; Jane Viner, Torbay Hospital; Kath Daly, Guy's and St. Thomas' Hospital; Kelly Henley, James Cook University Hospital; Lindsay Green/Samantha Fox, Good Hope Hospital; Lorna Johnson, Leeds General Infirmary and St. James's University Hospital; Louise Stock, University Hospital Lewisham; Mike Heap/Kate Bray, Northern General Hospital and Royal Hallamshire Hospital; Natasha Williamson/Claire Brown, Hinchingbrooke Hospital; Pat Eden & Lee Hubbard, Royal National Orthopaedic Hospital; Paul Seymour/David Watts, Bromley Hospital; Ruth Mullett/Karen Robins, Alexandra Hospital and Worcester Royal Hospital; Sally Smith, Kent & Sussex Hospital and Maidstone Hospital; Sarah Ingleby/Chris Booth, Manchester Royal Infirmary; Sheila Adam, University College London Hospitals; Valerie Forde, University Hospitals Coventry; Wendy Watson/Julie Southwell, York Hospital; Wendy Wharton/Peter Groom, Southampton General Hospital.

Supplementary material

Electronic Supplementary Material (DOC 337K)

Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • Haiyan Gao (1)
  • Ann McDonnell (2)
  • David A. Harrison (1)
  • Tracey Moore (3)
  • Sheila Adam (4)
  • Kathleen Daly (5)
  • Lisa Esmonde (6)
  • David R. Goldhill (7)
  • Gareth J. Parry (8)
  • Arash Rashidian (9)
  • Christian P Subbe (10)
  • Sheila Harvey (1)
  1. Intensive Care National Audit & Research Centre, London, WC1H 9HR, UK
  2. Faculty of Health and Wellbeing, Sheffield Hallam University, Sheffield, S10 2DR, UK
  3. School of Nursing and Midwifery, University of Sheffield, Sheffield, S3 7ND, UK
  4. Intensive Care Unit, University College Hospital, London, NW1 2BU, UK
  5. Intensive Care Unit, St. Thomas’ Hospital, London, SE1 7EH, UK
  6. School of Healthcare, University of Leeds, Leeds, LS2 9UT, UK
  7. Anaesthetics, Royal National Orthopaedic Hospital, Stanmore, Middlesex HA7 4LP, UK
  8. Quality Measurement & Analysis, Children’s Hospital Boston, Boston, USA
  9. Department of Public Health and Policy, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK
  10. Thoracic Medicine, Wrexham Maelor Hospital, Wrexham, LL13 4TX, UK
