Abstract
Metrics typically used to report the performance of an early warning score (EWS), such as the area under the receiver operating characteristic curve or C-statistic, are not useful for pre-implementation analyses. Because physiological deterioration has an extremely low prevalence of 0.02 per patient-day, these metrics can be misleading. We discuss the statistical reasoning behind this statement and present a novel alternative metric better suited to operationalizing an EWS. We suggest that pre-implementation evaluation of EWSs should include at least two metrics: sensitivity, and one of the positive predictive value, the number needed to evaluate, or the estimated rate of alerts. We also argue that it is important to report each individual cutoff value.
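To make the prevalence argument concrete, the following is a minimal sketch of how the suggested metrics follow from sensitivity, specificity, and prevalence via Bayes' rule. The sensitivity (0.70) and specificity (0.90) are hypothetical values chosen for illustration, not results from any EWS; only the prevalence of 0.02 per patient-day comes from the text. The function name `alert_metrics` is likewise an assumption, and the number needed to evaluate is taken here as the reciprocal of the positive predictive value.

```python
def alert_metrics(sensitivity, specificity, prevalence):
    """Return (PPV, NNE, alert rate) for one cutoff of a score.

    Each patient-day is treated as one observation, so `prevalence`
    is the fraction of patient-days with a deterioration event.
    """
    # Fraction of all patient-days that are true-positive alerts.
    true_pos = sensitivity * prevalence
    # Fraction of all patient-days that are false-positive alerts.
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    # Estimated rate of alerts per patient-day (true plus false).
    alert_rate = true_pos + false_pos
    # Positive predictive value: share of alerts that are real events.
    ppv = true_pos / alert_rate
    # Number needed to evaluate, assumed here to be 1 / PPV.
    nne = 1.0 / ppv
    return ppv, nne, alert_rate


# Hypothetical cutoff: sensitivity 0.70, specificity 0.90 (illustrative only).
ppv, nne, alert_rate = alert_metrics(0.70, 0.90, 0.02)
print(f"PPV = {ppv:.3f}, NNE = {nne:.1f}, alerts per patient-day = {alert_rate:.3f}")
```

Under these assumed values, only about one alert in eight corresponds to a real event, even though a specificity of 0.90 looks respectable on a ROC curve. Repeating the calculation for each cutoff gives the per-cutoff picture that a single summary statistic hides.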
Abbreviations
- AUROC: Area under the receiver operating characteristic
- ELISA: Enzyme-linked immunosorbent assay
- EWS: Early warning score
- NEWS: National Early Warning Score
- NNE: Number needed to evaluate
- PPV: Positive predictive value
- ROC: Receiver operating characteristic
- WTD: Workup to detection
Acknowledgments
GJE was supported by the Gordon and Betty Moore Foundation (grant titled “Early detection, prevention, and mitigation of impending physiologic deterioration in hospitalized patients outside intensive care: Phase 3, pilot”), the Permanente Medical Group, Inc., and Kaiser Foundation Hospitals, Inc.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
GJE provided the operational interpretations and expertise and revised the manuscript critically for content. JMH provided operational expertise and revised the manuscript critically for content. ML helped draft the manuscript and provided the statistical examples. S-RB conceived of the article and drafted the manuscript. All authors read and approved the manuscript.
About this article
Cite this article
Romero-Brufau, S., Huddleston, J.M., Escobar, G.J. et al. Why the C-statistic is not informative to evaluate early warning scores and what metrics to use. Crit Care 19, 285 (2015). https://doi.org/10.1186/s13054-015-0999-1
DOI: https://doi.org/10.1186/s13054-015-0999-1