Why the C-statistic is not informative to evaluate early warning scores and what metrics to use

Romero-Brufau, Santiago; Huddleston, Jeanne M.; Escobar, Gabriel J.; Liebow, Mark

doi:10.1186/s13054-015-0999-1

Viewpoint
Published: 01 December 2015

Critical Care, Volume 19, article number 285 (2015)

Santiago Romero-Brufau^1,2 (ORCID: orcid.org/0000-0002-9922-0083),
Jeanne M. Huddleston^1,2,3,
Gabriel J. Escobar^4 &
Mark Liebow^5

Abstract

Metrics typically used to report the performance of an early warning score (EWS), such as the area under the receiver operating characteristic curve (AUROC) or C-statistic, are not useful for pre-implementation analyses. Because physiological deterioration has an extremely low prevalence of 0.02 per patient-day, these metrics can be misleading. We discuss the statistical reasoning behind this statement and present a novel alternative metric better suited to operationalizing an EWS. We suggest that pre-implementation evaluation of EWSs should include at least two metrics: sensitivity, and one of the positive predictive value (PPV), the number needed to evaluate (NNE), or the estimated rate of alerts. We also argue for the importance of reporting results for each individual cutoff value.
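To make the prevalence argument concrete, the following is a minimal Python sketch, not the authors' own calculation. It applies Bayes' rule to obtain the PPV from sensitivity, specificity, and prevalence. The prevalence of 0.02 per patient-day comes from the abstract; the sensitivity (0.70) and specificity (0.90) are hypothetical values chosen only for illustration, and taking the NNE as the reciprocal of the PPV is an assumption, since this excerpt does not define it.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def nne(ppv_value: float) -> float:
    """Number needed to evaluate, assumed here to be 1 / PPV."""
    return 1.0 / ppv_value


def alerts_per_100_patient_days(sensitivity: float, specificity: float,
                                prevalence: float) -> float:
    """Expected alerts (true plus false) per 100 patient-days."""
    alert_fraction = (sensitivity * prevalence
                      + (1.0 - specificity) * (1.0 - prevalence))
    return 100.0 * alert_fraction


if __name__ == "__main__":
    # Hypothetical operating point; only the 0.02 prevalence is from the abstract.
    sens, spec = 0.70, 0.90
    for prev in (0.02, 0.50):
        p = ppv(sens, spec, prev)
        print(f"prevalence={prev:.2f}  PPV={p:.3f}  NNE={nne(p):.1f}  "
              f"alerts/100 patient-days={alerts_per_100_patient_days(sens, spec, prev):.1f}")
    # prevalence=0.02  PPV=0.125  NNE=8.0  alerts/100 patient-days=11.2
    # prevalence=0.50  PPV=0.875  NNE=1.1  alerts/100 patient-days=40.0
```

The same sensitivity and specificity, and therefore the same point on the ROC curve, give a PPV of 0.875 at a prevalence of 0.50 but only 0.125 at 0.02. In this illustration roughly eight alerts would have to be evaluated to find one deteriorating patient, which a C-statistic alone would not reveal.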

Abbreviations

AUROC: Area under the receiver operating characteristic
ELISA: Enzyme-linked immunosorbent assay
EWS: Early warning score
NEWS: National Early Warning Score
NNE: Number needed to evaluate
PPV: Positive predictive value
ROC: Receiver operating characteristic
WTD: Workup to detection

Acknowledgments

GJE was supported by the Gordon and Betty Moore Foundation (grant titled “Early detection, prevention, and mitigation of impending physiologic deterioration in hospitalized patients outside intensive care: Phase 3, pilot”), the Permanente Medical Group, Inc., and Kaiser Foundation Hospitals, Inc.

Author information

Authors and Affiliations

1. Healthcare Systems Engineering Program, Mayo Clinic Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, 200 First Street SW, Rochester, MN, 55905, USA
Santiago Romero-Brufau & Jeanne M. Huddleston
2. Division of Health Care Policy and Research, Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
Santiago Romero-Brufau & Jeanne M. Huddleston
3. Division of Hospital Internal Medicine, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
Jeanne M. Huddleston
4. Kaiser Permanente Division of Research, 2000 Broadway Avenue, 032 R01, Oakland, CA, 94612, USA
Gabriel J. Escobar
5. Division of General Internal Medicine, Mayo Clinic College of Medicine, 200 First Street SW, Rochester, MN, 55905, USA
Mark Liebow

Corresponding author

Correspondence to Santiago Romero-Brufau.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

GJE provided the operational interpretations and expertise and revised the manuscript critically for content. JMH provided operational expertise and revised the manuscript critically for content. ML helped draft the manuscript and provided the statistical examples. SR-B conceived the article and drafted the manuscript. All authors read and approved the manuscript.

About this article

Cite this article

Romero-Brufau, S., Huddleston, J.M., Escobar, G.J. et al. Why the C-statistic is not informative to evaluate early warning scores and what metrics to use. Crit Care 19, 285 (2015). https://doi.org/10.1186/s13054-015-0999-1
