Why the C-statistic is not informative to evaluate early warning scores and what metrics to use

  • Viewpoint
  • Critical Care

Abstract

Metrics typically used to report the performance of an early warning score (EWS), such as the area under the receiver operating characteristic curve (AUROC) or C-statistic, are not useful for pre-implementation analyses. Because physiological deterioration has an extremely low prevalence of 0.02 per patient-day, these metrics can be misleading. We discuss the statistical reasoning behind this statement and present a novel alternative metric that is better suited to operationalizing an EWS. We suggest that pre-implementation evaluation of an EWS should include at least two metrics: sensitivity, and one of the positive predictive value, the number needed to evaluate, or the estimated rate of alerts. We also argue for the importance of reporting these metrics at each individual cutoff value.
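The arithmetic behind this argument can be sketched briefly. The ROC curve, and therefore the C-statistic, is built only from sensitivity and specificity, which do not depend on prevalence; the positive predictive value does. The minimal Python sketch below assumes that each patient-day is one unit that either contains a deterioration event or not, that each flagged patient-day produces one alert, and that NNE is taken as 1/PPV. The helper `metrics_at_cutoff` and the sensitivity/specificity pairs for three cutoffs are hypothetical illustrations, not data or methods from the article; only the prevalence of 0.02 per patient-day comes from the abstract.

```python
from dataclasses import dataclass

# Prevalence of physiological deterioration stated in the abstract (events per patient-day).
PREVALENCE_PER_PATIENT_DAY = 0.02


@dataclass
class CutoffMetrics:
    cutoff: int
    sensitivity: float
    ppv: float
    nne: float
    alerts_per_100_patient_days: float


def metrics_at_cutoff(cutoff, sensitivity, specificity,
                      prevalence=PREVALENCE_PER_PATIENT_DAY):
    """Derive PPV, NNE and alert rate for one EWS cutoff (hypothetical helper).

    Assumes every patient-day is one unit that either contains a
    deterioration event or not, and that each flagged patient-day is one alert.
    """
    true_pos = sensitivity * prevalence                   # flagged days with an event
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # flagged days without one
    flagged = true_pos + false_pos                        # fraction of days that alert
    ppv = true_pos / flagged
    return CutoffMetrics(
        cutoff=cutoff,
        sensitivity=sensitivity,
        ppv=ppv,
        nne=1.0 / ppv,  # evaluations needed per true event detected
        alerts_per_100_patient_days=100.0 * flagged,
    )


if __name__ == "__main__":
    # Made-up (sensitivity, specificity) pairs for three cutoffs; not data from the article.
    hypothetical_cutoffs = {3: (0.90, 0.70), 5: (0.70, 0.90), 7: (0.40, 0.98)}

    print(f"{'cutoff':>6} {'sens':>5} {'PPV':>6} {'NNE':>6} {'alerts/100 pt-days':>19}")
    for cutoff, (sens, spec) in hypothetical_cutoffs.items():
        m = metrics_at_cutoff(cutoff, sens, spec)
        print(f"{m.cutoff:>6} {m.sensitivity:>5.2f} {m.ppv:>6.3f} "
              f"{m.nne:>6.1f} {m.alerts_per_100_patient_days:>19.1f}")

    # Same sensitivity/specificity pair (the same point on the ROC curve),
    # evaluated at a prevalence of 0.5 instead of 0.02: only the PPV moves.
    balanced = metrics_at_cutoff(5, 0.70, 0.90, prevalence=0.5)
    print(f"PPV at cutoff 5 if prevalence were 0.5: {balanced.ppv:.3f}")
```

With these illustrative inputs, cutoff 5 flags about 11 patient-days per 100 with a PPV of 0.125 (an NNE of 8), whereas the identical sensitivity and specificity would give a PPV of 0.875 if half of all patient-days contained an event. Printing one row per cutoff mirrors the recommendation to report each cutoff individually rather than a single summary statistic.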

Fig. 1
Fig. 2

Abbreviations

AUROC: Area under the receiver operating characteristic curve
ELISA: Enzyme-linked immunosorbent assay
EWS: Early warning score
NEWS: National Early Warning Score
NNE: Number needed to evaluate
PPV: Positive predictive value
ROC: Receiver operating characteristic
WTD: Workup to detection

Acknowledgments

GJE was supported by the Gordon and Betty Moore Foundation (grant titled “Early detection, prevention, and mitigation of impending physiologic deterioration in hospitalized patients outside intensive care: Phase 3, pilot”), the Permanente Medical Group, Inc., and Kaiser Foundation Hospitals, Inc.

Author information

Corresponding author

Correspondence to Santiago Romero-Brufau.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

GJE provided the operational interpretations and expertise and revised the manuscript critically for content. JMH provided operational expertise and revised the manuscript critically for content. ML helped draft the manuscript and provided the statistical examples. S-RB conceived of the article and drafted the manuscript. All authors read and approved the manuscript.

About this article

Cite this article

Romero-Brufau, S., Huddleston, J.M., Escobar, G.J. et al. Why the C-statistic is not informative to evaluate early warning scores and what metrics to use. Crit Care 19, 285 (2015). https://doi.org/10.1186/s13054-015-0999-1

  • DOI: https://doi.org/10.1186/s13054-015-0999-1
