
Classification-algorithm evaluation: Five performance measures based on confusion matrices


Abstract

Objective. The objective of this paper is to introduce, explain, and extend methods for comparing the performance of classification algorithms using error tallies obtained on properly sized, populated, and labeled data sets.

Methods. Two distinct contexts of classification are defined, involving “objects-by-inspection” and “objects-by-segmentation.” In the former context, the total number of objects to be classified is unambiguously and self-evidently defined. In the latter, there is troublesome ambiguity. All five of the measures of performance here considered are based on confusion matrices, tables of counts revealing the extent of an algorithm's “confusion” regarding the true classifications. A proper measure of classification-algorithm performance must meet four requirements. A proper measure should obey six additional constraints.

Results. Four traditional measures of performance are critiqued in terms of the requirements and constraints. Each measure meets the requirements, but fails to obey at least one of the constraints. A nontraditional measure of algorithm performance, the normalized mutual information (NMI), is therefore introduced. Based on the NMI, methods for comparing algorithm performance using confusion matrices are devised.

Conclusions. The five performance measures lead to similar inferences when comparing a trio of QRS-detection algorithms using a large data set. The modified NMI is preferred, however, because it obeys each of the constraints and is the most conservative measure of performance.
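The abstract does not spell out how the NMI is computed, nor how the paper's modified form normalizes the mutual information, so the sketch below is only illustrative: it takes a confusion matrix of counts (rows = true classes, columns = algorithm-assigned classes) and divides the mutual information by the entropy of the true-class marginal, one common normalization. The function name `normalized_mutual_information` and the example QRS-detection counts are assumptions for demonstration, not definitions or data from the paper.

```python
import numpy as np

def normalized_mutual_information(confusion):
    """Mutual information between true and assigned classes,
    normalized here by the entropy of the true-class distribution.

    `confusion` is a 2-D array of counts: rows = true classes,
    columns = algorithm-assigned classes.
    """
    counts = np.asarray(confusion, dtype=float)
    p_joint = counts / counts.sum()        # joint distribution p(true, assigned)
    p_true = p_joint.sum(axis=1)           # marginal over assigned classes
    p_assigned = p_joint.sum(axis=0)       # marginal over true classes

    # Mutual information in nats, skipping zero cells (0 * log 0 := 0).
    mi = 0.0
    for i, pt in enumerate(p_true):
        for j, pa in enumerate(p_assigned):
            pij = p_joint[i, j]
            if pij > 0:
                mi += pij * np.log(pij / (pt * pa))

    # Entropy of the true-class marginal, used here as the normalizer.
    nonzero = p_true[p_true > 0]
    h_true = -np.sum(nonzero * np.log(nonzero))
    return mi / h_true if h_true > 0 else 0.0


# Hypothetical 2x2 confusion matrix for a QRS detector
# (rows: true beat / no beat; columns: detected / not detected).
cm = [[950, 50],
      [30, 970]]
print(normalized_mutual_information(cm))
```

With this normalization the score stays in [0, 1]: it is 0 when the algorithm's outputs are statistically independent of the true classes and 1 when the confusion matrix is perfectly diagonal.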




Cite this article

Forbes, A.D. Classification-algorithm evaluation: Five performance measures based on confusion matrices. J Clin Monitor Comput 11, 189–206 (1995). https://doi.org/10.1007/BF01617722
