Abstract
Objective. The objective of this paper is to introduce, explain, and extend methods for comparing the performance of classification algorithms using error tallies obtained on properly sized, populated, and labeled data sets.

Methods. Two distinct contexts of classification are defined, involving “objects-by-inspection” and “objects-by-segmentation.” In the former context, the total number of objects to be classified is unambiguously and self-evidently defined; in the latter, there is troublesome ambiguity. All five of the measures of performance considered here are based on confusion matrices, tables of counts revealing the extent of an algorithm's “confusion” regarding the true classifications. A proper measure of classification-algorithm performance must meet four requirements and should obey six additional constraints.

Results. Four traditional measures of performance are critiqued in terms of the requirements and constraints. Each measure meets the requirements but fails to obey at least one of the constraints. A nontraditional measure of algorithm performance, the normalized mutual information (NMI), is therefore introduced. Based on the NMI, methods for comparing algorithm performance using confusion matrices are devised.

Conclusions. The five performance measures lead to similar inferences when comparing a trio of QRS-detection algorithms using a large data set. The modified NMI is preferred, however, because it obeys each of the constraints and is the most conservative measure of performance.
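As a sketch of the NMI idea the abstract describes: the mutual information between the true and assigned class labels can be estimated directly from a confusion matrix and then normalized to lie in [0, 1]. The normalization below (dividing by the entropy of the true classes) is one common convention, not necessarily the exact variant the paper defines.

```python
import numpy as np

def normalized_mutual_information(cm):
    """NMI of a confusion matrix: I(T; A) / H(T).

    cm[i, j] = count of objects whose true class is i and whose
    assigned class is j. Normalizing by the true-class entropy H(T)
    is one common choice; the paper's modified NMI may differ.
    """
    cm = np.asarray(cm, dtype=float)
    p = cm / cm.sum()                   # joint distribution p(t, a)
    pt = p.sum(axis=1, keepdims=True)   # true-class marginal p(t)
    pa = p.sum(axis=0, keepdims=True)   # assigned-class marginal p(a)
    nz = p > 0                          # skip zero cells to avoid log(0)
    mi = np.sum(p[nz] * np.log(p[nz] / (pt @ pa)[nz]))
    ht = -np.sum(pt[pt > 0] * np.log(pt[pt > 0]))
    return mi / ht

# A perfect classifier scores 1; a chance-level one scores 0.
print(normalized_mutual_information([[50, 0], [0, 50]]))   # → 1.0
print(normalized_mutual_information([[25, 25], [25, 25]])) # → 0.0
```

Unlike raw accuracy, this measure is unchanged by a relabeling of the classes and penalizes classifiers whose output carries no information about the truth.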
Forbes, A.D. Classification-algorithm evaluation: Five performance measures based on confusion matrices. J Clin Monitor Comput 11, 189–206 (1995). https://doi.org/10.1007/BF01617722