Abstract
Objective. The objective of this paper is to introduce, explain, and extend methods for comparing the performance of classification algorithms using error tallies obtained on properly sized, populated, and labeled data sets.

Methods. Two distinct contexts of classification are defined, involving “objects-by-inspection” and “objects-by-segmentation.” In the former context, the total number of objects to be classified is unambiguously and self-evidently defined; in the latter, there is troublesome ambiguity. All five of the measures of performance considered here are based on confusion matrices, tables of counts revealing the extent of an algorithm's “confusion” regarding the true classifications. A proper measure of classification-algorithm performance must meet four requirements and should obey six additional constraints.

Results. Four traditional measures of performance are critiqued in terms of the requirements and constraints. Each measure meets the requirements but fails to obey at least one of the constraints. A nontraditional measure of algorithm performance, the normalized mutual information (NMI), is therefore introduced. Based on the NMI, methods for comparing algorithm performance using confusion matrices are devised.

Conclusions. The five performance measures lead to similar inferences when comparing a trio of QRS-detection algorithms using a large data set. The modified NMI is preferred, however, because it obeys each of the constraints and is the most conservative measure of performance.
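As a sketch of the NMI idea the abstract describes: the mutual information between the true and assigned class labels can be estimated directly from a confusion matrix and then normalized to lie in [0, 1]. The normalization below (dividing by the entropy of the true classes) is one common convention, not necessarily the exact variant the paper defines.

```python
import numpy as np

def normalized_mutual_information(cm):
    """NMI of a confusion matrix: I(T; A) / H(T).

    cm[i, j] = count of objects whose true class is i and whose
    assigned class is j. Normalizing by the true-class entropy H(T)
    is one common choice; the paper's modified NMI may differ.
    """
    cm = np.asarray(cm, dtype=float)
    p = cm / cm.sum()                   # joint distribution p(t, a)
    pt = p.sum(axis=1, keepdims=True)   # true-class marginal p(t)
    pa = p.sum(axis=0, keepdims=True)   # assigned-class marginal p(a)
    nz = p > 0                          # skip zero cells to avoid log(0)
    mi = np.sum(p[nz] * np.log(p[nz] / (pt @ pa)[nz]))
    ht = -np.sum(pt[pt > 0] * np.log(pt[pt > 0]))
    return mi / ht

# A perfect classifier scores 1; a chance-level one scores 0.
print(normalized_mutual_information([[50, 0], [0, 50]]))   # → 1.0
print(normalized_mutual_information([[25, 25], [25, 25]])) # → 0.0
```

Unlike raw accuracy, this measure is unchanged by a relabeling of the classes and penalizes classifiers whose output carries no information about the truth.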
Forbes, A.D. Classification-algorithm evaluation: Five performance measures based on confusion matrices. J Clin Monitor Comput 11, 189–206 (1995). https://doi.org/10.1007/BF01617722