Analysis of the consistency of a mixed integer programming-based multi-category constrained discriminant model
Classification is concerned with the development of rules for allocating observations to groups, and is a fundamental problem in machine learning. Much of the previous work on classification models investigates two-group discrimination. Multi-category classification is considered less often because generalizations of two-group models tend to produce misclassification rates that are higher than desirable. Indeed, producing “good” two-group classification rules is a challenging task for some applications, and producing good multi-category rules is generally more difficult. Additionally, even when the “optimal” classification rule is known, inter-group misclassification rates may be higher than tolerable for a given classification model. We investigate properties of a mixed-integer-programming-based multi-category classification model that allows limits on inter-group misclassification rates to be specified in advance. The limits are satisfied through the use of a reserved-judgment region, an artificial category that receives observations whose attributes do not sufficiently indicate membership in any particular group. The method is shown to be a consistent estimator of a classification rule with misclassification limits, and its performance is demonstrated on simulated and real-world data.
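The reserved-judgment idea described above can be illustrated with a minimal sketch. This is not the mixed-integer programming model studied in the paper: it swaps in a simple threshold rule on group posterior probabilities, assumed here to be supplied by some upstream estimator. The names `RESERVED`, `classify_with_reserve`, `misclassification_rates`, and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

RESERVED = 0  # hypothetical label for the reserved-judgment category


def classify_with_reserve(posteriors: Sequence[float], threshold: float) -> int:
    """Return a 1-based group index, or RESERVED when no group is
    supported strongly enough (simplified stand-in, not the paper's rule)."""
    best = max(range(len(posteriors)), key=lambda g: posteriors[g])
    if posteriors[best] >= threshold:
        return best + 1
    return RESERVED


def misclassification_rates(
    true_groups: Sequence[int], predicted: Sequence[int], n_groups: int
) -> List[List[float]]:
    """rates[i][j] is the fraction of group i+1 observations assigned to
    group j+1 (i != j). Observations placed in RESERVED are not counted
    as misclassified; diagonal entries are left at 0.0."""
    counts = [[0] * n_groups for _ in range(n_groups)]
    totals = [0] * n_groups
    for t, p in zip(true_groups, predicted):
        totals[t - 1] += 1
        if p != RESERVED:
            counts[t - 1][p - 1] += 1
    return [
        [
            counts[i][j] / totals[i] if totals[i] and i != j else 0.0
            for j in range(n_groups)
        ]
        for i in range(n_groups)
    ]


if __name__ == "__main__":
    # Made-up posterior vectors for three observations and three groups.
    sample = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.1, 0.2, 0.7]]
    print([classify_with_reserve(p, threshold=0.6) for p in sample])
```

In this sketch, raising `threshold` moves more observations into the reserved category and can only lower the inter-group misclassification rates, which mirrors the trade-off the abstract describes. The model in the paper instead enforces the limits directly as constraints.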
Keywords: Constrained discriminant analysis; Mixed integer program; Multi-category classification; Multi-group classification; Consistency; Reserved judgment