Annals of Operations Research, Volume 174, Issue 1, pp 147–168

Analysis of the consistency of a mixed integer programming-based multi-category constrained discriminant model

  • J. Paul Brooks
  • Eva K. Lee


Abstract

Classification is concerned with developing rules for allocating observations to groups, and is a fundamental problem in machine learning. Much previous work on classification models investigates two-group discrimination. Multi-category classification is considered less often because generalizations of two-group models tend to produce misclassification rates higher than desirable. Indeed, producing "good" two-group classification rules is a challenging task for some applications, and producing good multi-category rules is generally more difficult. Moreover, even when the "optimal" classification rule is known, inter-group misclassification rates may be higher than tolerable for a given classification model. We investigate properties of a mixed-integer-programming-based multi-category classification model that allows limits on inter-group misclassification rates to be specified in advance. The limits are enforced through the use of a reserved-judgment region: an artificial category into which observations are placed when their attributes do not sufficiently indicate membership in any particular group. The method is shown to be a consistent estimator of a classification rule with misclassification limits, and its performance on simulated and real-world data is demonstrated.


Keywords: Constrained discriminant analysis · Mixed integer program · Multi-category classification · Multi-group classification · Consistency · Reserved judgment
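The reserved-judgment idea from the abstract can be illustrated with a minimal sketch. This is not the authors' mixed-integer programming formulation; it assumes hypothetical per-group membership scores and a simple confidence threshold, and the names (`classify`, `RESERVED`, `threshold`) are illustrative only. An observation is assigned to its highest-scoring group only when that score is decisive; otherwise it is placed in the reserved-judgment category rather than risk an inter-group misclassification.

```python
# Illustrative reserved-judgment classifier (a sketch, not the paper's MIP model).
# Assumes each observation comes with nonnegative scores, one per group, e.g.
# estimated group-membership probabilities or discriminant-function values.

RESERVED = "reserved"  # the artificial reserved-judgment category


def classify(scores, threshold=0.5):
    """Assign to the arg-max group if its normalized score reaches
    `threshold`; otherwise place the observation in reserved judgment.

    scores    -- dict mapping group label -> nonnegative score
    threshold -- minimum share of total score required to commit to a group
    """
    total = sum(scores.values())
    best_group = max(scores, key=scores.get)
    if total > 0 and scores[best_group] / total >= threshold:
        return best_group
    return RESERVED


# A confident observation: group "A" clearly dominates.
print(classify({"A": 0.9, "B": 0.05, "C": 0.05}))   # -> A
# An ambiguous observation: no group dominates, so judgment is reserved.
print(classify({"A": 0.4, "B": 0.35, "C": 0.25}))   # -> reserved
```

Raising `threshold` enlarges the reserved-judgment region, which is how pre-specified limits on inter-group misclassification rates could be met at the cost of deciding fewer observations; in the paper this trade-off is driven by the optimization model rather than a single scalar threshold.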





Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, Richmond, USA
  2. School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, USA
