Using discriminant analysis for multi-class classification: an experimental investigation


Many supervised machine learning tasks can be cast as multi-class classification problems. Support vector machines (SVMs) excel at binary classification, but the elegant theory behind large-margin hyperplanes cannot be easily extended to the multi-class setting. On the other hand, it has been shown that the decision hyperplanes obtained by SVMs for binary classification are equivalent to the solutions obtained by Fisher's linear discriminant on the set of support vectors. Discriminant analysis approaches are well known in the statistical pattern recognition literature for learning discriminative feature transformations, and they can be easily extended to multi-class cases. The use of discriminant analysis, however, has not been fully explored experimentally in the data mining literature. In this paper, we explore the use of discriminant analysis for multi-class classification problems. We evaluate the performance of discriminant analysis on a large collection of benchmark datasets and investigate its usage in text categorization. Our experiments suggest that discriminant analysis provides a fast, efficient, yet accurate alternative for general multi-class classification problems.
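To make the multi-class extension concrete, the following is a minimal sketch of textbook Fisher linear discriminant analysis with nearest-centroid classification in the projected space. It is not the authors' implementation: the function names `fit_lda` and `predict_lda`, the small ridge term `reg`, and the nearest-centroid decision rule are assumptions chosen for illustration.

```python
import numpy as np

def fit_lda(X, y, reg=1e-6):
    """Fit a multi-class Fisher discriminant (illustrative sketch only)."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Assumed ridge term so that Sw stays invertible on small samples.
    Sw += reg * np.eye(d)
    # Directions maximizing between-class over within-class scatter
    # are the leading eigenvectors of Sw^{-1} Sb.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[: len(classes) - 1]
    W = evecs[:, order].real
    # Class centroids in the projected space.
    centroids = np.array([X[y == c].mean(axis=0) @ W for c in classes])
    return classes, W, centroids

def predict_lda(model, X):
    """Assign each row of X to the class with the nearest projected centroid."""
    classes, W, centroids = model
    Z = X @ W
    dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]
```

A single fit handles any number of classes directly, with at most (number of classes − 1) discriminant directions, whereas a binary SVM would need a reduction such as one-vs-rest or pairwise coupling.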



Author information



Corresponding author

Correspondence to Tao Li.

Additional information

Tao Li is currently an assistant professor in the School of Computer Science at Florida International University. He received his Ph.D. degree in Computer Science from the University of Rochester in 2004. His primary research interests are data mining, machine learning, bioinformatics, and music information retrieval.

Shenghuo Zhu is currently a researcher at NEC Laboratories America, Inc. He received his B.E. from Zhejiang University in 1994, B.E. from Tsinghua University in 1997, and Ph.D. degree in Computer Science from the University of Rochester in 2003. His primary research interests include information retrieval, machine learning, and data mining.

Mitsunori Ogihara received a Ph.D. in Information Sciences from the Tokyo Institute of Technology in 1993. He is currently Professor and Chair of the Department of Computer Science at the University of Rochester. His primary research interests are data mining, computational complexity, and molecular computation.

About this article

Cite this article

Li, T., Zhu, S. & Ogihara, M. Using discriminant analysis for multi-class classification: an experimental investigation. Knowl Inf Syst 10, 453–472 (2006).


  • Multi-class classification
  • Discriminant analysis