A review on the combination of binary classifiers in multiclass problems

  • Ana Carolina Lorena
  • André C. P. L. F. de Carvalho
  • João M. P. Gama

Abstract

Many real-world problems involve classifying data into categories, or classes. Given a data set of examples whose classes are known, Machine Learning algorithms can be employed to induce a classifier able to predict the class of new data from the same domain. Some learning techniques were originally conceived for problems with only two classes, known as binary classification problems. However, many problems require discriminating examples into more than two classes. This paper surveys the main strategies for generalizing binary classifiers to problems with more than two classes, known as multiclass classification problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.
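
As an illustration of the decompose-and-combine idea, the sketch below uses one-vs-all, one common decomposition in which each class gets its own binary subtask (that class against all the others) and the final prediction is the class whose binary output is largest. It is a minimal sketch, not the authors' method: the function names and the toy centroid-based binary learner are assumptions made for the example, and any binary learner that returns a real-valued score could be substituted.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Vector = Sequence[float]
# A binary scorer maps an example to a real score; larger means "more positive".
Scorer = Callable[[Vector], float]


def train_centroid_scorer(X: List[Vector], y: List[int]) -> Scorer:
    """Toy binary learner (hypothetical, for illustration only): score an example
    by how much closer it lies to the positive centroid than to the negative one."""
    def centroid(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = centroid([x for x, label in zip(X, y) if label == 1])
    neg = centroid([x for x, label in zip(X, y) if label == 0])

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return lambda x: sq_dist(x, neg) - sq_dist(x, pos)


def train_one_vs_all(
    X: List[Vector],
    y: List[Hashable],
    train_binary: Callable[[List[Vector], List[int]], Scorer] = train_centroid_scorer,
) -> Dict[Hashable, Scorer]:
    """Decompose a k-class problem into k binary subtasks, one per class."""
    return {
        c: train_binary(X, [1 if label == c else 0 for label in y])
        for c in set(y)
    }


def predict_one_vs_all(scorers: Dict[Hashable, Scorer], x: Vector) -> Hashable:
    """Combine the binary outputs: predict the class with the highest score."""
    return max(scorers, key=lambda c: scorers[c](x))
```

For a problem with k classes this trains k binary scorers. Other decompositions differ in which binary subtasks are built and in how their outputs are combined, but they follow the same two-stage pattern.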

Keywords

  • Machine learning
  • Supervised learning
  • Multiclass classification

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  • Ana Carolina Lorena (1)
  • André C. P. L. F. de Carvalho (2)
  • João M. P. Gama (3)
  1. Centro de Matemática, Computação e Cognição, Universidade Federal do ABC, Santo André, Brazil
  2. Departamento de Ciências de Computação, Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, Brazil
  3. Laboratório de Inteligência Artificial e Ciência de Computadores, Universidade do Porto, Porto, Portugal
