Abstract
Many real-world problems involve classifying data into categories or classes. Given a data set of examples whose classes are known, Machine Learning algorithms can induce a classifier that predicts the class of new data from the same domain. Some learning techniques were originally conceived for problems with only two classes, known as binary classification problems. However, many problems require discriminating examples among more than two classes. This paper surveys the main strategies for generalizing binary classifiers to problems with more than two classes, known as multiclass classification problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.
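As an illustration of the decomposition idea, the following minimal sketch shows one common strategy, one-vs-all: one binary subtask is built per class (that class against all others), and the binary outputs are combined by taking the highest score. The binary learner used here (a nearest-centroid scorer), the function names, and the toy data are all assumptions chosen for brevity. They are not taken from the paper, and any binary classifier that returns a real-valued score could be substituted.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def _centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_binary(positives: List[Vector], negatives: List[Vector]) -> Scorer:
    """Hypothetical binary learner: the score is larger when x lies closer
    to the positive centroid than to the negative centroid."""
    pos_c = _centroid(positives)
    neg_c = _centroid(negatives)
    return lambda x: _sq_dist(x, neg_c) - _sq_dist(x, pos_c)


def train_one_vs_all(X: List[Vector], y: List[Hashable]) -> Dict[Hashable, Scorer]:
    """Decompose a k-class problem into k binary subtasks (class vs. rest)."""
    scorers: Dict[Hashable, Scorer] = {}
    for label in sorted(set(y)):
        pos = [x for x, t in zip(X, y) if t == label]
        neg = [x for x, t in zip(X, y) if t != label]
        scorers[label] = train_binary(pos, neg)
    return scorers


def predict(scorers: Dict[Hashable, Scorer], x: Vector) -> Hashable:
    """Combine the binary outputs: pick the class whose scorer is most confident."""
    return max(scorers, key=lambda label: scorers[label](x))


# Toy three-class data set (illustrative values only).
X = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8), (0.0, 5.0), (0.3, 5.2)]
y = ["a", "a", "b", "b", "c", "c"]
model = train_one_vs_all(X, y)
print(predict(model, (0.1, 0.0)))
```

Other decompositions, such as training one binary classifier per pair of classes, differ only in how the subtasks are defined and how their outputs are combined.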
Cite this article
Lorena, A.C., de Carvalho, A.C.P.L.F. & Gama, J.M.P. A review on the combination of binary classifiers in multiclass problems. Artif Intell Rev 30, 19 (2008). https://doi.org/10.1007/s10462-009-9114-9
DOI: https://doi.org/10.1007/s10462-009-9114-9