Data Mining and Knowledge Discovery, Volume 24, Issue 1, pp 40–77

Efficient prediction algorithms for binary decomposition techniques

  • Sang-Hyeun Park
  • Johannes Fürnkranz


Binary decomposition methods transform multiclass learning problems into a series of two-class learning problems that can be solved with simpler learning algorithms. As the number of such binary learning problems often grows super-linearly with the number of classes, we need efficient methods for computing the predictions. In this article, we discuss an efficient algorithm that queries only a dynamically determined subset of the trained classifiers, but still predicts the same classes that would have been predicted if all classifiers had been queried. The algorithm is first derived for the simple case of pairwise classification, and then generalized to arbitrary pairwise decompositions of the learning problem in the form of ternary error-correcting output codes under a variety of different code designs and decoding strategies.
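The core idea for the pairwise case can be illustrated with a short sketch. This is not the authors' implementation but a minimal illustration under stated assumptions: `duel(i, j)` is a hypothetical stand-in for a trained pairwise classifier that, for classes `i < j`, returns the class it votes for on the current example. The names `full_voting` and `quick_voting` are likewise chosen here for illustration. The sketch keeps a loss count per class, always queries a classifier involving the class with the fewest losses so far, and stops as soon as that class has met every opponent, since no other class can then collect more votes.

```python
from itertools import combinations


def full_voting(n_classes, duel):
    """Baseline: query all n(n-1)/2 pairwise classifiers and count votes."""
    votes = [0] * n_classes
    for i, j in combinations(range(n_classes), 2):
        votes[duel(i, j)] += 1
    # max() returns the first maximum, so ties go to the lowest class index.
    return max(range(n_classes), key=lambda c: votes[c])


def quick_voting(n_classes, duel):
    """Query pairwise classifiers only until the top-voted class is certain.

    Returns (predicted_class, number_of_classifiers_queried).
    """
    losses = [0] * n_classes                    # lost duels per class so far
    played = [set() for _ in range(n_classes)]  # opponents already met
    queried = 0
    while True:
        # Current favourite: fewest losses so far (ties -> lowest index).
        a = min(range(n_classes), key=lambda c: (losses[c], c))
        rivals = [c for c in range(n_classes) if c != a and c not in played[a]]
        if not rivals:
            # The favourite's loss count is final, and every other class
            # already has at least as many losses, so it cannot be overtaken.
            return a, queried
        # Most promising unmet opponent: also the one with the fewest losses.
        b = min(rivals, key=lambda c: (losses[c], c))
        winner = duel(min(a, b), max(a, b))
        loser = b if winner == a else a
        losses[loser] += 1
        played[a].add(b)
        played[b].add(a)
        queried += 1
```

With a deterministic `duel`, `quick_voting` returns the same class as `full_voting` (including the lowest-index tie-break used here), while the second return value shows how many of the n(n-1)/2 classifiers were actually evaluated. How large the saving is depends on the data and on the base classifiers.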


Binary decomposition, Pairwise classification, Ternary ECOC, Multiclass classification, Aggregation, Efficient decoding, Efficient voting





Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. Knowledge Engineering Group, Department of Computer Science, TU Darmstadt, Darmstadt, Germany
