Machine Learning, Volume 47, Issue 2–3, pp. 201–233

On the Learnability and Design of Output Codes for Multiclass Problems

  • Koby Crammer
  • Yoram Singer

Abstract

Output coding is a general framework for solving multiclass categorization problems. Previous research on output codes has focused on building multiclass machines given predefined output codes. In this paper we discuss for the first time the problem of designing output codes for multiclass problems. For the design problem of discrete codes, which have been used extensively in previous works, we present mostly negative results. We then introduce the notion of continuous codes and cast the design problem of continuous codes as a constrained optimization problem. We describe three optimization problems corresponding to three different norms of the code matrix. Interestingly, for the l2 norm our formalism results in a quadratic program whose dual does not depend on the length of the code. A special case of our formalism provides a multiclass scheme for building support vector machines which can be solved efficiently. We give a time- and space-efficient algorithm for solving the quadratic program. Preliminary experiments with synthetic data show that our algorithm is often two orders of magnitude faster than standard quadratic programming packages. We conclude with a discussion of the generalization properties of the algorithm.

Keywords: multiclass categorization, output coding, SVM

Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Koby Crammer¹
  • Yoram Singer¹

  1. School of Computer Science & Engineering, The Hebrew University, Jerusalem, Israel
