Constraint Classification: A New Approach to Multiclass Classification

  • Sariel Har-Peled
  • Dan Roth
  • Dav Zimak
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2533)


In this paper, we present a new view of multiclass classification and introduce the constraint classification problem, a generalization that captures many flavors of multiclass classification. We provide the first optimal, distribution-independent bounds for many multiclass learning algorithms, including winner-take-all (WTA). Based on our view, we present a learning algorithm that learns via a single linear classifier in high dimension. In addition to the distribution-independent bounds, we provide a simple margin-based analysis improving generalization bounds for linear multiclass support vector machines.
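The abstract does not spell out how a multiclass problem becomes a single linear classifier in high dimension, so the following is only a minimal sketch under stated assumptions. It assumes a Kesler-style embedding: each constraint "class i should outscore class j on x" becomes one vector in R^(k·d) with x in block i and -x in block j. It also assumes a plain perceptron as the binary learner. The function names (`wta_predict`, `expand`, `train`) are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def wta_predict(weights: Sequence[Vector], x: Vector) -> int:
    """Winner-take-all: return the class whose score w_i . x is largest."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def expand(x: Vector, i: int, j: int, k: int) -> Vector:
    """Embed the constraint 'class i scores above class j' into R^(k*d).

    Assumed construction: x goes in block i, -x in block j, zeros elsewhere,
    so dot(concat(w_1..w_k), expand(x, i, j, k)) equals w_i . x - w_j . x.
    """
    d = len(x)
    z = [0.0] * (k * d)
    for t, xt in enumerate(x):
        z[i * d + t] = xt
        z[j * d + t] = -xt
    return z


def train(examples: Sequence[Tuple[Vector, int]], k: int,
          epochs: int = 100) -> List[Vector]:
    """Run a perceptron on the expanded examples (an illustrative choice).

    Returns one weight vector per class, sliced out of the single
    high-dimensional weight vector.
    """
    d = len(examples[0][0])
    w = [0.0] * (k * d)
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            for j in range(k):
                if j == y:
                    continue
                z = expand(x, y, j, k)
                # Update whenever the constraint 'y beats j' is not strictly met.
                if sum(wi * zi for wi, zi in zip(w, z)) <= 0:
                    w = [wi + zi for wi, zi in zip(w, z)]
                    mistakes += 1
        if mistakes == 0:
            break
    return [w[c * d:(c + 1) * d] for c in range(k)]


# Toy, linearly separable three-class data (made up for this sketch).
toy = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([-1.0, -1.0], 2)]
model = train(toy, k=3)
predictions = [wta_predict(model, x) for x, _ in toy]
```

In this sketch an ordinary multiclass label y yields the k - 1 constraints "y beats j" for every j ≠ y. A more general constraint set, such as a partial order over labels, would simply contribute a different set of (i, j) pairs to the same expansion.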


Partial Order, Growth Function, Output Space, Multiclass Classification, Hypothesis Class





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Sariel Har-Peled (1)
  • Dan Roth (1)
  • Dav Zimak (1)
  1. Department of Computer Science, University of Illinois, Urbana
