Entropy and Margin Maximization for Structured Output Learning

  • Patrick Pletscher
  • Cheng Soon Ong
  • Joachim M. Buhmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6323)


We consider the problem of training discriminative structured output predictors, such as conditional random fields (CRFs) and structured support vector machines (SSVMs). A generalized loss function is introduced, which jointly maximizes the entropy and the margin of the solution. The CRF and SSVM emerge as special cases of our framework. The probabilistic interpretation of large margin methods reveals insights about margin and slack rescaling. Furthermore, we derive the corresponding extensions for latent variable models, in which training operates on partially observed outputs. Experimental results for multiclass, linear-chain models and multiple instance learning demonstrate that the generalized loss can improve the accuracy of the resulting classifiers.
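The interpolation described above can be sketched for the simplest (multiclass) case: a single inverse-temperature parameter blends a softened maximum over loss-augmented scores, recovering the CRF log-loss at one extreme and the SSVM margin-rescaled hinge at the other. The function below is an illustrative reconstruction under these assumptions, not the authors' exact objective; the names `generalized_loss` and `delta` are placeholders.

```python
import math

def generalized_loss(scores, y_true, beta, delta=1.0):
    """Illustrative multiclass sketch of a joint entropy + margin loss
    (hypothetical reconstruction, not the paper's exact formulation).

    scores : per-class model scores <w, phi(x, y')>
    y_true : index of the observed class
    beta   : inverse temperature controlling the CRF/SSVM interpolation
    delta  : margin added to every incorrect class (margin rescaling)
    """
    # Loss-augment the scores: each wrong class receives an extra margin.
    aug = [s + (0.0 if i == y_true else delta)
           for i, s in enumerate(scores)]
    # Softened maximum (1/beta) * log sum_y exp(beta * aug_y),
    # computed stably by shifting with the true maximum.
    m = max(aug)
    soft_max = m + math.log(sum(math.exp(beta * (a - m)) for a in aug)) / beta
    return soft_max - scores[y_true]

# beta -> infinity approaches the SSVM margin-rescaled hinge loss;
# beta = 1 with delta = 0 is the CRF negative log-likelihood.
```

For example, with scores `[2.0, 1.0, -1.0]` and true class 0, a large `beta` (say 100) yields a loss near the hinge value 0, while `beta=1, delta=0` yields the CRF log-loss `log(1 + e^-1 + e^-3) ≈ 0.35`.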


Keywords: Inverse Temperature · Hidden Variable · Test Error · Margin Maximization · Multiple Instance Learning
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


References

  1. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: ICML (2001)
  2. Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y.: Support vector machine learning for interdependent and structured output spaces. In: ICML, p. 104 (2004)
  3. Taskar, B., Guestrin, C., Koller, D.: Max-margin Markov networks. In: NIPS (2003)
  4. Bakir, G., Hofmann, T., Schölkopf, B., Smola, A., Taskar, B., Vishwanathan, S.V.N.: Predicting Structured Data. MIT Press, Cambridge (2007)
  5. Wainwright, M., Jordan, M.: Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning (2008)
  6. Zhang, T., Oles, F.J.: Text categorization based on regularized linear classification methods. Information Retrieval 4, 5–31 (2000)
  7. Collins, M., Globerson, A., Koo, T., Carreras, X., Bartlett, P.L.: Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks. J. Mach. Learn. Res. 9, 1775–1822 (2008)
  8. Bartlett, P.L., Tewari, A.: Sparseness vs estimating conditional probabilities: Some asymptotic results. J. Mach. Learn. Res. 8, 775–790 (2007)
  9. Quattoni, A., Wang, S., Morency, L., Collins, M., Darrell, T.: Hidden-state conditional random fields. PAMI 29(10), 1848–1852 (2007)
  10. Yu, C., Joachims, T.: Learning structural SVMs with latent variables. In: ICML, pp. 1169–1176 (2009)
  11. Canu, S., Smola, A.J.: Kernel methods and the exponential family. Neurocomputing 69(7-9), 714–720 (2006)
  12. Chapelle, O., Zien, A.: Semi-supervised classification by low density separation. In: AISTATS, pp. 57–64 (2005)
  13. Zhang, T.: Class-size independent generalization analysis of some discriminative multi-category classification. In: NIPS, Cambridge, MA (2005)
  14. Shi, Q., Reid, M., Caetano, T.: Hybrid model of conditional random field and support vector machine. In: Workshop at NIPS (2009)
  15. Gimpel, K., Smith, N.: Softmax-margin CRFs: Training log-linear models with cost functions. In: HLT, pp. 733–736 (2010)
  16. Crammer, K., Singer, Y.: On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2 (2001)
  17. Mooij, J.: libDAI: A free/open source C++ library for Discrete Approximate Inference (2009)
  18. Ray, S., Craven, M.: Supervised versus multiple instance learning: An empirical comparison. In: ICML (2005)
  19. Andrews, S., Tsochantaridis, I., Hofmann, T.: Support vector machines for multiple-instance learning. In: NIPS, pp. 561–568 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Patrick Pletscher (1)
  • Cheng Soon Ong (1)
  • Joachim M. Buhmann (1)

  1. Department of Computer Science, ETH Zürich, Switzerland
