Abstract
We consider the problem of training discriminative structured output predictors, such as conditional random fields (CRFs) and structured support vector machines (SSVMs). A generalized loss function is introduced, which jointly maximizes the entropy and the margin of the solution. The CRF and the SSVM emerge as special cases of our framework. The probabilistic interpretation of large-margin methods reveals insights about margin and slack rescaling. Furthermore, we derive the corresponding extensions for latent variable models, in which training operates on partially observed outputs. Experimental results for multiclass classification, linear-chain models, and multiple instance learning demonstrate that the generalized loss can improve the accuracy of the resulting classifiers.
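The interpolation between the two losses can be illustrated with a small numeric sketch. The parameterization below is an assumption based on the abstract's description (a softmax-margin-style family with an inverse temperature `beta`), not the paper's exact objective: with `beta = 1` and no margin term the loss reduces to the CRF negative log-likelihood, and as `beta` grows it approaches the SSVM structured hinge loss. All function and variable names here are hypothetical.

```python
import numpy as np

def generalized_loss(scores, margin, y_true, beta):
    """Sketch of an entropy-and-margin loss family (assumed form):
    L_beta = (1/beta) * log sum_y exp(beta * (score(y) + Delta(y)))
             - score(y_true)
    Computed with the usual max-shift trick for numerical stability.
    """
    z = beta * (scores + margin)
    m = z.max()
    log_sum_exp = m + np.log(np.exp(z - m).sum())
    return log_sum_exp / beta - scores[y_true]

# Toy multiclass example with 3 labels.
scores = np.array([1.0, 0.3, -0.5])  # model scores <w, phi(x, y)>
y = 0
margin = np.where(np.arange(3) == y, 0.0, 1.0)  # 0/1 task loss Delta(y, y')

# beta = 1, no margin: CRF negative log-likelihood, -log p(y | x).
crf_like = generalized_loss(scores, np.zeros(3), y, beta=1.0)

# Large beta with margin: approaches the SSVM structured hinge loss,
# max_y (score(y) + Delta(y)) - score(y_true).
ssvm_like = generalized_loss(scores, margin, y, beta=1e4)
hinge = (scores + margin).max() - scores[y]
```

Intermediate values of `beta` trade off the entropy of the CRF solution against the margin of the SSVM solution, which is the regime where the generalized loss can outperform both special cases.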
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Pletscher, P., Ong, C.S., Buhmann, J.M. (2010). Entropy and Margin Maximization for Structured Output Learning. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science(), vol 6323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15939-8_6
Print ISBN: 978-3-642-15938-1
Online ISBN: 978-3-642-15939-8