Accelerated Training of Max-Margin Markov Networks with Kernels

  • Xinhua Zhang
  • Ankan Saha
  • S. V. N. Vishwanathan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6925)


Structured output prediction is an important machine learning problem in both theory and practice, and the max-margin Markov network (M³N) is an effective approach. All state-of-the-art algorithms for optimizing M³N objectives take at least O(1/ε) iterations to find an ε-accurate solution. [1] broke this barrier by proposing the excessive gap reduction (EGR) technique, which converges in \(O(1/\sqrt{\epsilon})\) iterations. However, EGR is restricted to Euclidean projections, which require an intractable amount of computation per iteration when applied to M³N. In this paper, we show that by extending EGR to Bregman projections, this faster rate of convergence is retained and, more importantly, the updates can be performed efficiently by exploiting graphical model factorization. Further, we design a kernelized procedure that allows all per-iteration computations to be performed at the same cost as state-of-the-art approaches.
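The abstract's key computational point is that Bregman (KL-divergence) projections onto a probability simplex admit a closed-form multiplicative update, whereas Euclidean projections do not factorize as conveniently. The sketch below is not the paper's algorithm (which further exploits graphical model factorization over cliques); it is only a minimal illustration, with a made-up gradient vector, of the closed-form entropic update \(p_i \propto q_i \exp(-\eta g_i)\) that makes each Bregman projection step cheap.

```python
import numpy as np

def kl_projection_step(q, g, eta):
    """One entropic (Bregman) update on the probability simplex:
    argmin_p <eta * g, p> + KL(p || q), which has the closed form
    p_i proportional to q_i * exp(-eta * g_i)."""
    logits = np.log(q) - eta * g
    logits -= logits.max()        # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

q = np.full(4, 0.25)                  # uniform starting distribution
g = np.array([3.0, 1.0, 0.5, 2.0])    # hypothetical gradient values
p = kl_projection_step(q, g, eta=0.5)
# probability mass shifts toward coordinates with smaller gradient
```

In the M³N setting the simplex is exponentially large (one coordinate per joint labeling), so the paper's contribution is showing that these multiplicative updates decompose over the cliques of the graphical model rather than being carried out coordinate-by-coordinate as above.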





References

  1. Nesterov, Y.: Excessive gap technique in nonsmooth convex minimization. SIAM Journal on Optimization 16(1) (2005)
  2. Bakir, G., Hofmann, T., Schölkopf, B., Smola, A., Taskar, B., Vishwanathan, S.V.N.: Predicting Structured Data. MIT Press, Cambridge (2007)
  3. Lafferty, J.D., McCallum, A., Pereira, F.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of the International Conference on Machine Learning (2001)
  4. Taskar, B., Guestrin, C., Koller, D.: Max-margin Markov networks. In: Advances in Neural Information Processing Systems, vol. 16 (2004)
  5. Sha, F., Pereira, F.: Shallow parsing with conditional random fields. In: Proceedings of HLT-NAACL (2003)
  6. Teo, C., Vishwanathan, S.V.N., Smola, A., Le, Q.: Bundle methods for regularized risk minimization. Journal of Machine Learning Research 11, 311–365 (2010)
  7. Collins, M., Globerson, A., Koo, T., Carreras, X., Bartlett, P.: Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks. Journal of Machine Learning Research 9, 1775–1822 (2008)
  8. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
  9. Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research 6, 1453–1484 (2005)
  10. Taskar, B., Lacoste-Julien, S., Jordan, M.: Structured prediction, dual extragradient and Bregman projections. Journal of Machine Learning Research 7, 1627–1653 (2006)
  11. Taskar, B.: Learning Structured Prediction Models: A Large Margin Approach. PhD thesis, Stanford University (2004)
  12. List, N., Simon, H.U.: SVM-optimization and steepest-descent line search. In: Proceedings of the Annual Conference on Computational Learning Theory (2009)
  13. Nemirovski, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. John Wiley and Sons, Chichester (1983)
  14. Borwein, J.M., Lewis, A.S.: Convex Analysis and Nonlinear Optimization: Theory and Examples. Canadian Mathematical Society (2000)
  15. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters 31(3), 167–175 (2003)
  16. Lauritzen, S.L.: Graphical Models. Oxford University Press, Oxford (1996)
  17. Wainwright, M., Jordan, M.: Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1(1-2), 1–305 (2008)
  18. Andrieu, C., de Freitas, N., Doucet, A., Jordan, M.I.: An introduction to MCMC for machine learning. Machine Learning 50, 5–43 (2003)
  19. Kschischang, F., Frey, B.J., Loeliger, H.-A.: Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory 47(2), 498–519 (2001)
  20. Taskar, B., Lacoste-Julien, S., Klein, D.: A discriminative matching approach to word alignment. In: Empirical Methods in Natural Language Processing (2005)
  21. Taskar, B., Klein, D., Collins, M., Koller, D., Manning, C.: Max-margin parsing. In: Empirical Methods in Natural Language Processing (2004)
  22. Altun, Y., Hofmann, T., Tsochantaridis, I.: Support vector machine learning for interdependent and structured output spaces. In: Bakir, G., Hofmann, T., Schölkopf, B., Smola, A., Taskar, B., Vishwanathan, S.V.N. (eds.) Predicting Structured Data, ch. 5, pp. 85–103. MIT Press, Cambridge (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Xinhua Zhang, Department of Computing Science, University of Alberta, Edmonton, Canada
  • Ankan Saha, Department of Computer Science, University of Chicago, Chicago, USA
  • S. V. N. Vishwanathan, Department of Statistics and Computer Science, Purdue University, USA
