Abstract

This abstract accompanies a presentation at S+SSPR 2006 and explores the use of Support Vector Machines (SVMs) for predicting structured objects such as trees, equivalence relations, or alignments. It is shown that SVMs can be extended to these problems in a well-founded way, while retaining a convex quadratic training problem and the ability to use kernels. Although the training problem has exponentially many constraints, a simple algorithm allows training in polynomial time. The algorithm is implemented in the SVM-Struct software, and it is discussed how the approach can be applied to problems ranging from natural language parsing to supervised clustering.
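
As a concrete illustration of the convex quadratic training problem mentioned above, the following is a minimal sketch of the standard margin-rescaling formulation of structural SVMs; the notation here is ours, not taken from the presentation: Ψ(x, y) denotes a joint feature map, Δ(y, ȳ) a structured loss, C a regularization constant, and Y the space of structured outputs.

    \min_{w,\ \xi \ge 0} \quad \frac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^{n}\xi_i
    \text{s.t.} \quad \forall i,\ \forall \bar{y} \in \mathcal{Y} \setminus \{y_i\}: \quad
    w^{\top}\big[\Psi(x_i, y_i) - \Psi(x_i, \bar{y})\big] \ \ge\ \Delta(y_i, \bar{y}) - \xi_i

Although the number of constraints grows exponentially with the size of the outputs, the cutting-plane algorithm only ever materializes a small working set: in each iteration it finds, for each training example, the most violated constraint

    \hat{y}_i \ =\ \operatorname*{argmax}_{\bar{y} \in \mathcal{Y}} \ \big[\Delta(y_i, \bar{y}) + w^{\top}\Psi(x_i, \bar{y})\big],

adds it to the working set, and re-solves the reduced quadratic program. A polynomial number of such iterations suffices to reach any desired precision, which is what makes training tractable despite the exponential constraint set; and since the problem involves only inner products of joint feature vectors, kernels can be used as in standard SVMs.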

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Thorsten Joachims
    1. Cornell University, Ithaca, USA
