Fast Support Vector Machines for Structural Kernels

  • Aliaksei Severyn
  • Alessandro Moschitti
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6913)

Abstract

In this paper, we propose three important enhancements of the approximate cutting plane algorithm (CPA) for training Support Vector Machines with structural kernels: (i) we exploit a compact yet exact representation of cutting plane models using directed acyclic graphs to speed up both training and classification; (ii) we provide a parallel implementation, which makes training scale almost linearly with the number of CPUs; and (iii) we propose an alternative sampling strategy to handle class-imbalanced problems and show that the theoretical convergence bounds are preserved. Experimental evaluations on three diverse datasets demonstrate the soundness of our approach and the feasibility of fast learning and classification with structural kernels.
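To make enhancement (i) more concrete, the following is a minimal sketch of how a weighted set of trees might be compacted into a directed acyclic graph by storing each distinct subtree only once. It is an illustration under stated assumptions, not the authors' implementation: the function name `build_dag`, the nested-tuple tree encoding, and the example trees and weights are all hypothetical.

```python
def build_dag(weighted_trees):
    """Merge weighted trees into a DAG that stores each distinct subtree once.

    Each tree is a nested tuple (label, child, child, ...); a leaf is (label,).
    This encoding is an assumption made for illustration only.
    """
    node_ids = {}     # (label, child ids) -> node id
    nodes = []        # node id -> (label, child ids)
    root_weight = {}  # node id -> summed weight of the trees rooted at that node

    def intern(tree):
        label, *children = tree
        # Children are interned first, so equal subtrees map to equal keys.
        key = (label, tuple(intern(child) for child in children))
        if key not in node_ids:
            node_ids[key] = len(nodes)
            nodes.append(key)
        return node_ids[key]

    for tree, weight in weighted_trees:
        root = intern(tree)
        # Identical trees collapse into one root whose weight is the sum.
        root_weight[root] = root_weight.get(root, 0.0) + weight
    return nodes, root_weight


# Hypothetical example: one sentence tree and two copies of a noun phrase.
np_tree = ("NP", ("D",), ("N",))
s_tree = ("S", np_tree, ("VP", ("V",), np_tree))
nodes, root_weight = build_dag([(s_tree, 0.5), (np_tree, 0.25), (np_tree, 0.25)])
print(len(nodes))  # 6 DAG nodes, versus 15 nodes across the three input trees
```

Because shared subtrees are stored once and repeated trees only add to a root weight, a kernel evaluation against such a structure could in principle visit each distinct subtree a single time, which is the intuition behind a representation that is compact yet exact.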

Keywords

Directed Acyclic Graph; Importance Weight; Minority Class; Machine Learn Research; Rejection Sampling
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Aliaksei Severyn (1)
  • Alessandro Moschitti (1)
  1. Department of Computer Science and Engineering, University of Trento, Povo, Italy
