Learning Bounds for Support Vector Machines with Learned Kernels

  • Nathan Srebro
  • Shai Ben-David
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4005)


Consider the problem of learning a kernel for use in SVM classification. We bound the estimation error of a large margin classifier when the kernel, relative to which this margin is defined, is chosen from a family of kernels based on the training sample. For a kernel family with pseudodimension \(d_{\phi}\), we present a bound of \(\sqrt{\tilde{\mathcal{O}}(d_{\phi}+1/\gamma^2)/n}\) on the estimation error for SVMs with margin \(\gamma\), where \(n\) is the training sample size. This is the first bound in which the relation between the margin term and the family-of-kernels term is additive rather than multiplicative. The pseudodimension of families of linear combinations of base kernels is the number of base kernels. Unlike in previous (multiplicative) bounds, there is no non-negativity requirement on the coefficients of the linear combinations. We also give simple bounds on the pseudodimension for families of Gaussian kernels.
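The difference between an additive and a multiplicative dependence can be illustrated numerically. The following is a minimal sketch, not the paper's result: it drops all constants and logarithmic factors hidden by the \(\tilde{\mathcal{O}}\) notation, so the numbers only show how the two shapes scale. The function names are hypothetical, and the multiplicative form \(\sqrt{(d_{\phi}/\gamma^2)/n}\) is an assumed stand-in for the earlier bounds, which the abstract describes only as multiplicative.

```python
import math


def additive_rate(d_phi: float, gamma: float, n: int) -> float:
    """Additive shape sqrt((d_phi + 1/gamma^2) / n).

    Constants and log factors are dropped, so this is a scaling
    illustration only, not the actual bound.
    """
    return math.sqrt((d_phi + 1.0 / gamma**2) / n)


def multiplicative_rate(d_phi: float, gamma: float, n: int) -> float:
    """ASSUMED multiplicative shape sqrt((d_phi / gamma^2) / n).

    The abstract only says earlier bounds are multiplicative; this
    exact form is an assumption made for comparison.
    """
    return math.sqrt((d_phi / gamma**2) / n)


if __name__ == "__main__":
    n = 10_000  # hypothetical sample size
    # Hypothetical (pseudodimension, margin) pairs.
    for d_phi, gamma in [(1, 0.5), (10, 0.1), (50, 0.05)]:
        add = additive_rate(d_phi, gamma, n)
        mul = multiplicative_rate(d_phi, gamma, n)
        print(f"d_phi={d_phi:3d} gamma={gamma:.2f} "
              f"additive={add:.4f} multiplicative={mul:.4f}")
```

Under these simplifications the two shapes are close when either \(d_{\phi}\) or \(1/\gamma^2\) is small, and the additive shape is markedly smaller when both are large, since a sum of two large terms grows far more slowly than their product.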


Keywords: Support Vector Machine, Kernel Function, Gaussian Kernel, Convex Combination, Large Margin





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nathan Srebro (1)
  • Shai Ben-David (2)
  1. Department of Computer Science, University of Toronto, Toronto, Canada
  2. School of Computer Science, University of Waterloo, Waterloo, Canada
