Margin and Radius Based Multiple Kernel Learning

  • Huyen Do
  • Alexandros Kalousis
  • Adam Woznica
  • Melanie Hilario
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5781)


A serious drawback of kernel methods, and of Support Vector Machines (SVM) in particular, is the difficulty of choosing a suitable kernel function for a given dataset. One approach proposed to address this problem is Multiple Kernel Learning (MKL), in which several kernels are combined adaptively for a given dataset. Many existing MKL methods use the SVM objective function and seek a linear combination of basic kernels that maximizes the margin separating the classes. However, these methods ignore the fact that the theoretical error bound depends not only on the margin but also on the radius of the smallest sphere that contains all the training instances. We present a novel MKL algorithm that optimizes this error bound, taking into account both the margin and the radius. The empirical results show that the proposed method compares favorably with other state-of-the-art MKL methods.
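To make the quantity discussed above concrete, here is a minimal pure-Python sketch, not the authors' algorithm, that only *scores* a fixed kernel weighting μ by the radius-margin product R²‖w‖² = R²/γ², where γ = 1/‖w‖ is the geometric margin. Assumptions (all mine, not from the paper): a hard-margin SVM without a bias term, data separable in the combined feature space, R² obtained by Frank-Wolfe iterations on the dual of the smallest-enclosing-sphere problem, and ‖w‖² obtained by projected gradient ascent on the SVM dual. The function names are hypothetical. The paper optimizes over μ; this sketch does not.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine_kernels(kernels: Sequence[Matrix], mu: Sequence[float]) -> Matrix:
    """Hypothetical helper: linear combination K = sum_k mu_k * K_k, mu_k >= 0."""
    if len(kernels) != len(mu):
        raise ValueError("need one weight per kernel")
    if any(m < 0 for m in mu):
        raise ValueError("kernel weights must be non-negative")
    n = len(kernels[0])
    return [[sum(m * K[i][j] for m, K in zip(mu, kernels)) for j in range(n)]
            for i in range(n)]


def _matvec(A: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def radius_squared(K: Matrix, iters: int = 5000, tol: float = 1e-12) -> float:
    """Squared radius of the smallest sphere enclosing all points in feature space.

    Frank-Wolfe with exact line search on the (concave) dual
        max_beta  sum_i beta_i K_ii - beta^T K beta   over the probability simplex.
    """
    n = len(K)
    diag = [K[i][i] for i in range(n)]
    beta = [1.0 / n] * n
    for _ in range(iters):
        grad = [d - 2.0 * kb for d, kb in zip(diag, _matvec(K, beta))]
        i = max(range(n), key=grad.__getitem__)  # best simplex vertex e_i
        s = [-b for b in beta]
        s[i] += 1.0                              # direction e_i - beta
        gs = _dot(grad, s)                       # Frank-Wolfe duality gap
        if gs <= tol:
            break
        sKs = _dot(s, _matvec(K, s))
        step = 1.0 if sKs <= 0 else min(1.0, gs / (2.0 * sKs))
        beta = [b + step * si for b, si in zip(beta, s)]
    return _dot(diag, beta) - _dot(beta, _matvec(K, beta))


def svm_weight_norm_squared(K: Matrix, y: Sequence[int], iters: int = 5000,
                            tol: float = 1e-12) -> float:
    """||w||^2 of a hard-margin SVM WITHOUT bias (simplifying assumption).

    Projected gradient ascent on the dual
        max_alpha  sum_i alpha_i - 1/2 alpha^T Q alpha,  alpha >= 0,
    with Q_ij = y_i y_j K_ij. Assumes separability through the origin.
    """
    n = len(K)
    Q = [[y[i] * y[j] * K[i][j] for j in range(n)] for i in range(n)]
    # Max absolute row sum upper-bounds the largest eigenvalue of Q (Gershgorin).
    L = max(sum(abs(q) for q in row) for row in Q)
    if L == 0.0:
        raise ValueError("kernel matrix is identically zero")
    alpha = [0.0] * n
    for _ in range(iters):
        grad = [1.0 - qa for qa in _matvec(Q, alpha)]
        new = [max(0.0, a + g / L) for a, g in zip(alpha, grad)]
        converged = max(abs(a - b) for a, b in zip(new, alpha)) < tol
        alpha = new
        if converged:
            break
    return _dot(alpha, _matvec(Q, alpha))  # ||w||^2 = alpha^T Q alpha


def radius_margin(K: Matrix, y: Sequence[int]) -> float:
    """R^2 * ||w||^2 = R^2 / gamma^2 for the combined kernel (smaller is better)."""
    return radius_squared(K) * svm_weight_norm_squared(K, y)
```

For example, take the four points (±1, ±1) labeled by their first coordinate, with K1 and K2 the linear kernels on the first and second coordinate. The weighting μ = (1, 0) scores 1.0, while μ = (0.5, 0.5) and μ = (1, 1) both score 2.0, so the criterion prefers the informative kernel. Note also that ‖w‖² alone drops from 2.0 to 1.0 between μ = (0.5, 0.5) and μ = (1, 1) purely through rescaling, whereas R²‖w‖² is unchanged: scaling the kernel by c multiplies R² by c and ‖w‖² by 1/c. This is one way to see why a margin-only criterion that ignores the radius can be misleading.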


Keywords: Learning Kernel Combination; Support Vector Machines; Convex Optimization



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Huyen Do, Alexandros Kalousis, Adam Woznica, Melanie Hilario: Computer Science Department, University of Geneva, Carouge, Switzerland
