A Unifying View of Multiple Kernel Learning

  • Marius Kloft
  • Ulrich Rückert
  • Peter L. Bartlett
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6322)


Recent research on multiple kernel learning has led to a number of approaches for combining kernels in regularized risk minimization. The proposed approaches include different formulations of objectives and varying regularization strategies. In this paper, we present a unifying optimization criterion for multiple kernel learning and show how existing formulations are subsumed as special cases. We also derive the criterion's dual representation, which is suitable for general smooth optimization algorithms. Finally, we evaluate multiple kernel learning in this framework analytically, using a Rademacher complexity bound on the generalization error, and empirically, in a set of experiments.
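The abstract does not state the unifying criterion itself. As a hedged illustration of the kind of problem it generalizes, the sketch below implements ℓp-norm multiple kernel learning, assumed here to be one of the subsumed special cases, with a squared loss in place of the hinge loss. It uses a simple alternating scheme rather than the dual-based smooth optimization the paper describes: for fixed kernel weights θ it solves kernel ridge regression with the combined kernel K_θ = Σ_m θ_m K_m, and for fixed block norms ‖w_m‖ it applies the closed-form minimizer of Σ_m ‖w_m‖²/θ_m over the ℓp ball, θ_m = ‖w_m‖^{2/(p+1)} / (Σ_k ‖w_k‖^{2p/(p+1)})^{1/p}. The names `rbf_kernel` and `lp_mkl_ridge`, the squared loss, and all parameter values are illustrative assumptions, not the authors' code.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clamp tiny negative distances


def lp_mkl_ridge(kernels, y, p=2.0, lam=0.1, n_iter=50):
    """Hypothetical sketch of l_p-norm MKL with a squared loss.

    Minimizes  sum_i (y_i - f(x_i))^2 + lam * sum_m ||w_m||^2 / theta_m
    subject to theta >= 0 and ||theta||_p <= 1 (assumes p >= 1), alternating
    (a) kernel ridge regression with K_theta = sum_m theta_m K_m and
    (b) the closed-form theta update for fixed block norms ||w_m||.
    """
    M, n = len(kernels), len(y)
    theta = np.full(M, M ** (-1.0 / p))  # uniform start with ||theta||_p = 1
    for _ in range(n_iter):
        K = sum(t * Km for t, Km in zip(theta, kernels))
        alpha = np.linalg.solve(K + lam * np.eye(n), y)
        # Block norms of the primal solution: ||w_m||^2 = theta_m^2 * alpha^T K_m alpha
        wnorm2 = np.array([t ** 2 * (alpha @ Km @ alpha) for t, Km in zip(theta, kernels)])
        wnorm = np.sqrt(np.maximum(wnorm2, 0.0))
        num = wnorm ** (2.0 / (p + 1.0))
        den = np.sum(wnorm ** (2.0 * p / (p + 1.0))) ** (1.0 / p)
        theta = num / max(den, 1e-300)  # guard against an all-zero solution
    # Refit the predictor for the final kernel weights
    K = sum(t * Km for t, Km in zip(theta, kernels))
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    return theta, alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 2))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
    kernels = [rbf_kernel(X, g) for g in (0.1, 1.0, 10.0)]
    theta, alpha = lp_mkl_ridge(kernels, y, p=2.0)
    print("kernel weights:", np.round(theta, 3))
```

The squared loss is chosen only so that each inner step is a single linear solve. Both steps exactly minimize a jointly convex objective over one block of variables, so the objective value never increases; smaller p pushes the weights toward sparse kernel combinations, while larger p spreads them more evenly.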





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Marius Kloft, Ulrich Rückert, Peter L. Bartlett: University of California, Berkeley, USA
