Efficient Bayesian Maximum Margin Multiple Kernel Learning

  • Changying Du
  • Changde Du
  • Guoping Long
  • Xin Jin
  • Yucheng Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9851)


Multiple Kernel Learning (MKL) suffers from slow learning speed and poor generalization ability, and existing methods seldom address both problems at once. In this paper, by defining a multiclass (pseudo-)likelihood function that accounts for the margin loss in kernelized classification, we develop a robust Bayesian maximum margin MKL framework, with a Dirichlet prior imposed on the kernel combination weights and a three-parameter beta normal prior imposed on the sample combination weights. For inference, we exploit the data augmentation idea and devise an efficient MCMC algorithm in the augmented variable space: the Riemann manifold Hamiltonian Monte Carlo technique is used to sample from the conditional posterior of the kernel weights, and local conjugacy is used for all other variables. This geometry- and conjugacy-based posterior sampling leads to a very fast mixing rate and scales linearly with the number of kernels used. Extensive experiments on classification tasks validate the superiority of the proposed method in both efficacy and efficiency.
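
As a rough illustration of two ingredients named in the abstract, the sketch below draws kernel combination weights from a Dirichlet prior and forms the corresponding weighted sum of base kernel matrices. It is not the authors' implementation. The RBF base kernels, the bandwidth values, the symmetric Dirichlet parameters, the toy data points, and all function names are assumptions chosen for illustration, and the posterior sampling itself (data augmentation and Riemann manifold Hamiltonian Monte Carlo) is not shown.

```python
import math
import random


def rbf_kernel(xs, gamma):
    """Gram matrix of the RBF kernel exp(-gamma * ||a - b||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))
             for v in xs] for u in xs]


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def combine_kernels(kernels, weights):
    """Weighted sum of Gram matrices: sum over m of weights[m] * K_m."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


# Hypothetical toy data and bandwidths (not taken from the paper).
rng = random.Random(0)
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
kernels = [rbf_kernel(points, g) for g in (0.1, 1.0, 10.0)]

# Weights drawn from the prior lie on the probability simplex.
weights = sample_dirichlet([1.0, 1.0, 1.0], rng)
combined = combine_kernels(kernels, weights)
```

Because the weights are non-negative and sum to one, the combined matrix is a convex combination of the base Gram matrices and is therefore itself a valid (positive semidefinite) kernel matrix.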


Markov Chain Monte Carlo; Kernel Weight; Multiple Kernel Learn; Inverse Gaussian; Quadratically Constrain Quadratic Programming
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work was supported by the National Natural Science Foundation of China (Nos. 61473273, 61573335, 91546122, 61303059), the Guangdong Provincial Science and Technology Plan Projects (No. 2015B010109005), and the Science and Technology Funds of Guiyang (No. 201410012).



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Changying Du (1, 2), corresponding author
  • Changde Du (3)
  • Guoping Long (1)
  • Xin Jin (4)
  • Yucheng Li (1)

  1. Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing, China
  2. Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
  3. Research Center for Brain-inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  4. Central Software Institute, Huawei Technologies Co. Ltd., Beijing, China
