SpicyMKL: a fast algorithm for Multiple Kernel Learning with thousands of kernels

Suzuki, Taiji; Tomioka, Ryota

doi:10.1007/s10994-011-5252-9

Published: 03 June 2011

Machine Learning, Volume 85, pages 77–108 (2011)

Abstract

We propose a new optimization algorithm for Multiple Kernel Learning (MKL) called SpicyMKL, which is applicable to general convex loss functions and general types of regularization. SpicyMKL iteratively solves smooth minimization problems, so there is no need to solve an SVM, LP, or QP internally. SpicyMKL can be viewed as a proximal minimization method and converges super-linearly. The cost of the inner minimization is roughly proportional to the number of active kernels. Therefore, when we aim for a sparse kernel combination, our algorithm scales well as the number of kernels increases. Moreover, we give a general block-norm formulation of MKL that includes non-sparse regularizations, such as elastic-net and ℓ_p-norm regularizations. Extending SpicyMKL, we propose an efficient optimization method for this general regularization framework. Experimental results show that our algorithm is faster than existing methods, especially when the number of kernels is large (>1000).
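
As a rough illustration of why the inner cost can track the number of active kernels, the sketch below applies block soft-thresholding, the proximal operator of a block ℓ_1 norm, to a few toy coefficient blocks. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `block_soft_threshold` and `active_blocks`, the toy data, and the threshold value are all hypothetical.

```python
import numpy as np


def block_soft_threshold(blocks, threshold):
    """Shrink each block's Euclidean norm by `threshold`, zeroing small blocks.

    This is the proximal operator of threshold * sum_m ||v_m||_2
    (a block l1 norm). Names and inputs here are illustrative assumptions.
    """
    shrunk = []
    for v in blocks:
        norm = np.linalg.norm(v)
        # Blocks whose norm is at most the threshold are set exactly to zero.
        scale = max(norm - threshold, 0.0) / norm if norm > 0 else 0.0
        shrunk.append(scale * v)
    return shrunk


def active_blocks(blocks):
    """Return the indices of blocks that are not identically zero."""
    return [m for m, v in enumerate(blocks) if np.any(v != 0)]


# Toy example: three "kernels", each with a two-dimensional coefficient block.
blocks = [np.array([3.0, 4.0]), np.array([0.3, 0.4]), np.array([0.0, 2.0])]
shrunk = block_soft_threshold(blocks, threshold=1.0)
print(active_blocks(shrunk))
```

Only the blocks that survive the thresholding remain active, so any later computation that loops over `active_blocks(shrunk)` touches a number of kernels that shrinks with the sparsity of the solution.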

Author information

Authors and Affiliations

Department of Mathematical Informatics, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
Taiji Suzuki & Ryota Tomioka

Corresponding author

Correspondence to Taiji Suzuki.

Additional information

Editors: Süreyya Özöğür-Akyüz, Devrim Ünay, Alex Smola.

About this article

Cite this article

Suzuki, T., Tomioka, R. SpicyMKL: a fast algorithm for Multiple Kernel Learning with thousands of kernels. Mach Learn 85, 77–108 (2011). https://doi.org/10.1007/s10994-011-5252-9

Received: 28 February 2010
Accepted: 12 May 2011
Published: 03 June 2011
Issue Date: October 2011
DOI: https://doi.org/10.1007/s10994-011-5252-9