Feature space perspectives for learning the kernel

Micchelli, Charles A.; Pontil, Massimiliano

doi:10.1007/s10994-006-0679-0

Published: 09 January 2007

Machine Learning, Volume 66, pages 297–319 (2007)

Abstract

In this paper, we continue our study of learning an optimal kernel in a prescribed convex set of kernels (Micchelli & Pontil, 2005). We present a reformulation of this problem within a feature space environment. This leads us to study regularization in the dual space of all continuous functions on a compact domain with values in a Hilbert space with a mixed norm. We also relate this problem in a special case to \({\cal L}^p\) regularization.
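
To make the idea of learning a kernel in a convex set concrete, the following is a minimal illustrative sketch and not the method of the paper. It assumes the square loss, a convex set given by the convex hull of two Gaussian kernels, and a plain grid search over the mixing weight; the function names, bandwidths, data, and the value of `mu` are all hypothetical choices made for this example. For the square loss, the regularization functional of a fixed kernel matrix K has the closed form mu * y^T (K + mu I)^{-1} y, which the sketch minimizes over the convex combination.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    """Gram matrix of a Gaussian kernel on 1-D inputs (illustrative choice)."""
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))

def regularization_value(K, y, mu):
    """Value of min_f sum_i (y_i - f(x_i))^2 + mu * ||f||_K^2.

    For the square loss the minimizer is f = K c with (K + mu I) c = y,
    and the minimum equals mu * y^T c.
    """
    n = len(y)
    c = np.linalg.solve(K + mu * np.eye(n), y)
    return mu * float(y @ c)

def learn_convex_combination(K1, K2, y, mu, num_grid=101):
    """Grid search over lam in [0, 1] for the kernel lam*K1 + (1-lam)*K2.

    A grid search is used only for simplicity; it is an assumption of this
    sketch, not an algorithm taken from the paper.
    """
    lams = np.linspace(0.0, 1.0, num_grid)
    values = [regularization_value(l * K1 + (1.0 - l) * K2, y, mu) for l in lams]
    best = int(np.argmin(values))
    return float(lams[best]), float(values[best])

# Synthetic data, made up for illustration only.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 30)
y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(30)

K1 = gaussian_kernel(x, 0.3)   # narrow kernel
K2 = gaussian_kernel(x, 2.0)   # wide kernel
mu = 0.1
best_lam, best_val = learn_convex_combination(K1, K2, y, mu)
print("selected weight:", best_lam, "objective:", best_val)
```

Because the grid contains both endpoints, the selected combination is never worse, in terms of this objective, than either basic kernel used alone.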

References

Argyriou, A., Micchelli, C. A., & Pontil, M. (2005). Learning convex combinations of continuously parameterized basic kernels. In Proc. 18th Annual Conference on Learning Theory (COLT’05), Bertinoro, Italy.
Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc., 68, 337–404.
Bach, F. R., Lanckriet, G. R. G., & Jordan, M. I. (2004). Multiple kernel learning, conic duality, and the SMO algorithm. In Proc. of the Int. Conf. on Machine Learning (ICML’04).
Borwein, J. M., & Lewis, A. S. (2000). Convex analysis and nonlinear optimization. Theory and examples. CMS (Canadian Mathematical Society) Springer-Verlag, New York.
Bousquet, O., & Herrmann, D. J. L. (2003). On the complexity of learning the kernel matrix. Advances in Neural Information Processing Systems, 15.
Chen, S. S., Donoho, D. L., & Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20(1), 33–61.
Cristianini, N., Shawe-Taylor, J., Elisseeff, A., & Kandola, J. S. (2002). On kernel-target alignment. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems, vol. 14.
Fung, G. M., & Mangasarian, O. L. (2004). A feature selection Newton method for support vector machine classification. Comput. Optim. Appl., 28(2), 185–202.
Gunn, S. R., & Kandola, J. S. (2002). Structural modelling with sparse kernels. Machine Learning, 48(1), 137–163.
Herbster, M. (2004). Relative loss bounds and polynomial-time predictions for the K-LMS-NET algorithm. In Proc. of the 15th Int. Conference on Algorithmic Learning Theory.
Lanckriet, G. R. G., Cristianini, N., Bartlett, P., El Ghaoui, L., & Jordan, M. I. (2004). Learning the kernel matrix with semi-definite programming. J. of Machine Learning Research, 5, 27–72.
Lee, Y., Kim, Y., Lee, S., & Koo, J.-Y. (2004). Structured multicategory support vector machine with ANOVA decomposition. Technical Report No. 743, Department of Statistics, The Ohio State University.
Lin, Y., & Zhang, H. H. (2003). Component selection and smoothing in smoothing spline analysis of variance models – COSSO. Institute of Statistics Mimeo Series 2556, NCSU.
Micchelli, C. A. (1992). Curves from variational principles. Mathematical Modeling and Numerical Analysis, 26, 77–93.
Micchelli, C. A., & Pinkus, A. (1994). Variational problems arising from balancing different error criteria. Rendiconti di Matematica, Serie VII, 14, 37–86.
Micchelli, C. A., & Pontil, M. (2004). A function representation for learning in Banach spaces. In Proc. of the 17th Annual Conference on Learning Theory (COLT’04), Banff, Alberta.
Micchelli, C. A., & Pontil, M. (2005). On learning vector-valued functions. Neural Computation, 17, 177–204.
Micchelli, C. A., & Pontil, M. (2005). Learning the kernel function via regularization. J. of Machine Learning Research, 6, 1099–1125.
Micchelli, C. A., Pontil, M., Wu, Q., & Zhou, D. X. (2005). Error bounds for learning the kernel. Research Note 05/09, Dept. of Computer Science, University College London.
Ong, C. S., Smola, A. J., & Williamson, R. C. (2003). Hyperkernels. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in Neural Information Processing Systems, vol. 15, MIT Press, Cambridge, MA.
Royden, H. L. (1964). Real analysis, 2nd edition. Macmillan Publishing Company, New York.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. B, 58, 267–288.
Wahba, G. (1990). Spline models for observational data. Series in Applied Mathematics, vol. 59, SIAM, Philadelphia.
Wu, Q., Ying, Y., & Zhou, D. X. Multi-kernel regularization classifiers. J. of Complexity (to appear).

Author information

Authors and Affiliations

Department of Mathematics and Statistics, State University of New York, The University at Albany, 1400 Washington Avenue, Albany, NY, 12222, USA
Charles A. Micchelli
Department of Computer Science, University College London, Gower Street, London, WC1E, England, UK
Massimiliano Pontil

Corresponding author

Correspondence to Charles A. Micchelli.

Additional information

Editors: Olivier Bousquet and Andre Elisseeff

This work was supported by NSF Grant ITR-0312113, EPSRC Grant GR/T18707/01 and by the IST Programme of the European Community, under the PASCAL Network of Excellence IST-2002-506778.

About this article

Cite this article

Micchelli, C.A., Pontil, M. Feature space perspectives for learning the kernel. Mach Learn 66, 297–319 (2007). https://doi.org/10.1007/s10994-006-0679-0

Received: 15 August 2006
Revised: 17 August 2006
Accepted: 11 October 2006
Published: 09 January 2007
Issue Date: March 2007
DOI: https://doi.org/10.1007/s10994-006-0679-0
