Abstract
In this paper, we continue our study of learning an optimal kernel in a prescribed convex set of kernels (Micchelli & Pontil, 2005). We present a reformulation of this problem within a feature space environment. This leads us to study regularization in the dual space of all continuous functions on a compact domain with values in a Hilbert space with a mixed norm. We also relate this problem in a special case to \({\cal L}^p\) regularization.
References
Argyriou, A., Micchelli, C. A., & Pontil, M. (2005). Learning convex combinations of continuously parameterized basic kernels. In Proc. 18-th Annual Conference on Learning Theory (COLT’05), Bertinoro, Italy.
Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc., 68, 337–404.
Bach, F. R., Lanckriet, G. R. G., & Jordan, M. I. (2004). Multiple kernel learning, conic duality, and the SMO algorithm. In Proc. of the Int. Conf. on Machine Learning (ICML’04).
Borwein, J. M., & Lewis, A. S. (2000). Convex analysis and nonlinear optimization. Theory and examples. CMS (Canadian Mathematical Society) Springer-Verlag, New York.
Bousquet, O., & Herrmann, D. J. L. (2003). On the complexity of learning the kernel matrix. Advances in Neural Information Processing Systems, 15.
Chen, S. S., Donoho, D. L., & Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20(1), 33–61.
Cristianini, N., Shawe-Taylor, J., Elisseeff, A., & Kandola, J. S. (2002). On kernel-target alignment. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems, vol. 14.
Fung, G. M., & Mangasarian, O. L. (2004). A feature selection Newton method for support vector machine classification. Comput. Optim. Appl., 28(2), 185–202.
Gunn, S. R., & Kandola, J. S. (2002). Structural modelling with sparse kernels. Machine Learning, 48(1), 137–163.
Herbster, M. (2004). Relative loss bounds and polynomial-time predictions for the K-LMS-NET algorithm. In Proc. of the 15th Int. Conference on Algorithmic Learning Theory.
Lanckriet, G. R. G., Cristianini, N., Bartlett, P., El Ghaoui, L., & Jordan, M. I. (2004). Learning the kernel matrix with semi-definite programming. J. of Machine Learning Research, 5, 27–72.
Lee, Y., Kim, Y., Lee, S., & Koo, J.-Y. (2004). Structured multicategory support vector machine with ANOVA decomposition. Technical Report No. 743, Department of Statistics, The Ohio State University.
Lin, Y., & Zhang, H. H. (2003). Component selection and smoothing in smoothing spline analysis of variance models–cosso. Institute of Statistics Mimeo Series 2556, NCSU.
Micchelli, C. A. (1992). Curves from variational principles. Mathematical Modeling and Numerical Analysis, 26, 77–93.
Micchelli, C. A., & Pinkus, A. (1994). Variational problems arising from balancing different error criteria. Rendiconti di Matematica, Serie VII, 14, 37–86.
Micchelli, C. A., & Pontil, M. (2004). A function representation for learning in Banach spaces. In Proc. of the 17th Annual Conference on Learning Theory (COLT’04), Banff, Alberta.
Micchelli, C. A., & Pontil, M. (2005). On learning vector-valued functions. Neural Computation, 17, 177–204.
Micchelli, C. A., & Pontil, M. (2005). Learning the kernel function via regularization. J. of Machine Learning Research, 6, 1099–1125.
Micchelli, C. A., Pontil, M., Wu, Q., & Zhou, D. X. (2005). Error bounds for learning the kernel. Research Note 05/09, Dept. of Computer Science, University College London.
Ong, C. S., Smola, A. J., & Williamson, R. C. (2003). Hyperkernels. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in Neural Information Processing Systems, vol. 15, MIT Press, Cambridge, MA.
Royden, H. L. (1964). Real analysis, 2nd edition. Macmillan Publishing Company, New York.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. B, 58, 267–288.
Wahba, G. (1990). Spline models for observational data. Series in Applied Mathematics, vol. 59, SIAM, Philadelphia.
Wu, Q., Ying, Y., & Zhou, D. X. Multi-kernel regularization classifiers. J. of Complexity (to appear).
Additional information
Editors: Olivier Bousquet and Andre Elisseeff
This work was supported by NSF Grant ITR-0312113, EPSRC Grant GR/T18707/01 and by the IST Programme of the European Community, under the PASCAL Network of Excellence IST-2002-506778.
Cite this article
Micchelli, C.A., Pontil, M. Feature space perspectives for learning the kernel. Mach Learn 66, 297–319 (2007). https://doi.org/10.1007/s10994-006-0679-0
DOI: https://doi.org/10.1007/s10994-006-0679-0