Abstract
Kernel functions are typically viewed as providing an implicit mapping of points into a high-dimensional space, with the ability to gain much of the power of that space without incurring a high cost if the result is linearly separable by a large margin γ. However, the Johnson-Lindenstrauss lemma suggests that in the presence of a large margin, a kernel function can also be viewed as a mapping to a low-dimensional space, one of dimension only \(\tilde{O}(1/\gamma^2)\). In this paper, we explore the question of whether one can efficiently produce such low-dimensional mappings, using only black-box access to a kernel function. That is, given just a program that computes K(x,y) on inputs x,y of our choosing, can we efficiently construct an explicit (small) set of features that effectively capture the power of the implicit high-dimensional space? We answer this question in the affirmative if our method is also allowed black-box access to the underlying data distribution (i.e., unlabeled examples). We also give a lower bound, showing that if we do not have access to the distribution, then this is not possible for an arbitrary black-box kernel function; we leave as an open problem, however, whether this can be done for standard kernel functions such as the polynomial kernel. Our positive result can be viewed as saying that designing a good kernel function is much like designing a good feature space. Given a kernel, by running it in a black-box manner on random unlabeled examples, we can efficiently generate an explicit set of \(\tilde{O}(1/\gamma^2)\) features, such that if the data was linearly separable with margin γ under the kernel, then it is approximately separable in this new feature space.
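As a concrete illustration of the black-box construction described in the abstract, the sketch below draws d = O(1/γ²) random unlabeled examples x_1, …, x_d from the distribution and maps each point x to the explicit feature vector (K(x, x_1), …, K(x, x_d)). This is a minimal sketch only: the RBF kernel, the constant 8/γ², and the function names (`build_feature_map`, `rbf_kernel`) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Stand-in black-box kernel; any program computing K(x, y) would do.
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def build_feature_map(kernel, unlabeled, d):
    """Draw d unlabeled examples and return the explicit feature map
    F(x) = (K(x, x_1), ..., K(x, x_d))."""
    idx = np.random.choice(len(unlabeled), size=d, replace=False)
    landmarks = unlabeled[idx]
    def feature_map(x):
        return np.array([kernel(x, z) for z in landmarks])
    return feature_map

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    unlabeled = rng.normal(size=(1000, 5))  # stand-in for unlabeled samples from the distribution
    gamma = 0.2                             # assumed margin under the kernel
    d = int(np.ceil(8 / gamma ** 2))        # dimension on the order of 1/gamma^2 (constant is illustrative)
    F = build_feature_map(rbf_kernel, unlabeled, d)
    x = rng.normal(size=5)
    print(F(x).shape)                       # (d,) explicit feature vector for x
```

The point of the sketch is that the feature map touches the kernel only through evaluations K(x, x_i) on sampled unlabeled points, matching the black-box access model discussed in the abstract.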
References
Achlioptas, D. (2003). Database-friendly random projections. Journal of Computer and System Sciences, 66(4), 671–687.
Arriaga, R. I., & Vempala, S. (1999). An algorithmic theory of learning, robust concepts and random projection. Proceedings of the 40th foundations of computer science (pp. 616–623). Journal version to appear in Machine Learning.
Bartlett, P., & Shawe-Taylor, J. (1999). Generalization performance of support vector machines and other pattern classifiers. Advances in kernel methods: Support vector learning (pp. 43–54). MIT Press.
Ben-David, S., Eiron, N., & Simon, H. U. (2003). Limitations of learning via embeddings in Euclidean half-spaces. Journal of Machine Learning Research, 3, 441–461.
Ben-David, S. (2001). A priori generalization bounds for kernel based learning. NIPS Workshop on kernel based learning.
Ben-Israel, A., & Greville, T. N. E. (1974). Generalized inverses: Theory and applications. New York: Wiley.
Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases. [http://www.ics.uci.edu/mlearn/MLRepository.html]
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. Proceedings of the fifth annual workshop on computational learning theory (pp. 144–152).
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Dasgupta, S., & Gupta, A. (1999). An elementary proof of the Johnson-Lindenstrauss lemma. Tech Report, UC Berkeley.
Freund, Y., & Schapire, R. E. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 277–296.
Gunn, S. R. (1997). Support vector machines for classification and regression. Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton.
Goldreich, O., Goldwasser, S., & Micali, S. (1986). How to construct random functions. Journal of the ACM, 33(4), 792–807.
Indyk, P., & Motwani, R. (1998). Approximate nearest neighbors: Towards removing the curse of dimensionality. Proceedings of the 30th annual ACM symposium on theory of computing (pp. 604–613).
Herbrich, R. (2002). Learning kernel classifiers. Cambridge: MIT Press.
Johnson, W. B., & Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26, 189–206.
Littlestone, N. (1988). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4), 285–318.
Muller, K. R., Mika, S., Ratsch, G., Tsuda, K., & Scholkopf, B. (2001). An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2), 181–201.
Nevo, Z., & El-Yaniv, R. (2003). On online learning of decision lists. The Journal of Machine Learning Research, 3, 271–301.
Scholkopf, B., Burges, C. J. C., & Mika, S. (1999). Advances in kernel methods: Support vector learning. MIT Press.
Shawe-Taylor, J., Bartlett, P. L., Williamson, R. C., & Anthony, M. (1998). Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5), 1926–1940.
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge University Press.
Scholkopf, B., Tsuda, K., & Vert, J.-P. (2004). Kernel methods in computational biology. MIT Press.
Smola, A. J., Bartlett, P., Scholkopf, B., & Schuurmans, D. (Eds.) (2000). Advances in large margin classifiers. MIT Press.
Scholkopf, B., & Smola, A. J. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge: MIT Press.
Vapnik, V. N. (1998). Statistical learning theory. New York: John Wiley and Sons Inc.
Additional information
Editor: Philip Long
A preliminary version of this paper appeared in Proceedings of the 15th International Conference on Algorithmic Learning Theory. Springer LNAI 3244, pp. 194–205, 2004.
About this article
Cite this article
Balcan, MF., Blum, A. & Vempala, S. Kernels as features: On kernels, margins, and low-dimensional mappings. Mach Learn 65, 79–94 (2006). https://doi.org/10.1007/s10994-006-7550-1