# Kernels as features: On kernels, margins, and low-dimensional mappings


## Abstract

Kernel functions are typically viewed as providing an implicit mapping of points into a high-dimensional space, with the ability to gain much of the power of that space without incurring a high cost if the result is linearly separable by a large margin γ. However, the Johnson-Lindenstrauss lemma suggests that in the presence of a large margin, a kernel function can also be viewed as a mapping to a *low*-dimensional space, one of dimension only \(\tilde{O}(1/\gamma^2)\). In this paper, we explore the question of whether one can efficiently produce such low-dimensional mappings, using only black-box access to a kernel function. That is, given just a program that computes *K*(*x*,*y*) on inputs *x*,*y* of our choosing, can we efficiently construct an explicit (small) set of features that effectively capture the power of the implicit high-dimensional space? We answer this question in the affirmative if our method is also allowed black-box access to the underlying data distribution (i.e., unlabeled examples). We also give a lower bound, showing that if we do not have access to the distribution, then this is not possible for an *arbitrary* black-box kernel function; we leave as an open problem, however, whether this can be done for standard kernel functions such as the polynomial kernel. Our positive result can be viewed as saying that designing a good kernel function is much like designing a good feature space. Given a kernel, by running it in a black-box manner on random unlabeled examples, we can *efficiently* generate an explicit set of \(\tilde{O}(1/\gamma^2)\) features, such that if the data was linearly separable with margin γ under the kernel, then it is approximately separable in this new feature space.
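The construction sketched in the last sentence of the abstract can be illustrated with a few lines of code: draw a set of unlabeled "anchor" examples from the data distribution, then map each point *x* to the explicit feature vector (*K*(*x*,*x*₁), …, *K*(*x*,*x*_d)), using the kernel only as a black box. The snippet below is a minimal sketch, not the paper's full construction (which also analyzes refinements of this basic mapping); the RBF kernel, the anchor count, and the Gaussian stand-in for the data distribution are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(x, y, gamma=1.0):
    # Illustrative black-box kernel (Gaussian RBF); any K(x, y) would do,
    # since the construction only calls K on points of its choosing.
    return np.exp(-gamma * np.sum((x - y) ** 2))


def kernel_features(x, anchors, K):
    """Map x to the explicit feature vector (K(x, x_1), ..., K(x, x_d)),
    where x_1, ..., x_d are unlabeled examples drawn from the data
    distribution. With d on the order of O~(1/gamma^2) anchors, data
    that is gamma-margin separable under K is approximately separable
    in this feature space."""
    return np.array([K(x, a) for a in anchors])


# Demo with hypothetical data: 50 anchors standing in for unlabeled samples.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(50, 5))
x = rng.normal(size=5)
phi = kernel_features(x, anchors, rbf_kernel)  # explicit 50-dim feature vector
```

Note that the mapping itself uses no labels: only the black-box kernel and unlabeled draws from the distribution, matching the access model assumed in the positive result.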

## Keywords

Kernel function · Machine learning · Target function · Large margin · Polynomial kernel

## References

- Achlioptas, D. (2003). Database-friendly random projections. *Journal of Computer and System Sciences*, 66(4), 671–687.
- Arriaga, R. I., & Vempala, S. (1999). An algorithmic theory of learning: Robust concepts and random projection. *Proceedings of the 40th Foundations of Computer Science* (pp. 616–623). Journal version to appear in *Machine Learning*.
- Bartlett, P., & Shawe-Taylor, J. (1999). Generalization performance of support vector machines and other pattern classifiers. In *Advances in kernel methods: Support vector learning* (pp. 43–54). MIT Press.
- Ben-David, S., Eiron, N., & Simon, H. U. (2003). Limitations of learning via embeddings in Euclidean half-spaces. *Journal of Machine Learning Research*, 3, 441–461.
- Ben-David, S. (2001). A priori generalization bounds for kernel based learning. NIPS Workshop on Kernel Based Learning.
- Ben-Israel, A., & Greville, T. N. E. (1974). *Generalized inverses: Theory and applications*. New York: Wiley.
- Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/mlearn/MLRepository.html
- Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. *Proceedings of the Fifth Annual Workshop on Computational Learning Theory* (pp. 144–152).
- Cortes, C., & Vapnik, V. (1995). Support-vector networks. *Machine Learning*, 20(3), 273–297.
- Dasgupta, S., & Gupta, A. (1999). An elementary proof of the Johnson-Lindenstrauss lemma. Technical Report, UC Berkeley.
- Freund, Y., & Schapire, R. E. (1999). Large margin classification using the perceptron algorithm. *Machine Learning*, 37(3), 277–296.
- Gunn, S. R. (1997). Support vector machines for classification and regression. Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton.
- Goldreich, O., Goldwasser, S., & Micali, S. (1986). How to construct random functions. *Journal of the ACM*, 33(4), 792–807.
- Indyk, P., & Motwani, R. (1998). Approximate nearest neighbors: Towards removing the curse of dimensionality. *Proceedings of the 30th Annual ACM Symposium on Theory of Computing* (pp. 604–613).
- Herbrich, R. (2002). *Learning kernel classifiers*. Cambridge: MIT Press.
- Johnson, W. B., & Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. *Contemporary Mathematics*, 26, 189–206.
- Littlestone, N. (1988). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. *Machine Learning*, 2(4), 285–318.
- Muller, K. R., Mika, S., Ratsch, G., Tsuda, K., & Scholkopf, B. (2001). An introduction to kernel-based learning algorithms. *IEEE Transactions on Neural Networks*, 12(2), 181–201.
- Nevo, Z., & El-Yaniv, R. (2003). On online learning of decision lists. *Journal of Machine Learning Research*, 3, 271–301.
- Scholkopf, B., Burges, C. J. C., & Mika, S. (1999). *Advances in kernel methods: Support vector learning*. MIT Press.
- Shawe-Taylor, J., Bartlett, P. L., Williamson, R. C., & Anthony, M. (1998). Structural risk minimization over data-dependent hierarchies. *IEEE Transactions on Information Theory*, 44(5), 1926–1940.
- Shawe-Taylor, J., & Cristianini, N. (2004). *Kernel methods for pattern analysis*. Cambridge University Press.
- Scholkopf, B., Tsuda, K., & Vert, J.-P. (2004). *Kernel methods in computational biology*. MIT Press.
- Smola, A. J., Bartlett, P., Scholkopf, B., & Schuurmans, D. (Eds.). (2000). *Advances in large margin classifiers*. MIT Press.
- Scholkopf, B., & Smola, A. J. (2002). *Learning with kernels: Support vector machines, regularization, optimization, and beyond*. Cambridge: MIT Press.
- Vapnik, V. N. (1998). *Statistical learning theory*. New York: John Wiley and Sons Inc.