On Kernels, Margins, and Low-Dimensional Mappings

Abstract

Kernel functions are typically viewed as providing an implicit mapping of points into a high-dimensional space, with the ability to gain much of the power of that space without incurring a high cost, provided the data is separable in that space by a large margin γ. However, the Johnson-Lindenstrauss lemma suggests that in the presence of a large margin, a kernel function can also be viewed as a mapping to a low-dimensional space, one of dimension only \(\tilde{O}(1/\gamma^2)\). In this paper, we explore the question of whether one can efficiently compute such implicit low-dimensional mappings, using only black-box access to a kernel function. We answer this question in the affirmative if our method is also allowed black-box access to the underlying distribution (i.e., unlabeled examples). We also give a lower bound showing that this is not possible for an arbitrary black-box kernel function if we do not have access to the distribution. We leave open the question of whether such mappings can be found efficiently, without access to the distribution, for standard kernel functions such as the polynomial kernel.
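
For concreteness, the Johnson-Lindenstrauss-style guarantee alluded to here is usually stated along the following lines (the constants and logarithmic factors are illustrative of the standard random-projection bound, not quoted from this paper): if a distribution is linearly separable with margin γ in the kernel's implicit space, then a random linear projection to

\[ d = O\!\left(\frac{1}{\gamma^{2}}\,\log\frac{1}{\varepsilon\delta}\right) \]

dimensions preserves, with probability at least \(1-\delta\), a linear separator with error at most ε at margin γ/2.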

Our positive result can be viewed as saying that designing a good kernel function is much like designing a good feature space. Given a kernel, by running it in a black-box manner on random unlabeled examples, we can generate an explicit set of \(\tilde{O}(1/\gamma^2)\) features such that, if the data is linearly separable with margin γ under the kernel, then it is approximately linearly separable in this new feature space.
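
To make this concrete, here is a minimal sketch of the kind of mapping described above, assuming only black-box access to a kernel function and a supply of unlabeled examples. The function names and the choice of `d = 50` are illustrative; the paper's guarantee concerns \(d = \tilde{O}(1/\gamma^2)\) unlabeled "landmark" examples, and its refined mapping may further process these coordinates (e.g., using the Gram matrix of the landmarks), which this sketch omits.

```python
import numpy as np


def empirical_kernel_features(kernel, landmarks):
    """Build the mapping x -> (K(x, z_1), ..., K(x, z_d)) from d unlabeled
    "landmark" examples z_1, ..., z_d drawn from the distribution.

    `kernel` is used purely as a black box: kernel(a, b) -> float.
    """
    landmarks = [np.asarray(z) for z in landmarks]

    def phi(x):
        x = np.asarray(x)
        return np.array([kernel(x, z) for z in landmarks])

    return phi


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # A polynomial kernel, accessed only through function calls.
    poly_kernel = lambda a, b: (1.0 + a @ b) ** 2

    d = 50                                # illustrative stand-in for O~(1/gamma^2)
    unlabeled = rng.normal(size=(d, 10))  # d unlabeled draws from the distribution
    phi = empirical_kernel_features(poly_kernel, unlabeled)

    x = rng.normal(size=10)
    print(phi(x).shape)                   # (50,): an explicit low-dimensional feature vector
```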