On Kernels, Margins, and Low-Dimensional Mappings

  • Maria-Florina Balcan
  • Avrim Blum
  • Santosh Vempala
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3244)

Abstract

Kernel functions are typically viewed as providing an implicit mapping of points into a high-dimensional space, with the ability to gain much of the power of that space without incurring a high cost, provided the data is separable in that space by a large margin γ. However, the Johnson-Lindenstrauss lemma suggests that in the presence of a large margin, a kernel function can also be viewed as a mapping to a low-dimensional space, one of dimension only \(\tilde{O}(1/\gamma^2)\). In this paper, we explore the question of whether one can efficiently compute such implicit low-dimensional mappings, using only black-box access to a kernel function. We answer this question in the affirmative if our method is also given black-box access to the underlying distribution (i.e., unlabeled examples). We also give a lower bound showing that this is not possible for an arbitrary black-box kernel function if we do not have access to the distribution. We leave open the question of whether such mappings can be found efficiently without access to the distribution for standard kernel functions such as the polynomial kernel.
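
To make the low-dimensional view concrete: the Johnson-Lindenstrauss lemma says that a random linear projection into roughly \(\tilde{O}(1/\gamma^2)\) dimensions approximately preserves inner products, and hence a margin-γ separator. The sketch below illustrates this with an explicit Gaussian projection on synthetic data; the constants, dimensions, and seed are illustrative choices, not the paper's construction (which must work without explicit access to the high-dimensional space).

```python
import numpy as np

# Johnson-Lindenstrauss-style random projection: map points from a
# high-dimensional space down to d = O(log n / eps^2) dimensions while
# approximately preserving inner products (and hence a large margin).
# All constants and dimensions below are illustrative choices.

rng = np.random.default_rng(0)
n, D = 500, 2_000                           # n unit-length points in R^D
eps = 0.2                                   # target inner-product distortion
d = int(np.ceil(8 * np.log(n) / eps ** 2))  # projected dimension (illustrative constant)

X = rng.standard_normal((n, D))
X /= np.linalg.norm(X, axis=1, keepdims=True)

A = rng.standard_normal((D, d)) / np.sqrt(d)  # random Gaussian projection
Y = X @ A                                     # projected points in R^d

# Inner products (which determine margins) are approximately preserved.
distortion = np.abs(X @ X.T - Y @ Y.T).max()
print(f"max inner-product distortion: {distortion:.3f} (target ~{eps})")
```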

Our positive result can be viewed as saying that designing a good kernel function is much like designing a good feature space. Given a kernel, by running it in a black-box manner on random unlabeled examples, we can generate an explicit set of \(\tilde{O}(1/\gamma^2)\) features, such that if the data is linearly separable with margin γ under the kernel, then it is approximately linearly separable in this new feature space.
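
One natural way to instantiate this feature-space view, in the spirit of the abstract, is to draw \(d = \tilde{O}(1/\gamma^2)\) random unlabeled "landmark" examples \(x_1, \ldots, x_d\) and map each point x to its vector of kernel evaluations \((K(x, x_1), \ldots, K(x, x_d))\). The sketch below assumes an RBF kernel as a stand-in for an arbitrary black-box K; the kernel, data, and dimension are placeholders, and the paper's exact mapping may differ.

```python
import numpy as np

def rbf_kernel(x, z, width=1.0):
    """A black-box kernel; an RBF kernel stands in for an arbitrary K."""
    return np.exp(-width * np.sum((x - z) ** 2))

def kernel_feature_map(X_unlabeled, kernel, d, seed=0):
    """Draw d random unlabeled 'landmarks' and return the explicit map
    x -> (K(x, x_1), ..., K(x, x_d)). A landmark-style construction in
    the spirit of the abstract; the paper's exact mapping may differ."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_unlabeled), size=d, replace=False)
    landmarks = X_unlabeled[idx]
    return lambda x: np.array([kernel(x, z) for z in landmarks])

# Usage: build d ~ 1/gamma^2 explicit features from unlabeled data alone.
X_unlabeled = np.random.default_rng(1).standard_normal((1000, 5))
gamma = 0.1                        # assumed margin under the kernel
d = int(1 / gamma ** 2)            # ignoring log factors for illustration
phi = kernel_feature_map(X_unlabeled, rbf_kernel, d)
print(phi(X_unlabeled[0]).shape)   # (100,) explicit feature vector
```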



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Maria-Florina Balcan (1)
  • Avrim Blum (1)
  • Santosh Vempala (2)

  1. Computer Science Department, Carnegie Mellon University
  2. Department of Mathematics, MIT