
Abstract

Random projection is a simple technique that has had a number of applications in algorithm design. In the context of machine learning, it can provide insight into questions such as "why is a learning problem easier if data is separable by a large margin?" and "in what sense is choosing a kernel much like choosing a set of features?" This talk is intended to provide an introduction to random projection and to survey some simple learning algorithms and other applications to learning based on it. I will also discuss how, given a kernel as a black-box function, we can use various forms of random projection to extract an explicit small feature space that captures much of what the kernel is doing. This talk is based in large part on joint work with Nina Balcan and Santosh Vempala [BB05, BBV04].
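
As a concrete illustration (not taken from the talk itself), the following is a minimal NumPy sketch of the two ideas the abstract mentions: a Johnson-Lindenstrauss-style Gaussian random projection [JL84, DG02], and turning a black-box kernel into an explicit feature space by evaluating it on a small set of unlabeled "landmark" examples, in the spirit of [BBV04]. The function names, the choice of Gaussian entries (rather than, say, the database-friendly ±1 projections of [Ach03]), and all parameter values below are illustrative assumptions.

```python
import numpy as np

def random_projection(X, d, seed=0):
    """Project the rows of X from R^n down to R^d using a random Gaussian
    matrix. By the Johnson-Lindenstrauss lemma, d = O(log(m) / eps^2)
    suffices to preserve all pairwise distances among m points up to a
    (1 +/- eps) factor, with high probability."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # N(0, 1) entries, scaled by 1/sqrt(d) so that squared lengths are
    # preserved in expectation.
    A = rng.standard_normal((n, d)) / np.sqrt(d)
    return X @ A

def kernel_to_features(K, landmarks):
    """Given a kernel K as a black-box function and a list of unlabeled
    landmark examples z_1, ..., z_d, return the explicit feature map
    x -> (K(x, z_1), ..., K(x, z_d))."""
    def phi(x):
        return np.array([K(x, z) for z in landmarks])
    return phi

# Usage sketch: 200 points in R^1000, projected down to R^50.
X = np.random.default_rng(1).standard_normal((200, 1000))
Y = random_projection(X, d=50)

# The distance between the first two points is roughly preserved:
before = np.linalg.norm(X[0] - X[1])
after = np.linalg.norm(Y[0] - Y[1])
print(f"distance before: {before:.2f}, after: {after:.2f}")
```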

Keywords

Kernel Function, Learning Problem, Large Margin, Random Projection, True Error


References

  1. [Ach03]
    Achlioptas, D.: Database-friendly random projections. Journal of Computer and System Sciences 66(4), 671–687 (2003)
  2. [AV99]
    Arriaga, R.I., Vempala, S.: An algorithmic theory of learning, robust concepts and random projection. In: Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, pp. 616–623 (1999)
  3. [BB05]
    Balcan, M.-F., Blum, A.: A PAC-style model for learning from labeled and unlabeled data. In: Proceedings of the 18th Annual Conference on Computational Learning Theory (COLT), pp. 111–126 (2005)
  4. [BB06]
    Balcan, M.-F., Blum, A.: On a theory of kernels as similarity functions (manuscript, 2006)
  5. [BBV04]
    Balcan, M.-F., Blum, A., Vempala, S.: Kernels as features: On kernels, margins, and low-dimensional mappings. In: Ben-David, S., Case, J., Maruoka, A. (eds.) ALT 2004. LNCS, vol. 3244, pp. 194–205. Springer, Heidelberg (2004); an extended version is available at http://www.cs.cmu.edu/~avrim/Papers/
  6. [BGV92]
    Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory (1992)
  7. [BIG74]
    Ben-Israel, A., Greville, T.N.E.: Generalized Inverses: Theory and Applications. Wiley, New York (1974)
  8. [Blo62]
    Block, H.D.: The perceptron: A model for brain functioning. Reviews of Modern Physics 34, 123–135 (1962); reprinted in: Anderson, Rosenfeld (eds.) Neurocomputing
  9. [BST99]
    Bartlett, P., Shawe-Taylor, J.: Generalization performance of support vector machines and other pattern classifiers. In: Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge (1999)
  10. [CV95]
    Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning 20(3), 273–297 (1995)
  11. [Das00]
    Dasgupta, S.: Experiments with random projection. In: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 143–151 (2000)
  12. [DG02]
    Dasgupta, S., Gupta, A.: An elementary proof of the Johnson-Lindenstrauss Lemma. Random Structures & Algorithms 22(1), 60–65 (2002)
  13. [EK00]
    Kushilevitz, E., Ostrovsky, R., Rabani, Y.: Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM J. Computing 30(2), 457–474 (2000)
  14. [FM03]
    Fradkin, D., Madigan, D.: Experiments with random projections for machine learning. In: KDD 2003: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 517–522 (2003)
  15. [FS97]
    Freund, Y., Schapire, R.: A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55(1), 119–139 (1997)
  16. [FS99]
    Freund, Y., Schapire, R.E.: Large margin classification using the Perceptron algorithm. Machine Learning 37(3), 277–296 (1999)
  17. [GBN05]
    Goel, N., Bebis, G., Nefian, A.: Face recognition experiments with random projection. In: Proceedings of SPIE, vol. 5779, pp. 426–437 (2005)
  18. [GW95]
    Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM 42(6), 1115–1145 (1995)
  19. [IM98]
    Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing, pp. 604–613 (1998)
  20. [JL84]
    Johnson, W.B., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. In: Conference in Modern Analysis and Probability, pp. 189–206 (1984)
  21. [Lit89]
    Littlestone, N.: From on-line to batch learning. In: COLT 1989: Proceedings of the 2nd Annual Workshop on Computational Learning Theory, pp. 269–284 (1989)
  22. [MMR+01]
    Müller, K.-R., Mika, S., Rätsch, G., Tsuda, K., Schölkopf, B.: An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks 12, 181–201 (2001)
  23. [MP69]
    Minsky, M., Papert, S.: Perceptrons: An Introduction to Computational Geometry. The MIT Press, Cambridge (1969)
  24. [Nov62]
    Novikoff, A.B.J.: On convergence proofs on perceptrons. In: Proceedings of the Symposium on the Mathematical Theory of Automata, vol. XII, pp. 615–622 (1962)
  25. [Sch90]
    Schapire, R.E.: The strength of weak learnability. Machine Learning 5(2), 197–227 (1990)
  26. [Sch00]
    Schulman, L.: Clustering for edge-cost minimization. In: Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, pp. 547–555 (2000)
  27. [STBWA98]
    Shawe-Taylor, J., Bartlett, P.L., Williamson, R.C., Anthony, M.: Structural risk minimization over data-dependent hierarchies. IEEE Trans. on Information Theory 44(5), 1926–1940 (1998)
  28. [Vap98]
    Vapnik, V.N.: Statistical Learning Theory. John Wiley and Sons Inc., New York (1998)
  29. [Vem98]
    Vempala, S.: Random projection: A new approach to VLSI layout. In: Proceedings of the 39th Annual IEEE Symposium on Foundations of Computer Science, pp. 389–395 (1998)
  30. [Vem04]
    Vempala, S.: The Random Projection Method. DIMACS Series in Discrete Mathematics and Theoretical Computer Science. American Mathematical Society (2004)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Avrim Blum
  1. Department of Computer Science, Carnegie Mellon University, Pittsburgh, USA
