Advanced Lectures on Machine Learning, pp. 41–64

Part of the Lecture Notes in Computer Science book series (LNCS, volume 2600)

A Short Introduction to Learning with Kernels

  • Bernhard Schölkopf
  • Alexander J. Smola

Abstract

We briefly describe the main ideas of statistical learning theory, support vector machines, and kernel feature spaces. This includes a derivation of the support vector optimization problem for classification and regression, the ν-trick, various kernels, and an overview of applications of kernel methods.
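For orientation, the optimization problem the abstract refers to can be stated in its standard soft-margin form; this is the textbook formulation, sketched here for reference rather than quoted from the chapter. Given training data (x_1, y_1), …, (x_m, y_m) with labels y_i ∈ {±1} and a feature map Φ, one solves

\[
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{subject to} \quad y_i\bigl(\langle \mathbf{w}, \Phi(x_i)\rangle + b\bigr) \ge 1 - \xi_i, \;\; \xi_i \ge 0,
\]

whose dual can be written entirely in terms of a kernel \( k(x, x') = \langle \Phi(x), \Phi(x')\rangle \):

\[
\max_{\boldsymbol{\alpha}} \;\; \sum_{i=1}^{m} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j\, k(x_i, x_j)
\quad \text{subject to} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{m} \alpha_i y_i = 0,
\]

with decision function \( f(x) = \operatorname{sgn}\bigl(\sum_{i=1}^{m} \alpha_i y_i\, k(x_i, x) + b\bigr) \). In the ν-parameterization mentioned in the abstract, the constant C is replaced by a parameter ν ∈ (0, 1] that upper-bounds the fraction of margin errors and lower-bounds the fraction of support vectors.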


Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Bernhard Schölkopf, Max Planck Institut für Biologische Kybernetik, Tübingen, Germany
  • Alexander J. Smola, RSISE, The Australian National University, Canberra, Australia
