Encyclopedia of Machine Learning

2010 Edition
| Editors: Claude Sammut, Geoffrey I. Webb

Support Vector Machines

  • Xinhua Zhang
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30164-8_804


Support vector machines (SVMs) are a class of linear algorithms that can be used for classification, regression, density estimation, novelty detection, and other applications. In the simplest case of two-class classification, an SVM finds the hyperplane that separates the two classes of data with as wide a margin as possible. This leads to good generalization accuracy on unseen data, and supports specialized optimization methods that allow SVMs to learn from large amounts of data.
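The maximum-margin idea can be illustrated with a small sketch: minimizing the regularized hinge loss, λ/2‖w‖² + mean·hinge, by batch subgradient descent recovers a separating hyperplane whose closest points sit near margin 1. The data, hyperparameters, and function name below are illustrative choices, not part of the entry; this is a didactic sketch, not a production SVM solver.

```python
import numpy as np

# Toy two-class data that happens to be linearly separable.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-2.0, -2.0], [-3.0, -3.0], [-2.5, -3.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Minimize  lam/2 * ||w||^2 + mean(hinge loss)  by batch
    subgradient descent (an illustrative sketch of the SVM objective)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                       # margin violators
        # Subgradient: the regularizer shrinks w; violators push it outward.
        gw = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        gb = -y[viol].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b

w, b = train_linear_svm(X, y)
print(np.sign(X @ w + b))   # predicted labels for the training points
```

On this separable toy set the learned hyperplane classifies every training point with a positive margin; in practice one would use a dedicated solver (e.g., SMO, discussed in the reading list) rather than plain subgradient descent.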

Motivation and Background

Over the past decade, maximum-margin models, especially SVMs, have become popular in machine learning. The technique was developed in three major steps. First, assuming that the two classes of training examples can be separated by a hyperplane, Vapnik and Lerner proposed in 1963 that the optimal hyperplane is the one that separates the training examples with the widest margin. From the 1960s to the 1990s, Vapnik and Chervonenkis developed the Vapnik–Chervonenkis theory, which...


Recommended Reading

  1. Bakir, G., Hofmann, T., Schölkopf, B., Smola, A., Taskar, B., & Vishwanathan, S. V. N. (2007). Predicting structured data. Cambridge: MIT Press.
  2. Borgwardt, K. M. (2007). Graph kernels. Ph.D. thesis, Ludwig-Maximilians-University, Munich, Germany.
  3. Boser, B., Guyon, I., & Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), Proceedings of the annual conference on computational learning theory (pp. 144–152). Pittsburgh: ACM Press.
  4. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
  5. Haussler, D. (1999). Convolution kernels on discrete structures (Tech. Rep. UCSC-CRL-99-10). University of California, Santa Cruz.
  6. Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the European conference on machine learning (pp. 137–142). Berlin: Springer.
  7. Jordan, M. I., Bartlett, P. L., & McAuliffe, J. D. (2003). Convexity, classification, and risk bounds (Tech. Rep. 638). University of California, Berkeley.
  8. Lampert, C. H. (2009). Kernel methods in computer vision. Foundations and Trends in Computer Graphics and Vision, 4(3), 193–285.
  9. Platt, J. C. (1999a). Fast training of support vector machines using sequential minimal optimization. In Advances in kernel methods—support vector learning (pp. 185–208). Cambridge, MA: MIT Press.
  10. Platt, J. C. (1999b). Probabilities for SV machines. In A. J. Smola, P. L. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers (pp. 61–74). Cambridge: MIT Press.
  11. Schölkopf, B., & Smola, A. (2002). Learning with kernels. Cambridge: MIT Press.
  12. Schölkopf, B., Tsuda, K., & Vert, J.-P. (2004). Kernel methods in computational biology. Cambridge: MIT Press.
  13. Shawe-Taylor, J., & Cristianini, N. (2000). Margin distribution and soft margin. In A. J. Smola, P. L. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers (pp. 349–358). Cambridge: MIT Press.
  14. Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.
  15. Shawe-Taylor, J., Bartlett, P. L., Williamson, R. C., & Anthony, M. (1998). Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5), 1926–1940.
  16. Smola, A., Vishwanathan, S. V. N., & Le, Q. (2007). Bundle methods for machine learning. In D. Koller & Y. Singer (Eds.), Advances in neural information processing systems (Vol. 20). Cambridge: MIT Press.
  17. Taskar, B. (2004). Learning structured prediction models: A large margin approach. Ph.D. thesis, Stanford University.
  18. Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.
  19. Vapnik, V. (1998). Statistical learning theory. New York: Wiley.
  20. Wahba, G. (1990). Spline models for observational data. CBMS-NSF regional conference series in applied mathematics (Vol. 59). Philadelphia: SIAM.

Copyright information

© Springer Science+Business Media, LLC 2011
