Support Vector Machines

  • Reference work entry
Encyclopedia of Machine Learning

Definition

Support vector machines (SVMs) are a class of linear algorithms that can be used for classification, regression, density estimation, novelty detection, and other applications. In the simplest case of two-class classification, an SVM finds a hyperplane that separates the two classes of data with as wide a margin as possible. This leads to good generalization accuracy on unseen data and supports specialized optimization methods that allow SVMs to learn from large amounts of data.
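
As a rough illustration of the two-class case described above, the following sketch fits a separating hyperplane to a toy data set. It uses plain subgradient descent on the regularized hinge loss (a soft-margin formulation), which is an assumption chosen for brevity and not one of the specialized optimization methods the entry refers to. The function names, the toy data, and the hyperparameters `lam`, `epochs`, and `lr` are all hypothetical.

```python
# Minimal sketch of a linear two-class SVM, assuming a soft-margin
# formulation trained by full-batch subgradient descent. Illustrative only.

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * c for a, c in zip(u, v))

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Approximately minimize lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w.x_i + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the regularization term
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin or misclassified
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    """Label a point by the side of the hyperplane w.x + b = 0 it falls on."""
    return 1 if dot(w, x) + b >= 0 else -1

def geometric_margin(w, b, X, y):
    """Smallest signed distance from a training point to the hyperplane."""
    norm = dot(w, w) ** 0.5
    return min(yi * (dot(w, xi) + b) / norm for xi, yi in zip(X, y))

# Toy, linearly separable data with labels in {+1, -1} (hypothetical).
X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print("w =", w, "b =", b)
print("margin on training data:", geometric_margin(w, b, X, y))
```

A positive value from `geometric_margin` means every training point lies on the correct side of the learned hyperplane. A dedicated SVM solver would be the usual choice in practice.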

Motivation and Background

Over the past decade, maximum margin models, especially SVMs, have become popular in machine learning. This technique was developed in three major steps. First, assuming that the two classes of training examples can be separated by a hyperplane, Vapnik and Lerner proposed in 1963 that the optimal hyperplane is the one that separates the training examples with the widest margin. From the 1960s to the 1990s, Vapnik and Chervonenkis developed the Vapnik–Chervonenkis theory, which...

Recommended Reading

  • Bakir, G., Hofmann, T., Schölkopf, B., Smola, A., Taskar, B., & Vishwanathan, S. V. N. (2007). Predicting structured data. Cambridge: MIT Press.

  • Borgwardt, K. M. (2007). Graph kernels. Ph.D. thesis, Ludwig-Maximilians-University, Munich, Germany.

  • Boser, B., Guyon, I., & Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), Proceedings of the annual conference on computational learning theory (pp. 144–152). Pittsburgh: ACM Press.

  • Cortes, C., & Vapnik, V. (1995). Support vector networks. Machine Learning, 20(3), 273–297.

  • Haussler, D. (1999). Convolution kernels on discrete structures (Tech. Rep. UCSC-CRL-99-10). University of California, Santa Cruz.

  • Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the European conference on machine learning (pp. 137–142). Berlin: Springer.

  • Jordan, M. I., Bartlett, P. L., & McAuliffe, J. D. (2003). Convexity, classification, and risk bounds (Tech. Rep. 638). University of California, Berkeley.

  • Lampert, C. H. (2009). Kernel methods in computer vision. Foundations and Trends in Computer Graphics and Vision, 4(3), 193–285.

  • Platt, J. C. (1999a). Fast training of support vector machines using sequential minimal optimization. In Advances in kernel methods—support vector learning (pp. 185–208). Cambridge, MA: MIT Press.

  • Platt, J. C. (1999b). Probabilities for SV machines. In A. J. Smola, P. L. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers (pp. 61–74). Cambridge: MIT Press.

  • Schölkopf, B., & Smola, A. (2002). Learning with kernels. Cambridge: MIT Press.

  • Schölkopf, B., Tsuda, K., & Vert, J.-P. (2004). Kernel methods in computational biology. Cambridge: MIT Press.

  • Shawe-Taylor, J., & Cristianini, N. (2000). Margin distribution and soft margin. In A. J. Smola, P. L. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers (pp. 349–358). Cambridge: MIT Press.

  • Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.

  • Shawe-Taylor, J., Bartlett, P. L., Williamson, R. C., & Anthony, M. (1998). Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5), 1926–1940.

  • Smola, A., Vishwanathan, S. V. N., & Le, Q. (2007). Bundle methods for machine learning. In D. Koller & Y. Singer (Eds.), Advances in neural information processing systems (Vol. 20). Cambridge: MIT Press.

  • Taskar, B. (2004). Learning structured prediction models: A large margin approach. Ph.D. thesis, Stanford University.

  • Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.

  • Vapnik, V. (1998). Statistical learning theory. New York: Wiley.

  • Wahba, G. (1990). Spline models for observational data. CBMS-NSF regional conference series in applied mathematics (Vol. 59). Philadelphia: SIAM.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Zhang, X. (2011). Support Vector Machines. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_804
