
Supervised Learning by Support Vector Machines

Reference work entry in: Handbook of Mathematical Methods in Imaging

Abstract

During the last two decades, support vector machine learning has become a very active field of research, with a wealth of both sophisticated theoretical results and exciting real-world applications. This chapter gives a brief introduction to the basic concepts of supervised support vector learning and touches on some recent developments in this broad field.
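
For orientation, the core problem behind supervised support vector learning can be stated as soft-margin classification; the following formulation is the standard one from the SVM literature and is given here only as an illustrative sketch, not quoted from the chapter itself. Given training data (x_1, y_1), ..., (x_n, y_n) with labels y_i ∈ {−1, +1}, a feature map φ induced by a kernel k(x, x') = ⟨φ(x), φ(x')⟩, and a regularization parameter C > 0, one solves

\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i\bigl(\langle w, \varphi(x_i)\rangle + b\bigr) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n,

and classifies a new point x by the sign of f(x) = ⟨w, φ(x)⟩ + b. The first term maximizes the margin, while the slack variables ξ_i allow penalized violations for nonseparable data.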

Copyright information

© 2011 Springer Science+Business Media, LLC

About this entry

Cite this entry

Steidl, G. (2011). Supervised Learning by Support Vector Machines. In: Scherzer, O. (eds) Handbook of Mathematical Methods in Imaging. Springer, New York, NY. https://doi.org/10.1007/978-0-387-92920-0_22
