Abstract
During the last two decades, support vector machine learning has become a very active field of research, with a wealth of sophisticated theoretical results and exciting real-world applications. This chapter gives a brief introduction to the basic concepts of supervised support vector learning and touches on some recent developments in this broad field.
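To make the abstract's notion of supervised support vector learning concrete, here is a minimal illustrative sketch (not taken from the chapter): a linear SVM trained by Pegasos-style stochastic subgradient descent on the regularized hinge loss, on a hypothetical toy dataset. The function names, step-size schedule, and data are assumptions chosen for illustration only.

```python
import random

def train_linear_svm(data, lam=0.01, epochs=200):
    """Minimize  lam/2 * ||w||^2 + mean(max(0, 1 - y*(w.x + b)))
    by stochastic subgradient descent with step size 1/(lam*t)."""
    random.seed(0)
    w = [0.0, 0.0]
    b = 0.0
    t = 0
    for _ in range(epochs):
        random.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1:
                # hinge loss is active: shrink w and step toward y*x
                w = [(1 - eta * lam) * wi + eta * y * xi
                     for wi, xi in zip(w, x)]
                b += eta * y
            else:
                # only the regularizer contributes a subgradient
                w = [(1 - eta * lam) * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Classify by the sign of the affine decision function."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

# Toy linearly separable problem: +1 in the upper right,
# -1 in the lower left.
data = [([2.0, 2.0], 1), ([2.5, 1.5], 1), ([3.0, 3.0], 1),
        ([-2.0, -2.0], -1), ([-1.5, -2.5], -1), ([-3.0, -1.0], -1)]
w, b = train_linear_svm(list(data))
print([predict(w, b, x) for x, _ in data])  # → [1, 1, 1, -1, -1, -1]
```

On separable data such as this, the learned hyperplane classifies all training points correctly; the regularization parameter `lam` trades margin width against hinge-loss violations, which is the soft-margin idea formalized in the chapter.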
References and Further Reading
Aizerman M, Braverman E, Rozonoer L (1964) Theoretical foundations of the potential function method in pattern recognition learning. Automat Rem Contr 25:821–837
Amit Y, Fink M, Srebro N, Ullman S (2007) Uncovering shared structures in multiclass classification. In: Proceedings of the international conference on machine learning, pp 17–24
Anthony M, Bartlett PL (1999) Neural network learning: theoretical foundations. Cambridge University Press, Cambridge
Argyriou A, Evgeniou T, Pontil M (2008) Convex multi-task feature learning. Mach Learn 73(3):243–272
Aronszajn N (1950) Theory of reproducing kernels. Trans Am Math Soc 68:337–404
Bartlett PL, Jordan MI, McAuliffe JD (2006) Convexity, classification, and risk bounds. J Am Stat Assoc 101:138–156
Bennett KP, Mangasarian OL (1992) Robust linear programming discrimination of two linearly inseparable sets. Optim Methods Softw 1:23–34
Berlinet A, Thomas-Agnan C (2004) Reproducing kernel Hilbert spaces in probability and statistics. Kluwer, Dordrecht
Bishop CM (2006) Pattern recognition and machine learning. Springer, Heidelberg
Björck A (1996) Least squares problems. SIAM, Philadelphia
Bonnans JF, Shapiro A (2000) Perturbation analysis of optimization problems. Springer, New York
Boser BE, Guyon I, Vapnik V (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual ACM workshop on computational learning theory, Madison, pp 144–152
Bottou L, Chapelle O, DeCoste D, Weston J (eds) (2007) Large scale kernel machines. MIT Press, Cambridge
Boucheron S, Bousquet O, Lugosi G (2005) Theory of classification: a survey on some recent advances. ESAIM Probab Stat 9:323–375
Bousquet O, Elisseeff A (2001) Algorithmic stability and generalization performance. In: Leen TK, Dietterich TG, Tresp V (eds) Advances in neural information processing systems 13. MIT Press, Cambridge, pp 196–202
Bradley PS, Mangasarian OL (1998) Feature selection via concave minimization and support vector machines. In: Proceedings of the 15th international conference on machine learning, Morgan Kaufmann, San Francisco, pp 82–90
Brown M, Grundy W, Lin D, Cristianini N, Sugnet C, Furey T, Ares M, Haussler D (2000) Knowledge-based analysis of microarray gene-expression data by using support vector machines. Proc Natl Acad Sci 97(1): 262–267
Buhmann MD (2003) Radial basis functions. Cambridge University Press, Cambridge
Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2(2):121–167
Cai J-F, Candès EJ, Shen Z (2008) A singular value thresholding algorithm for matrix completion. Technical report, UCLA computational and applied mathematics
Caruana R (1997) Multitask learning. Mach Learn 28(1):41–75
Chang C-C, Lin C-J (2004) LIBSVM: a library for support vector machines. www.csie.ntu.edu.tw/cjlin/papers/libsvm.ps.gz
Chapelle O, Haffner P, Vapnik VN (1999) SVMs for histogram-based image classification. IEEE Trans Neural Netw 10(5):1055–1064
Chen P-H, Fan R-E, Lin C-J (2006) A study on SMO-type decomposition methods for support vector machines. IEEE Trans Neural Netw 17:893–908
Collobert R, Bengio S (2001) SVMTorch: support vector machines for large-scale regression problems. J Mach Learn Res 1:143–160
Cortes C, Vapnik V (1995) Support vector networks. Mach Learn 20:273–297
Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines. Cambridge University Press, Cambridge
Cucker F, Smale S (2002) On the mathematical foundations of learning. Bull Am Math Soc 39:1–49
Cucker F, Zhou DX (2007) Learning theory: an approximation point of view. Cambridge University Press, Cambridge
Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, New York
Devroye LP (1982) Any discrimination rule can have an arbitrarily bad probability of error for finite sample size. IEEE Trans Pattern Anal Mach Intell 4:154–157
Dietterich TG, Bakiri G (1995) Solving multiclass learning problems via error-correcting output codes. J Artif Intell Res 2:263–286
Dinuzzo F, Neve M, Nicolao GD, Gianazza UP (2007) On the representer theorem and equivalent degrees of freedom of SVR. J Mach Learn Res 8:2467–2495
Duda RO, Hart PE, Stork D (2001) Pattern classification, 2nd edn. Wiley, New York
Edmunds DE, Triebel H (1996) Function spaces, entropy numbers, differential operators. Cambridge University Press, Cambridge
Elisseeff A, Evgeniou T, Pontil M (2005) Stability of randomised learning algorithms. J Mach Learn Res 6:55–79
Evgeniou T, Pontil M, Poggio T (2000) Regularization networks and support vector machines. Adv Comput Math 13(1):1–50
Fan R-E, Chen P-H, Lin C-J (2005) Working set selection using second order information for training support vector machines. J Mach Learn Res 6:1889–1918
Fasshauer GE (2007) Meshfree approximation methods with MATLAB. World Scientific, New Jersey
Fazel M, Hindi H, Boyd SP (2001) A rank minimization heuristic with application to minimum order system approximation. In: Proceedings of the American control conference, Arlington, pp 4734–4739
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugenics 7:179–188
Flake GW, Lawrence S (1999) Efficient SVM regression training with SMO. Technical report, NEC Research Institute
Gauss CF (1963) Theory of the motion of the heavenly bodies moving about the sun in conic sections. (trans: Davis CH). Dover, New York; first published 1809
Girosi F (1998) An equivalence between sparse approximation and support vector machines. Neural Comput 10(6):1455–1480
Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore
Györfi L, Kohler M, Krzyżak A, Walk H (2002) A distribution-free theory of non-parametric regression. Springer, New York
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer, New York
Herbrich R (2001) Learning kernel classifiers: theory and algorithms. MIT Press, Cambridge
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1):55–67
Huang T, Kecman V, Kopriva I, Friedman J (2006) Kernel based algorithms for mining huge data sets: supervised, semi-supervised, and unsupervised learning. Springer, Berlin
Jaakkola TS, Haussler D (1999) Probabilistic kernel regression models. In: Proceedings of the 1999 conference on artificial intelligence and statistics
Joachims T (1999) Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A (eds) Advances in kernel methods – support vector learning. MIT Press, Cambridge, pp 41–56
Joachims T (2002) Learning to classify text using support vector machines. Kluwer, Boston
Kailath T (1971) RKHS approach to detection and estimation problems: Part I: deterministic signals in Gaussian noise. IEEE Trans Inform Theory 17(5):530–549
Keerthi SS, Shevade SK, Bhattacharyya C, Murthy KRK (2001) Improvements to Platt's SMO algorithm for SVM classifier design. Neural Comput 13:637–649
Kimeldorf GS, Wahba G (1971) Some results on Tchebycheffian spline functions. J Math Anal Appl 33:82–95
Kolmogorov AN, Tikhomirov VM (1961) ε-entropy and ε-capacity of sets in functional spaces. Am Math Soc Trans 17:277–364
Kondor RI, Lafferty J (2002) Diffusion kernels on graphs and other discrete structures. In: Proceedings of the international conference on machine learning. Morgan Kaufmann, San Mateo
Krige DG (1951) A statistical approach to some basic mine valuation problems on the Witwatersrand. J Chem Met Mining Soc S Africa 52(6):119–139
Kuhn HW, Tucker AW (1951) Nonlinear programming. In: Proceedings of the Berkeley symposium on mathematical statistics and probability, University of California Press, Berkeley, pp 482–492
Laplace PS (1816) Théorie Analytique des Probabilités, 3rd edn. Courier, Paris
LeCun Y, Jackel LD, Bottou L, Brunot A, Cortes C, Denker JS, Drucker H, Guyon I, Müller U, Säckinger E, Simard P, Vapnik V (1995) Comparison of learning algorithms for handwritten digit recognition. In: Fogelman-Souleé F, Gallinari P (eds) Proceedings of ICANN’95, vol 2. EC2 & Cie, Paris, pp 53–60
Legendre AM (1805) Nouvelles Méthodes pour la Determination des Orbites des Cométes. Courier, Paris
Leopold E, Kindermann J (2002) Text categorization with support vector machines: how to represent text in input space? Mach Learn 46(1–3):223–244
Lin CJ (2001) On the convergence of the decomposition method for support vector machines. IEEE Trans Neural Netw 12:1288–1298
Lu Z, Monteiro RDC, Yuan M (2008) Convex optimization methods for dimension reduction and coefficient estimation in multivariate linear regression. Submitted to Math Program
Ma S, Goldfarb D, Chen L (2008) Fixed point and Bregman iterative methods for matrix rank minimization. Technical report 08-78, UCLA Computational and applied mathematics
Mangasarian OL (1994) Nonlinear programming. SIAM, Philadelphia
Mangasarian OL, Musicant DR (1999) Successive overrelaxation for support vector machines. IEEE Trans Neural Netw 10:1032–1037
Matheron G (1963) Principles of geostatistics. Econ Geol 58:1246–1266
Micchelli CA (1986) Interpolation of scattered data: distance matrices and conditionally positive definite functions. Constr Approx 2:11–22
Micchelli CA, Pontil M (2005) On learning vector-valued functions. Neural Comput 17: 177–204
Mitchell TM (1997) Machine learning. McGraw-Hill, Boston
Mukherjee S, Niyogi P, Poggio T, Rifkin R (2006) Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv Comput Math 25:161–193
Neumann J, Schnörr C, Steidl G (2005) Efficient wavelet adaptation for hybrid wavelet–large margin classifiers. Pattern Recogn 38: 1815–1830
Obozinski G, Taskar B, Jordan MI (2009) Joint covariate selection and joint subspace selection for multiple classification problems. Stat Comput (in press)
Osuna E, Freund R, Girosi F (1997) Training of support vector machines: an application to face detection. In: Proceedings of the CVPR’97, IEEE Computer Society, Washington, pp 130–136
Parzen E (1970) Statistical inference on time series by RKHS methods. Technical report, Department of Statistics, Stanford University
Pinkus A (1996) n-Widths in approximation theory. Springer, Berlin
Platt JC (1999) Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in Kernel methods – support vector learning. MIT Press, Cambridge, pp 185–208
Poggio T, Girosi F (1990) Networks for approximation and learning. Proc IEEE 78(9):1481–1497
Pong TK, Tseng P, Ji S, Ye J (2009) Trace norm regularization: reformulations, algorithms and multi-task learning. University of Washington, preprint
Povzner AY (1950) A class of Hilbert function spaces. Doklady Akademii Nauk USSR 68: 817–820
Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65:386–408
Schoenberg IJ (1938) Metric spaces and completely monotone functions. Ann Math 39: 811–841
Schölkopf B, Herbrich R, Smola AJ (2001) A generalized representer theorem. In: Helmbold D, Williamson B (eds) Proceedings of the 14th annual conference on computational learning theory. Springer, New York, pp 416–426
Schölkopf B, Smola AJ (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge
Shawe-Taylor J, Cristianini N (2009) Kernel methods for pattern analysis, 4th edn. Cambridge University Press, New York
Smola AJ, Schölkopf B, Müller KR (1998) The connection between regularization operators and support vector kernels. Neural Netw 11: 637–649
Spellucci P (1993) Numerische verfahren der nichtlinearen optimierung. Birkhäuser, Basel/Boston/Berlin
Srebro N, Rennie JDM, Jaakkola TS (2005) Maximum-margin matrix factorization. In: Advances in neural information processing systems. MIT Press, Cambridge, pp 1329–1336
Steinwart I (2003) Sparseness of support vector machines. J Mach Learn Res 4:1071–1105
Steinwart I, Christmann A (2008) Support vector machines. Springer, New York
Stone C (1977) Consistent nonparametric regression. Ann Stat 5:595–645
Strauss DJ, Steidl G (2002) Hybrid wavelet-support vector classification of waveforms. J Comput Appl Math 148:375–400
Strauss DJ, Steidl G, Delb D (2003) Feature extraction by shape-adapted local discriminant bases. Signal Process 83:359–376
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
Suykens JAK, Van Gestel T, De Brabanter J, De Moor B, Vandewalle J (2002) Least squares support vector machines. World Scientific, Singapore
Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300
Tao PD, An LTH (1998) A d.c. optimization algorithm for solving the trust-region subproblem. SIAM J Optimiz 8(2):476–505
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc B 58(1): 267–288
Tikhonov AN, Arsenin VY (1977) Solution of ill-posed problems. Winston, Washington
Toh K-C, Yun S (2009) An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Technical report, Department of Mathematics, National University of Singapore, Singapore
Tsypkin Y (1971) Adaptation and learning in automatic systems. Academic, New York
Vapnik V (1998) Statistical learning theory. Wiley, New York
Vapnik VN (1982) Estimation of dependences based on empirical data. Springer, New York
Vapnik VN, Chervonenkis A (1974) Theory of pattern recognition (in Russian). Nauka, Moscow; German translation: Theorie der Zeichenerkennung, Akademie-Verlag, Berlin, 1979
Vapnik VN, Lerner A (1963) Pattern recognition using generalized portrait method. Automat Rem Contr 24:774–780
Vidyasagar M (2002) A theory of learning and generalization: with applications to neural networks and control systems. 2nd edn. Springer, London
Viola P, Jones M (2004) Robust real-time face detection. Int J Comput Vision 57(2):137–154
De Vito E, Rosasco L, Caponnetto A, Piana M, Verri A (2004) Some properties of regularized kernel methods. J Mach Learn Res 5:1363–1390
Wahba G (1990) Spline models for observational data. SIAM, Philadelphia
Weimer M, Karatzoglou A, Smola A (2008) Improving maximum margin matrix factorization. Mach Learn 72(3):263–276
Wendland H (2005) Scattered data approximation. Cambridge University Press, Cambridge
Weston J, Elisseeff A, Schölkopf B, Tipping M (2003) Use of the zero-norm with linear models and kernel methods. J Mach Learn Res 3: 1439–1461
Weston J, Watkins C (1999) Multi-class support vector machines. In: Verlysen M (ed) Proceedings of ESANN’99, D-Facto Publications, Brussels
Wolfe P (1961) Duality theorem for nonlinear programming. Q Appl Math 19:239–244
Dostál Z (2009) Optimal quadratic programming algorithms with applications to variational inequalities. Springer, New York
Zhang T (2004) Statistical behaviour and consistency of classification methods based on convex risk minimization. Ann Stat 32:56–134
Zoutendijk G (1960) Methods of feasible directions. A study in linear and nonlinear programming. Elsevier, Amsterdam
Copyright information
© 2011 Springer Science+Business Media, LLC
Cite this entry
Steidl, G. (2011). Supervised Learning by Support Vector Machines. In: Scherzer, O. (eds) Handbook of Mathematical Methods in Imaging. Springer, New York, NY. https://doi.org/10.1007/978-0-387-92920-0_22
DOI: https://doi.org/10.1007/978-0-387-92920-0_22
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-92919-4
Online ISBN: 978-0-387-92920-0
eBook Packages: Mathematics and Statistics; Reference Module Computer Science and Engineering