Abstract
During the last two decades, support vector machine learning has become a very active field of research, producing a wealth of sophisticated theoretical results and exciting real-world applications. This paper gives a brief introduction to the basic concepts of supervised support vector learning and touches on some recent developments in this broad field.
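The core idea the abstract alludes to, separating two classes by a hyperplane with a soft margin, can be sketched in a few lines. The following is a minimal illustration, not the chapter's own treatment (which works with the dual quadratic program): it trains a linear soft-margin SVM by subgradient descent on the regularized hinge loss in plain Python. All function names, the toy data, and the parameter values are hypothetical choices made for this example.

```python
# Minimal sketch: linear soft-margin SVM via subgradient descent on
# (lam/2)*||w||^2 + (1/n) * sum_i max(0, 1 - y_i*(w.x_i + b)).
# Hypothetical illustration only; a serious solver would use the dual QP.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Return weights w and bias b fitted to labels y in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Hinge loss is active: step toward correct classification.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Only the regularizer contributes to the subgradient.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    """Classify x by the sign of the decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy linearly separable data: the label is the sign of the first coordinate.
X = [[2.0, 1.0], [1.5, -0.5], [-2.0, 0.5], [-1.0, -1.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
```

On this toy set the learned hyperplane separates the two classes; nonlinear decision boundaries are obtained by replacing the inner products with a kernel, as discussed in the chapter.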
References
Aizerman, M., Braverman, E., Rozonoer, L.: Theoretical foundations of the potential function method in pattern recognition learning. Autom. Remote Control 25, 821–837 (1964)
Amit, Y., Fink, M., Srebro, N., Ullman, S.: Uncovering shared structures in multiclass classification. In: International Conference on Machine Learning, pp. 17–24 (2007)
Anthony, M., Bartlett, P.L.: Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge (1999)
Argyriou, A., Evgeniou, T., Pontil, M.: Convex multi-task feature learning. Mach. Learn. 73(3), 243–272 (2008)
Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68, 337–404 (1950)
Bartlett, P.L., Jordan, M.I., McAuliffe, J.D.: Convexity, classification, and risk bounds. J. Am. Stat. Assoc. 101, 138–156 (2006)
Bennett, K.P., Mangasarian, O.L.: Robust linear programming discrimination of two linearly inseparable sets. Optim. Methods Softw. 1, 23–34 (1992)
Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer, Dordrecht (2004)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Björck, A.: Least Squares Problems. SIAM, Philadelphia (1996)
Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer, New York (2000)
Boser, B.E., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, Pittsburgh, pp. 144–152 (1992)
Bottou, L., Chapelle, O., DeCoste, D., Weston, J. (eds.): Large Scale Kernel Machines. MIT, Cambridge (2007)
Boucheron, S., Bousquet, O., Lugosi, G.: Theory of classification: a survey on some recent advances. ESAIM Probab. Stat. 9, 323–375 (2005)
Bousquet, O., Elisseeff, A.: Algorithmic stability and generalization performance. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.) Advances in Neural Information Processing Systems 13, pp. 196–202. MIT, Cambridge (2001)
Bradley, P.S., Mangasarian, O.L.: Feature selection via concave minimization and support vector machines. In: Proceedings of the 15th International Conference on Machine Learning, Madison, pp. 82–90. Morgan Kaufmann, San Francisco (1998)
Brown, M., Grundy, W., Lin, D., Cristianini, N., Sugnet, C., Furey, T., Ares, M., Haussler, D.: Knowledge-based analysis of microarray gene-expression data by using support vector machines. Proc. Natl. Acad. Sci. 97(1), 262–267 (2000)
Buhmann, M.D.: Radial Basis Functions. Cambridge University Press, Cambridge (2003)
Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 2(2), 121–167 (1998)
Cai, J.-F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. Technical report, UCLA Computational and Applied Mathematics (2008)
Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997)
Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. www.csie.ntu.edu.tw/~cjlin/papers/libsvm.ps.gz (2004)
Chapelle, O., Haffner, P., Vapnik, V.N.: SVMs for histogram-based image classification. IEEE Trans. Neural Netw. 10(5), 1055–1064 (1999)
Chen, P.-H., Fan, R.-E., Lin, C.-J.: A study on SMO-type decomposition methods for support vector machines. IEEE Trans. Neural Netw. 17, 893–908 (2006)
Collobert, R., Bengio, S.: SVMTorch: support vector machines for large-scale regression problems. J. Mach. Learn. Res. 1, 143–160 (2001)
Cortes, C., Vapnik, V.: Support vector networks. Mach. Learn. 20, 273–297 (1995)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines. Cambridge University Press, Cambridge (2000)
Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. Am. Math. Soc. 39, 1–49 (2002)
Cucker, F., Zhou, D.X.: Learning Theory: An Approximation Point of View. Cambridge University Press, Cambridge (2007)
Devroye, L., Györfi, L., Lugosi, G.: A Probabilistic Theory of Pattern Recognition. Springer, New York (1996)
Devroye, L.P.: Any discrimination rule can have an arbitrarily bad probability of error for finite sample size. IEEE Trans. Pattern Anal. Mach. Intell. 4, 154–157 (1982)
Dietterich, T.G., Bakiri, G.: Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. 2, 263–286 (1995)
Dinuzzo, F., Neve, M., Nicolao, G.D., Gianazza, U.P.: On the representer theorem and equivalent degrees of freedom of SVR. J. Mach. Learn. Res. 8, 2467–2495 (2007)
Duda, R.O., Hart, P.E., Stork, D.: Pattern Classification, 2nd edn. Wiley, New York (2001)
Edmunds, D.E., Triebel, H.: Function Spaces, Entropy Numbers, Differential Operators. Cambridge University Press, Cambridge (1996)
Elisseeff, A., Evgeniou, T., Pontil, M.: Stability of randomised learning algorithms. J. Mach. Learn. Res. 6, 55–79 (2005)
Evgeniou, T., Pontil, M., Poggio, T.: Regularization networks and support vector machines. Adv. Comput. Math. 13(1), 1–50 (2000)
Fan, R.-E., Chen, P.-H., Lin, C.-J.: Working set selection using second order information for training support vector machines. J. Mach. Learn. Res. 6, 1889–1918 (2005)
Fasshauer, G.E.: Meshfree Approximation Methods with MATLAB. World Scientific, Hackensack (2007)
Fazel, M., Hindi, H., Boyd, S.P.: A rank minimization heuristic with application to minimum order system approximation. In: Proceedings of the American Control Conference, pp. 4734–4739 (2001)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7, 179–188 (1936)
Flake, G.W., Lawrence, S.: Efficient SVM regression training with SMO. Technical report, NEC Research Institute (1999)
Gauss, C.F.: Theory of the Motion of the Heavenly Bodies Moving about the Sun in Conic Sections (C.H. Davis, Trans.). Dover, New York (1963); first published 1809
Girosi, F.: An equivalence between sparse approximation and support vector machines. Neural Comput. 10(6), 1455–1480 (1998)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
Györfi, L., Kohler, M., Krzyżak, A., Walk, H.: A Distribution-Free Theory of Nonparametric Regression. Springer, New York (2002)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2001)
Herbrich, R.: Learning Kernel Classifiers: Theory and Algorithms. MIT, Cambridge (2001)
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67 (1970)
Huang, T.-M., Kecman, V., Kopriva, I.: Kernel Based Algorithms for Mining Huge Data Sets: Supervised, Semi-supervised and Unsupervised Learning. Springer, Berlin (2006)
Jaakkola, T.S., Haussler, D.: Probabilistic kernel regression models. In: Proceedings of the 1999 Conference on Artificial Intelligence and Statistics, Fort Lauderdale (1999)
Joachims, T.: Making large-scale SVM learning practical. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods-Support Vector Learning, pp. 41–56. MIT, Cambridge (1999)
Joachims, T.: Learning to Classify Text Using Support Vector Machines. Kluwer Academic, Boston (2002)
Kailath, T.: RKHS approach to detection and estimation problems, part I: deterministic signals in Gaussian noise. IEEE Trans. Inf. Theory 17(5), 530–549 (1971)
Keerthi, S.S., Shevade, S.K., Bhattacharyya, C., Murthy, K.R.K.: Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Comput. 13, 637–649 (2001)
Kimeldorf, G.S., Wahba, G.: Some results on Tchebycheffian spline functions. J. Math. Anal. Appl. 33, 82–95 (1971)
Kolmogorov, A.N., Tikhomirov, V.M.: \(\varepsilon\)-entropy and \(\varepsilon\)-capacity of sets in functional spaces. Am. Math. Soc. Transl. 17, 277–364 (1961)
Kondor, R.I., Lafferty, J.: Diffusion kernels on graphs and other discrete structures. In: Proceedings of the International Conference on Machine Learning. Morgan Kaufmann, San Francisco (2002)
Krige, D.G.: A statistical approach to some basic mine valuation problems on the Witwatersrand. J. Chem. Metall. Min. Soc. S. Afr. 52(6), 119–139 (1951)
Kuhn, H.W., Tucker, A.W.: Nonlinear programming. In: Proceedings Berkeley Symposium on Mathematical Statistics and Probability, pp. 482–492. University of California Press (1951)
Laplace, P.S.: Théorie Analytique des Probabilités, 3rd edn. Courier, Paris (1816)
LeCun, Y., Jackel, L.D., Bottou, L., Brunot, A., Cortes, C., Denker, J.S., Drucker, H., Guyon, I., Müller, U., Säckinger, E., Simard, P., Vapnik, V.: Comparison of learning algorithms for handwritten digit recognition. In: Fogelman-Soulié, F., Gallinari, P. (eds.) Proceedings ICANN’95, Paris, vol. 2, pp. 53–60 (1995)
Legendre, A.M.: Nouvelles Méthodes pour la Détermination des Orbites des Comètes. Courier, Paris (1805)
Leopold, E., Kindermann, J.: Text categorization with support vector machines: how to represent texts in input space? Mach. Learn. 46(1–3), 223–244 (2002)
Lin, C.J.: On the convergence of the decomposition method for support vector machines. IEEE Trans. Neural Netw. 12, 1288–1298 (2001)
Lu, Z., Monteiro, R.D.C., Yuan, M.: Convex optimization methods for dimension reduction and coefficient estimation in multivariate linear regression. Math. Program., 163–194 (2012)
Ma, S., Goldfarb, D., Chen, L.: Fixed point and Bregman iterative methods for matrix rank minimization. Technical report 08-78, UCLA Computational and Applied Mathematics (2008)
Mangasarian, O.L.: Nonlinear Programming. SIAM, Philadelphia (1994)
Mangasarian, O.L., Musicant, D.R.: Successive overrelaxation for support vector machines. IEEE Trans. Neural Netw. 10, 1032–1037 (1999)
Matheron, G.: Principles of geostatistics. Econ. Geol. 58, 1246–1266 (1963)
Micchelli, C.A.: Interpolation of scattered data: distance matrices and conditionally positive definite functions. Constr. Approx. 2, 11–22 (1986)
Micchelli, C.A., Pontil, M.: On learning vector-valued functions. Neural Comput. 17, 177–204 (2005)
Mitchell, T.M.: Machine Learning. McGraw-Hill, Boston (1997)
Mukherjee, S., Niyogi, P., Poggio, T., Rifkin, R.: Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv. Comput. Math. 25, 161–193 (2006)
Neumann, J., Schnörr, C., Steidl, G.: Efficient wavelet adaptation for hybrid wavelet–large margin classifiers. Pattern Recognit. 38, 1815–1830 (2005)
Obozinski, G., Taskar, B., Jordan, M.I.: Joint covariate selection and joint subspace selection for multiple classification problems. Stat. Comput. 20, 231–252 (2010)
Osuna, E., Freund, R., Girosi, F.: Training of support vector machines: an application to face detection. In: Proceedings of the CVPR’97, San Juan, pp. 130–136. IEEE Computer Society, Washington, DC (1997)
Parzen, E.: Statistical inference on time series by RKHS methods. Technical report, Department of Statistics, Stanford University (1970)
Pinkus, A.: n-Widths in Approximation Theory. Springer, Berlin (1985)
Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C.J.C., Smola, A.J. (eds.) Advances in Kernel Methods – Support Vector Learning, pp. 185–208. MIT, Cambridge (1999)
Poggio, T., Girosi, F.: Networks for approximation and learning. Proc. IEEE 78(9), 1481–1497 (1990)
Pong, T.K., Tseng, P., Ji, S., Ye, J.: Trace norm regularization: reformulations, algorithms and multi-task learning. SIAM J. Optim. 20, 3465–3489 (2010)
Povzner, A.Y.: A class of Hilbert function spaces. Dokl. Akad. Nauk USSR 68, 817–820 (1950)
Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958)
Schoenberg, I.J.: Metric spaces and completely monotone functions. Ann. Math. 39, 811–841 (1938)
Schölkopf, B., Herbrich, R., Smola, A.J.: A generalized representer theorem. In: Helmbold, D., Williamson, B. (eds.) Proceedings of the 14th Annual Conference on Computational Learning Theory, Amsterdam, pp. 416–426. Springer, New York (2001)
Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT, Cambridge (2002)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
Smola, A.J., Schölkopf, B., Müller, K.R.: The connection between regularization operators and support vector kernels. Neural Netw. 11, 637–649 (1998)
Spellucci, P.: Numerische Verfahren der Nichtlinearen Optimierung. Birkhäuser, Basel/Boston/Berlin (1993)
Srebro, N., Rennie, J.D.M., Jaakkola, T.S.: Maximum-margin matrix factorization. In: NIPS, pp. 1329–1336 (2005)
Steinwart, I.: Sparseness of support vector machines. J. Mach. Learn. Res. 4, 1071–1105 (2003)
Steinwart, I., Christmann, A.: Support Vector Machines. Springer, New York (2008)
Stone, C.: Consistent nonparametric regression. Ann. Stat. 5, 595–645 (1977)
Strauss, D.J., Steidl, G.: Hybrid wavelet-support vector classification of waveforms. J. Comput. Appl. Math. 148, 375–400 (2002)
Strauss, D.J., Steidl, G., Delb, D.: Feature extraction by shape-adapted local discriminant bases. Signal Process. 83, 359–376 (2003)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT, Cambridge (1998)
Suykens, J.A.K., Gestel, T.V., Brabanter, J.D., Moor, B.D., Vandewalle, J.: Least Squares Support Vector Machines. World Scientific, Singapore (2002)
Suykens, J.A.K., Vandewalle, J.: Least squares support vector machine classifiers. Neural Process. Lett. 9(3), 293–300 (1999)
Tao, P.D., An, L.T.H.: A D.C. optimization algorithm for solving the trust-region subproblem. SIAM J. Optim. 8(2), 476–505 (1998)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288 (1996)
Tikhonov, A.N., Arsenin, V.Y.: Solution of Ill-Posed Problems. Winston, Washington, DC (1977)
Toh, K.-C., Yun, S.: An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Technical report, Department of Mathematics, National University of Singapore, Singapore (2009)
Tsypkin, Y.: Adaptation and Learning in Automatic Systems. Academic, New York (1971)
Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
Vapnik, V.N.: Estimation of Dependences Based on Empirical Data. Springer, New York (1982)
Vapnik, V.N., Chervonenkis, A.: Theory of Pattern Recognition (in Russian). Nauka, Moscow (1974) (German translation: Theorie der Zeichenerkennung, Akademie-Verlag, Berlin, 1979 edition)
Vapnik, V.N., Lerner, A.: Pattern recognition using generalized portrait method. Autom. Remote Control 24, 774–780 (1963)
Vidyasagar, M.: A Theory of Learning and Generalization: With Applications to Neural Networks and Control Systems, 2nd edn. Springer, London (2002)
Viola, P., Jones, M.: Robust real-time face detection. Int. J. Comput. Vis. 57(2), 137–154 (2004)
Vito, E.D., Rosasco, L., Caponnetto, A., Piana, M., Verri, A.: Some properties of regularized kernel methods. J. Mach. Learn. Res. 5, 1363–1390 (2004)
Wahba, G.: Spline Models for Observational Data. SIAM, Philadelphia (1990)
Weimer, M., Karatzoglou, A., Smola, A.: Improving maximum margin matrix factorization. Mach. Learn. 72(3), 263–276 (2008)
Wendland, H.: Scattered Data Approximation. Cambridge University Press, Cambridge (2005)
Weston, J., Elisseeff, A., Schölkopf, B., Tipping, M.: Use of the zero-norm with linear models and kernel methods. J. Mach. Learn. Res. 3, 1439–1461 (2003)
Weston, J., Watkins, C.: Multi-class support vector machines. In: Verleysen, M. (ed.) Proceedings ESANN’99, Brussels. D-Facto Publications (1999)
Wolfe, P.: Duality theorem for nonlinear programming. Q. Appl. Math. 19, 239–244 (1961)
Dostál, Z.: Optimal Quadratic Programming Algorithms with Applications to Variational Inequalities. Springer, New York (2009)
Zhang, T.: Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Stat. 32, 56–134 (2004)
Zoutendijk, G.: Methods of Feasible Directions. A Study in Linear and Nonlinear Programming. Elsevier, Amsterdam (1960)
© 2014 Springer Science+Business Media New York
Cite this entry
Steidl, G. (2014). Supervised Learning by Support Vector Machines. In: Scherzer, O. (eds) Handbook of Mathematical Methods in Imaging. Springer, New York, NY. https://doi.org/10.1007/978-3-642-27795-5_22-5
Online ISBN: 978-3-642-27795-5