Abstract
During the last two decades, support vector machine learning has become a very active field of research, producing a wealth of sophisticated theoretical results and exciting real-world applications. This paper gives a brief introduction to the basic concepts of supervised support vector learning and touches on some recent developments in this broad field.
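The core idea the abstract alludes to, separating two classes by a hyperplane with a soft margin, can be sketched in a few lines. The following is a minimal illustration, not the chapter's own treatment (which works with the dual quadratic program): it trains a linear soft-margin SVM by subgradient descent on the regularized hinge loss in plain Python. All function names, the toy data, and the parameter values are hypothetical choices made for this example.

```python
# Minimal sketch: linear soft-margin SVM via subgradient descent on
# (lam/2)*||w||^2 + (1/n) * sum_i max(0, 1 - y_i*(w.x_i + b)).
# Hypothetical illustration only; a serious solver would use the dual QP.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Return weights w and bias b fitted to labels y in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Hinge loss is active: step toward correct classification.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Only the regularizer contributes to the subgradient.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    """Classify x by the sign of the decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy linearly separable data: the label is the sign of the first coordinate.
X = [[2.0, 1.0], [1.5, -0.5], [-2.0, 0.5], [-1.0, -1.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
```

On this toy set the learned hyperplane separates the two classes; nonlinear decision boundaries are obtained by replacing the inner products with a kernel, as discussed in the chapter.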
References
Aizerman, M., Braverman, E., Rozonoer, L.: Theoretical foundations of the potential function method in pattern recognition learning. Autom. Remote Control 25, 821–837 (1964)
Amit, Y., Fink, M., Srebro, N., Ullman, S.: Uncovering shared structures in multiclass classification. In: International Conference on Machine Learning, pp. 17–24 (2007)
Anthony, M., Bartlett, P.L.: Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge (1999)
Argyriou, A., Evgeniou, T., Pontil, M.: Convex multi-task feature learning. Mach. Learn. 73(3), 243–272 (2008)
Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68, 337–404 (1950)
Bartlett, P.L., Jordan, M.I., McAuliffe, J.D.: Convexity, classification, and risk bounds. J. Am. Stat. Assoc. 101, 138–156 (2006)
Bennett, K.P., Mangasarian, O.L.: Robust linear programming discrimination of two linearly inseparable sets. Optim. Methods Softw. 1, 23–34 (1992)
Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer, Dordrecht (2004)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Björck, A.: Least Squares Problems. SIAM, Philadelphia (1996)
Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer, New York (2000)
Boser, B.E., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, Pittsburgh, pp. 144–152 (1992)
Bottou, L., Chapelle, O., DeCoste, D., Weston, J. (eds.): Large Scale Kernel Machines. MIT, Cambridge (2007)
Boucheron, S., Bousquet, O., Lugosi, G.: Theory of classification: a survey on some recent advances. ESAIM Probab. Stat. 9, 323–375 (2005)
Bousquet, O., Elisseeff, A.: Algorithmic stability and generalization performance. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.) Advances in Neural Information Processing Systems 13, pp. 196–202. MIT, Cambridge (2001)
Bradley, P.S., Mangasarian, O.L.: Feature selection via concave minimization and support vector machines. In: Proceedings of the 15th International Conference on Machine Learning, Madison, pp. 82–90. Morgan Kaufmann, San Francisco (1998)
Brown, M., Grundy, W., Lin, D., Cristianini, N., Sugnet, C., Furey, T., Ares, M., Haussler, D.: Knowledge-based analysis of microarray gene-expression data by using support vector machines. Proc. Natl. Acad. Sci. 97(1), 262–267 (2000)
Buhmann, M.D.: Radial Basis Functions. Cambridge University Press, Cambridge (2003)
Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 2(2), 121–167 (1998)
Cai, J.-F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. Technical report, UCLA Computational and Applied Mathematics (2008)
Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997)
Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. www.csie.ntu.edu.tw/~cjlin/papers/libsvm.ps.gz (2004)
Chapelle, O., Haffner, P., Vapnik, V.N.: SVMs for histogram-based image classification. IEEE Trans. Neural Netw. 10(5), 1055–1064 (1999)
Chen, P.-H., Fan, R.-E., Lin, C.-J.: A study on SMO-type decomposition methods for support vector machines. IEEE Trans. Neural Netw. 17, 893–908 (2006)
Collobert, R., Bengio, S.: SVMTorch: support vector machines for large-scale regression problems. J. Mach. Learn. Res. 1, 143–160 (2001)
Cortes, C., Vapnik, V.: Support vector networks. Mach. Learn. 20, 273–297 (1995)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines. Cambridge University Press, Cambridge (2000)
Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. Am. Math. Soc. 39, 1–49 (2002)
Cucker, F., Zhou, D.X.: Learning Theory: An Approximation Point of View. Cambridge University Press, Cambridge (2007)
Devroye, L., Györfi, L., Lugosi, G.: A Probabilistic Theory of Pattern Recognition. Springer, New York (1996)
Devroye, L.P.: Any discrimination rule can have an arbitrarily bad probability of error for finite sample size. IEEE Trans. Pattern Anal. Mach. Intell. 4, 154–157 (1982)
Dietterich, T.G., Bakiri, G.: Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. 2, 263–286 (1995)
Dinuzzo, F., Neve, M., Nicolao, G.D., Gianazza, U.P.: On the representer theorem and equivalent degrees of freedom of SVR. J. Mach. Learn. Res. 8, 2467–2495 (2007)
Duda, R.O., Hart, P.E., Stork, D.: Pattern Classification, 2nd edn. Wiley, New York (2001)
Edmunds, D.E., Triebel, H.: Function Spaces, Entropy Numbers, Differential Operators. Cambridge University Press, Cambridge (1996)
Elisseeff, A., Evgeniou, T., Pontil, M.: Stability of randomised learning algorithms. J. Mach. Learn. Res. 6, 55–79 (2005)
Evgeniou, T., Pontil, M., Poggio, T.: Regularization networks and support vector machines. Adv. Comput. Math. 13(1), 1–50 (2000)
Fan, R.-E., Chen, P.-H., Lin, C.-J.: Working set selection using second order information for training support vector machines. J. Mach. Learn. Res. 6, 1889–1918 (2005)
Fasshauer, G.E.: Meshfree Approximation Methods with MATLAB. World Scientific, Hackensack (2007)
Fazel, M., Hindi, H., Boyd, S.P.: A rank minimization heuristic with application to minimum order system approximation. In: Proceedings of the American Control Conference, pp. 4734–4739 (2001)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7, 179–188 (1936)
Flake, G.W., Lawrence, S.: Efficient SVM regression training with SMO. Technical report, NEC Research Institute (1999)
Gauss, C.F.: Theory of the Motion of the Heavenly Bodies Moving about the Sun in Conic Sections (C.H. Davis, Trans.). Dover, New York (1963); first published 1809
Girosi, F.: An equivalence between sparse approximation and support vector machines. Neural Comput. 10(6), 1455–1480 (1998)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
Györfi, L., Kohler, M., Krzyżak, A., Walk, H.: A Distribution-Free Theory of Nonparametric Regression. Springer, New York (2002)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2001)
Herbrich, R.: Learning Kernel Classifiers: Theory and Algorithms. MIT, Cambridge (2001)
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67 (1970)
Huang, T.-M., Kecman, V., Kopriva, I.: Kernel Based Algorithms for Mining Huge Data Sets: Supervised, Semi-supervised and Unsupervised Learning. Springer, Berlin (2006)
Jaakkola, T.S., Haussler, D.: Probabilistic kernel regression models. In: Proceedings of the 1999 Conference on Artificial Intelligence and Statistics, Fort Lauderdale (1999)
Joachims, T.: Making large-scale SVM learning practical. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods-Support Vector Learning, pp. 41–56. MIT, Cambridge (1999)
Joachims, T.: Learning to Classify Text Using Support Vector Machines. Kluwer Academic, Boston (2002)
Kailath, T.: RKHS approach to detection and estimation problems, part I: deterministic signals in Gaussian noise. IEEE Trans. Inf. Theory 17(5), 530–549 (1971)
Keerthi, S.S., Shevade, S.K., Bhattacharyya, C., Murthy, K.R.K.: Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Comput. 13, 637–649 (2001)
Kimeldorf, G.S., Wahba, G.: Some results on Tchebycheffian spline functions. J. Math. Anal. Appl. 33, 82–95 (1971)
Kolmogorov, A.N., Tikhomirov, V.M.: \(\varepsilon\)-entropy and \(\varepsilon\)-capacity of sets in functional spaces. Am. Math. Soc. Transl. 17, 277–364 (1961)
Kondor, R.I., Lafferty, J.: Diffusion kernels on graphs and other discrete structures. In: Proceedings of the International Conference on Machine Learning. Morgan Kaufmann, San Francisco (2002)
Krige, D.G.: A statistical approach to some basic mine valuation problems on the Witwatersrand. J. Chem. Metall. Min. Soc. S. Afr. 52(6), 119–139 (1951)
Kuhn, H.W., Tucker, A.W.: Nonlinear programming. In: Proceedings Berkeley Symposium on Mathematical Statistics and Probability, pp. 482–492. University of California Press (1951)
Laplace, P.S.: Théorie Analytique des Probabilités, 3rd edn. Courier, Paris (1816)
LeCun, Y., Jackel, L.D., Bottou, L., Brunot, A., Cortes, C., Denker, J.S., Drucker, H., Guyon, I., Müller, U., Säckinger, E., Simard, P., Vapnik, V.: Comparison of learning algorithms for handwritten digit recognition. In: Fogelman-Soulié, F., Gallinari, P. (eds.) Proceedings ICANN’95, Paris, vol. 2, pp. 53–60 (1995)
Legendre, A.M.: Nouvelles Méthodes pour la Détermination des Orbites des Comètes. Courier, Paris (1805)
Leopold, E., Kindermann, J.: Text categorization with support vector machines: how to represent texts in input space? Mach. Learn. 46(1–3), 223–244 (2002)
Lin, C.J.: On the convergence of the decomposition method for support vector machines. IEEE Trans. Neural Netw. 12, 1288–1298 (2001)
Lu, Z., Monteiro, R.D.C., Yuan, M.: Convex optimization methods for dimension reduction and coefficient estimation in multivariate linear regression. Math. Program., 163–194 (2012)
Ma, S., Goldfarb, D., Chen, L.: Fixed point and Bregman iterative methods for matrix rank minimization. Technical report 08-78, UCLA Computational and Applied Mathematics (2008)
Mangasarian, O.L.: Nonlinear Programming. SIAM, Philadelphia (1994)
Mangasarian, O.L., Musicant, D.R.: Successive overrelaxation for support vector machines. IEEE Trans. Neural Netw. 10, 1032–1037 (1999)
Matheron, G.: Principles of geostatistics. Econ. Geol. 58, 1246–1266 (1963)
Micchelli, C.A.: Interpolation of scattered data: distance matrices and conditionally positive definite functions. Constr. Approx. 2, 11–22 (1986)
Micchelli, C.A., Pontil, M.: On learning vector-valued functions. Neural Comput. 17, 177–204 (2005)
Mitchell, T.M.: Machine Learning. McGraw-Hill, Boston (1997)
Mukherjee, S., Niyogi, P., Poggio, T., Rifkin, R.: Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv. Comput. Math. 25, 161–193 (2006)
Neumann, J., Schnörr, C., Steidl, G.: Efficient wavelet adaptation for hybrid wavelet–large margin classifiers. Pattern Recognit. 38, 1815–1830 (2005)
Obozinski, G., Taskar, B., Jordan, M.I.: Joint covariate selection and joint subspace selection for multiple classification problems. Stat. Comput. 20, 231–252 (2010)
Osuna, E., Freund, R., Girosi, F.: Training of support vector machines: an application to face detection. In: Proceedings of the CVPR’97, San Juan, pp. 130–136. IEEE Computer Society, Washington, DC (1997)
Parzen, E.: Statistical inference on time series by RKHS methods. Technical report, Department of Statistics, Stanford University (1970)
Pinkus, A.: n-Widths in Approximation Theory. Springer, Berlin (1985)
Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C.J.C., Smola, A.J. (eds.) Advances in Kernel Methods – Support Vector Learning, pp. 185–208. MIT, Cambridge (1999)
Poggio, T., Girosi, F.: Networks for approximation and learning. Proc. IEEE 78(9), 1481–1497 (1990)
Pong, T.K., Tseng, P., Ji, S., Ye, J.: Trace norm regularization: reformulations, algorithms and multi-task learning. SIAM J. Optim. 20, 3465–3489 (2010)
Povzner, A.Y.: A class of Hilbert function spaces. Dokl. Akad. Nauk USSR 68, 817–820 (1950)
Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958)
Schoenberg, I.J.: Metric spaces and completely monotone functions. Ann. Math. 39, 811–841 (1938)
Schölkopf, B., Herbrich, R., Smola, A.J.: A generalized representer theorem. In: Helmbold, D., Williamson, B. (eds.) Proceedings of the 14th Annual Conference on Computational Learning Theory, Amsterdam, pp. 416–426. Springer, New York (2001)
Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT, Cambridge (2002)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
Smola, A.J., Schölkopf, B., Müller, K.R.: The connection between regularization operators and support vector kernels. Neural Netw. 11, 637–649 (1998)
Spellucci, P.: Numerische Verfahren der Nichtlinearen Optimierung. Birkhäuser, Basel/Boston/Berlin (1993)
Srebro, N., Rennie, J.D.M., Jaakkola, T.S.: Maximum-margin matrix factorization. In: NIPS, pp. 1329–1336 (2005)
Steinwart, I.: Sparseness of support vector machines. J. Mach. Learn. Res. 4, 1071–1105 (2003)
Steinwart, I., Christmann, A.: Support Vector Machines. Springer, New York (2008)
Stone, C.: Consistent nonparametric regression. Ann. Stat. 5, 595–645 (1977)
Strauss, D.J., Steidl, G.: Hybrid wavelet-support vector classification of waveforms. J. Comput. Appl. Math. 148, 375–400 (2002)
Strauss, D.J., Steidl, G., Delb, D.: Feature extraction by shape-adapted local discriminant bases. Signal Process. 83, 359–376 (2003)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT, Cambridge (1998)
Suykens, J.A.K., Gestel, T.V., Brabanter, J.D., Moor, B.D., Vandewalle, J.: Least Squares Support Vector Machines. World Scientific, Singapore (2002)
Suykens, J.A.K., Vandewalle, J.: Least squares support vector machine classifiers. Neural Process. Lett. 9(3), 293–300 (1999)
Tao, P.D., An, L.T.H.: A D.C. optimization algorithm for solving the trust-region subproblem. SIAM J. Optim. 8(2), 476–505 (1998)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288 (1996)
Tikhonov, A.N., Arsenin, V.Y.: Solution of Ill-Posed Problems. Winston, Washington, DC (1977)
Toh, K.-C., Yun, S.: An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Technical report, Department of Mathematics, National University of Singapore, Singapore (2009)
Tsypkin, Y.: Adaptation and Learning in Automatic Systems. Academic, New York (1971)
Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
Vapnik, V.N.: Estimation of Dependences Based on Empirical Data. Springer, New York (1982)
Vapnik, V.N., Chervonenkis, A.: Theory of Pattern Recognition (in Russian). Nauka, Moscow (1974) (German translation: Theorie der Zeichenerkennung, Akademie-Verlag, Berlin, 1979 edition)
Vapnik, V.N., Lerner, A.: Pattern recognition using generalized portrait method. Autom. Remote Control 24, 774–780 (1963)
Vidyasagar, M.: A Theory of Learning and Generalization: With Applications to Neural Networks and Control Systems, 2nd edn. Springer, London (2002)
Viola, P., Jones, M.: Robust real-time face detection. Int. J. Comput. Vis. 57(2), 137–154 (2004)
Vito, E.D., Rosasco, L., Caponnetto, A., Piana, M., Verri, A.: Some properties of regularized kernel methods. J. Mach. Learn. Res. 5, 1363–1390 (2004)
Wahba, G.: Spline Models for Observational Data. SIAM, Philadelphia (1990)
Weimer, M., Karatzoglou, A., Smola, A.: Improving maximum margin matrix factorization. Mach. Learn. 72(3), 263–276 (2008)
Wendland, H.: Scattered Data Approximation. Cambridge University Press, Cambridge (2005)
Weston, J., Elisseeff, A., Schölkopf, B., Tipping, M.: Use of the zero-norm with linear models and kernel methods. J. Mach. Learn. Res. 3, 1439–1461 (2003)
Weston, J., Watkins, C.: Multi-class support vector machines. In: Verleysen, M. (ed.) Proceedings ESANN’99, Brussels. D-Facto Publications (1999)
Wolfe, P.: Duality theorem for nonlinear programming. Q. Appl. Math. 19, 239–244 (1961)
Dostál, Z.: Optimal Quadratic Programming Algorithms with Applications to Variational Inequalities. Springer, New York (2009)
Zhang, T.: Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Stat. 32, 56–134 (2004)
Zoutendijk, G.: Methods of Feasible Directions. A Study in Linear and Nonlinear Programming. Elsevier, Amsterdam (1960)
© 2014 Springer Science+Business Media New York
Cite this entry
Steidl, G. (2014). Supervised Learning by Support Vector Machines. In: Scherzer, O. (eds) Handbook of Mathematical Methods in Imaging. Springer, New York, NY. https://doi.org/10.1007/978-3-642-27795-5_22-5
Online ISBN: 978-3-642-27795-5