Statistics and Computing, Volume 14, Issue 3, pp 199–222

A tutorial on support vector regression

  • Alex J. Smola
  • Bernhard Schölkopf

Abstract

In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from an SV perspective.

Keywords: machine learning, support vector machines, regression estimation
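
Illustrative example

The abstract above summarizes ε-insensitive SV regression at a high level. The following is a minimal sketch only, not taken from the article: it assumes scikit-learn's SVR implementation and an arbitrary noisy sinc toy problem, chosen purely for illustration.

    # Minimal sketch (assumes scikit-learn is installed); illustrates
    # epsilon-insensitive SV regression, not the authors' own code.
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = rng.uniform(-3.0, 3.0, size=(200, 1))                # 1-D inputs
    y = np.sinc(X).ravel() + 0.1 * rng.standard_normal(200)  # noisy targets

    # C trades training error against flatness of the estimate; epsilon is
    # the half-width of the insensitive tube: residuals smaller than
    # epsilon incur no loss at all.
    svr = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=0.5)
    svr.fit(X, y)

    print("support vectors:", len(svr.support_))   # points on/outside the tube
    print("f(0) estimate:", svr.predict([[0.0]])[0])

Only training points whose residuals reach the tube boundary receive nonzero coefficients, so the learned kernel expansion is sparse; this sparsity is one of the properties of the ε-insensitive loss that the tutorial emphasizes.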

Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Alex J. Smola (1)
  • Bernhard Schölkopf (2)
  1. RSISE, Australian National University, Canberra, Australia
  2. Max-Planck-Institut für biologische Kybernetik, Tübingen, Germany
