Machine Learning

Volume 46, Issue 1–3, pp 71–89

A Probabilistic Framework for SVM Regression and Error Bar Estimation

  • J.B. Gao
  • S.R. Gunn
  • C.J. Harris
  • M. Brown


In this paper, we elaborate on the well-known relationship between Gaussian Processes (GP) and Support Vector Machines (SVM) under some convex assumptions for the loss functions. This paper concentrates on the derivation of the evidence and error bar approximation for regression problems. An error bar formula is derived based on the ε-insensitive loss function.
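For readers unfamiliar with the ε-insensitive loss named in the abstract, the following is a minimal sketch of its standard (Vapnik) definition: residuals smaller than ε in magnitude cost nothing, and larger residuals are penalised linearly. The function name `eps_insensitive_loss` and the default `eps=0.1` are illustrative assumptions and are not taken from the paper.

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Standard epsilon-insensitive loss: max(0, |y_true - y_pred| - eps).

    The name and default eps are hypothetical choices for illustration.
    """
    # Residuals inside the tube [-eps, eps] incur zero loss;
    # outside the tube the loss grows linearly with the residual.
    return max(0.0, abs(y_true - y_pred) - eps)


if __name__ == "__main__":
    # Evaluate the loss for a few residuals around a prediction of 0.0.
    for target in (-0.5, -0.1, 0.0, 0.05, 0.3):
        print(target, eps_insensitive_loss(target, 0.0, eps=0.1))
```

The flat region around zero is what distinguishes this loss from the squared loss usually paired with Gaussian process regression.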

support vector machine (SVM); Gaussian process; ε-loss function; error bar estimation



Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • J.B. Gao (1)
  • S.R. Gunn (1)
  • C.J. Harris (1)
  • M. Brown (2)

  1. Image, Speech and Intelligent System Research Group, Department of Electronics and Computer Science, University of Southampton, Southampton, UK
  2. Data Exploitation Group, D0170HS&T, IBM Hursley Laboratory, Winchester, UK
