Abstract
I describe a framework for interpreting Support Vector Machines (SVMs) as maximum a posteriori (MAP) solutions to inference problems with Gaussian process priors. This probabilistic interpretation can provide intuitive guidelines for choosing a ‘good’ SVM kernel. Beyond this, it allows Bayesian methods to be used for tackling two of the outstanding challenges in SVM classification: how to tune hyperparameters (the misclassification penalty C, and any parameters specifying the kernel) and how to obtain predictive class probabilities rather than the conventional deterministic class label predictions. Hyperparameters can be set by maximizing the evidence; I explain how the latter can be defined and properly normalized. Both analytical approximations and numerical methods (Monte Carlo chaining) for estimating the evidence are discussed. I also compare different methods of estimating class probabilities, ranging from simple evaluation at the MAP or at the posterior average, to full averaging over the posterior. A simple toy application illustrates the various concepts and techniques.
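To fix ideas, here is a minimal sketch of the correspondence (an orientation aid using generic notation, not a verbatim excerpt from the paper): with a zero-mean Gaussian process prior whose kernel matrix K is evaluated at the training inputs, and a likelihood built from the SVM hinge loss l(z) = max(0, 1 − z), the negative log posterior over the latent values \(\mathbf{f} = (f(x_1), \ldots, f(x_n))\) is, up to terms independent of \(\mathbf{f}\),

\[
-\ln P(\mathbf{f} \mid D) \;=\; C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i f(x_i)\bigr) \;+\; \tfrac{1}{2}\, \mathbf{f}^{\top} K^{-1} \mathbf{f} \;+\; \mathrm{const},
\]

so the MAP solution minimizes exactly the standard SVM training objective. The evidence is then the marginal likelihood \(P(D \mid C, K) = \int \mathrm{d}\mathbf{f}\; P(D \mid \mathbf{f})\, P(\mathbf{f})\); the normalization issue flagged in the abstract arises because a likelihood \(\propto \exp[-C\, l(y f)]\) does not sum to one over \(y = \pm 1\), so it must be normalized before the evidence is well defined. Predictive class probabilities take the form \(P(y \mid x, D) = \int \mathrm{d}\mathbf{f}\; P(y \mid f(x))\, P(\mathbf{f} \mid D)\), which can be approximated by evaluation at the MAP, at the posterior average, or by full averaging over the posterior.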
Cite this article
Sollich, P. Bayesian Methods for Support Vector Machines: Evidence and Predictive Class Probabilities. Machine Learning 46, 21–52 (2002). https://doi.org/10.1023/A:1012489924661