Bayesian Kernel Methods

Chapter in: Advanced Lectures on Machine Learning

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2600)

Abstract

Bayesian methods allow for a simple and intuitive representation of the function spaces used by kernel methods. This chapter describes the basic principles of Gaussian Processes, their implementation and their connection to other kernel-based Bayesian estimation methods, such as the Relevance Vector Machine.

The present article is based on [62].
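
Since the abstract only names the ingredients, the following is a minimal, self-contained sketch of the Gaussian process regression computation that the chapter builds on: the posterior mean and covariance under a squared-exponential (RBF) kernel with Gaussian observation noise. It is an illustrative NumPy example, not code from the chapter; the function names (rbf_kernel, gp_posterior), hyperparameter values, and toy data are assumptions made for the demonstration.

```python
# Minimal Gaussian process regression sketch (illustrative only).
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel: k(a, b) = variance * exp(-||a - b||^2 / (2 * lengthscale^2))
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise_var=0.1):
    # Posterior mean and covariance of the GP at the test inputs.
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)   # cross-covariances, shape (n_train, n_test)
    K_ss = rbf_kernel(X_test, X_test)   # test covariances
    L = np.linalg.cholesky(K)           # K = L L^T, solved via Cholesky for stability
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                # E[f(X_test) | data]
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v                # Cov[f(X_test) | data]
    return mean, cov

# Toy usage: recover a noisy sine function.
X = np.linspace(0.0, 5.0, 20)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.randn(20)
X_star = np.linspace(0.0, 5.0, 100)[:, None]
mu, Sigma = gp_posterior(X, y, X_star)
```

Exact inference of this form scales as O(n³) in the number of training points, which is the motivation for the sparse and low-rank approximations cited below (e.g. [67], [68], [83]).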

References

  1. K. P. Bennett, A. Demiriz, and J. Shawe-Taylor. A column generation algorithm for boosting. In P. Langley, editor, Proceedings of the International Conference on Machine Learning, San Francisco, 2000. Morgan Kaufmann Publishers.

  2. K. P. Bennett and O. L. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1:23–34, 1992.

  3. C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.

  4. C. M. Bishop and M. E. Tipping. Variational relevance vector machines. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI 2000), pages 46–53, 2000.

  5. P. S. Bradley and O. L. Mangasarian. Feature selection via concave minimization and support vector machines. In J. Shavlik, editor, Proceedings of the International Conference on Machine Learning, pages 82–90, San Francisco, California, 1998. Morgan Kaufmann Publishers. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-03.ps.Z.

  6. S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1999.

  7. T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley and Sons, New York, 1991.

  8. H. Cramér. Mathematical Methods of Statistics. Princeton University Press, 1946.

  9. L. Csató and M. Opper. Sparse representation for Gaussian process models. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 444–450. MIT Press, 2001.

  10. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39(1):1–22, 1977.

  11. S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth. Hybrid Monte Carlo. Physics Letters B, 195:216–222, 1987.

  12. S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representation. Technical report, IBM Watson Research Center, New York, 2000.

  13. R. Fletcher. Practical Methods of Optimization. John Wiley and Sons, New York, 1989.

  14. G. Fung and O. L. Mangasarian. Data selection for support vector machine classifiers. In Proceedings of KDD 2000, 2000. Also: Data Mining Institute Technical Report 00-02, University of Wisconsin, Madison.

  15. A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman and Hall, London, 1995.

  16. M. Gibbs and D. J. C. MacKay. Variational Gaussian process classifiers. Technical report, Cavendish Laboratory, Cambridge, UK, 1998.

  17. M. N. Gibbs. Bayesian Gaussian Methods for Regression and Classification. PhD thesis, University of Cambridge, 1997.

  18. M. Gibbs and D. J. C. MacKay. Efficient implementation of Gaussian processes. Technical report, Cavendish Laboratory, Cambridge, UK, 1997. Available at http://www.wol.ra.phy.cam.ac.uk/mng10/GP/.

  19. P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic Press, 1981.

  20. F. Girosi. Models of noise and robust estimates. A.I. Memo 1287, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 1991.

  21. D. Goldfarb and K. Scheinberg. A product-form Cholesky factorization method for handling dense columns in interior point methods for linear programming. Technical report, IBM Watson Research Center, Yorktown Heights, 2001.

  22. G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, 3rd edition, 1996.

  23. I. S. Gradshteyn and I. M. Ryzhik. Table of Integrals, Series, and Products. Academic Press, New York, 1981.

  24. T. Graepel, R. Herbrich, P. Bollmann-Sdorra, and K. Obermayer. Classification on pairwise proximity data. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11, pages 438–444, Cambridge, MA, 1999. MIT Press.

  25. D. Haussler. Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, Computer Science Department, UC Santa Cruz, 1999.

  26. R. Herbrich. Learning Kernel Classifiers: Theory and Algorithms. MIT Press, 2002.

  27. R. Herbrich, T. Graepel, and C. Campbell. Bayes point machines: Estimating the Bayes point in kernel space. In Proceedings of the IJCAI Workshop on Support Vector Machines, pages 23–27, 1999.

  28. R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1985.

  29. P. J. Huber. Robust statistics: a review. Annals of Mathematical Statistics, 43:1041, 1972.

  30. T. Jaakkola, M. Meila, and T. Jebara. Maximum entropy discrimination. Technical Report AITR-1668, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 1999.

  31. T. S. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11, pages 487–493, Cambridge, MA, 1999. MIT Press.

  32. T. S. Jaakkola and M. I. Jordan. Computing upper and lower bounds on likelihoods in intractable networks. In Proceedings of the 12th Conference on Uncertainty in AI. Morgan Kaufmann Publishers, 1996.

  33. W. James and C. Stein. Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics and Probability, volume 1, pages 361–380, Berkeley, 1960. University of California Press.

  34. T. Jebara and T. Jaakkola. Feature selection and dualities in maximum entropy discrimination. In Uncertainty in Artificial Intelligence, 2000.

  35. M. I. Jordan and C. M. Bishop. An Introduction to Probabilistic Graphical Models. MIT Press, 2002.

  36. M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. In M. I. Jordan, editor, Learning in Graphical Models, pages 105–162. Kluwer Academic, 1998.

  37. M. S. Lewicki and T. J. Sejnowski. Learning nonlinear overcomplete representations for efficient coding. In M. I. Jordan, M. J. Kearns, and S. A. Solla, editors, Advances in Neural Information Processing Systems 10, pages 556–562, Cambridge, MA, 1998. MIT Press.

  38. D. G. Luenberger. Introduction to Linear and Nonlinear Programming. Addison-Wesley, Reading, MA, 1973.

  39. H. Lütkepohl. Handbook of Matrices. John Wiley and Sons, Chichester, 1996.

  40. D. J. C. MacKay. Bayesian Methods for Adaptive Models. PhD thesis, Computation and Neural Systems, California Institute of Technology, Pasadena, CA, 1991.

  41. D. J. C. MacKay. The evidence framework applied to classification networks. Neural Computation, 4(5):720–736, 1992.

  42. S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41:3397–3415, 1993.

  43. O. L. Mangasarian. Linear and nonlinear separation of patterns by linear programming. Operations Research, 13:444–452, 1965.

  44. B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM Journal on Computing, 25(2):227–234, 1995.

  45. R. Neal. Priors for infinite networks. Technical Report CRG-TR-94-1, Dept. of Computer Science, University of Toronto, 1994.

  46. R. Neal. Bayesian Learning for Neural Networks. Springer, 1996.

  47. R. M. Neal. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto, 1993.

  48. B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.

  49. M. Opper and O. Winther. Mean field methods for classification with Gaussian processes. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11, pages 309–315, Cambridge, MA, 1999. MIT Press.

  50. M. Opper and O. Winther. Gaussian processes and SVM: mean field and leave-one-out. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 311–326, Cambridge, MA, 2000. MIT Press.

  51. J. Platt. Probabilities for SV machines. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 61–73, Cambridge, MA, 2000. MIT Press.

  52. T. Poggio. On optimal nonlinear associative recall. Biological Cybernetics, 19:201–209, 1975.

  53. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing (2nd ed.). Cambridge University Press, Cambridge, 1992. ISBN 0-521-43108-5.

  54. C. Rasmussen. Evaluation of Gaussian Processes and Other Methods for Non-Linear Regression. PhD thesis, Department of Computer Science, University of Toronto, 1996. ftp://ftp.cs.toronto.edu/pub/carl/thesis.ps.gz.

  55. G. Rätsch, S. Mika, and A. J. Smola. Adapting codes and embeddings for polychotomies. In Neural Information Processing Systems, volume 15. MIT Press, 2002. To appear.

  56. B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, 1996.

  57. R. T. Rockafellar. Convex Analysis, volume 28 of Princeton Mathematics Series. Princeton University Press, 1970.

  58. P. Ruján and M. Marchand. Computing the Bayes kernel classifier. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 329–347, Cambridge, MA, 2000. MIT Press.

  59. P. Ruján. Playing billiards in version space. Neural Computation, 9:99–122, 1997.

  60. B. Schölkopf, S. Mika, C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. Smola. Input space vs. feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10(5):1000–1017, 1999.

  61. B. Schölkopf, A. Smola, and K.-R. Müller. Kernel principal component analysis. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 327–352. MIT Press, Cambridge, MA, 1999.

  62. B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

  63. M. Seeger. Bayesian methods for support vector machines and Gaussian processes. Master's thesis, University of Edinburgh, Division of Informatics, 1999.

  64. J. Skilling. Maximum Entropy and Bayesian Methods. Cambridge University Press, 1988.

  65. A. Smola, B. Schölkopf, and G. Rätsch. Linear programs for automatic accuracy control in regression. In Ninth International Conference on Artificial Neural Networks, Conference Publications No. 470, pages 575–580, London, 1999. IEE.

  66. A. J. Smola. Learning with Kernels. PhD thesis, Technische Universität Berlin, 1998. GMD Research Series No. 25.

  67. A. J. Smola and P. L. Bartlett. Sparse greedy Gaussian process regression. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 619–625. MIT Press, 2001.

  68. A. J. Smola and B. Schölkopf. Sparse greedy matrix approximation for machine learning. In P. Langley, editor, Proceedings of the International Conference on Machine Learning, pages 911–918, San Francisco, 2000. Morgan Kaufmann Publishers.

  69. A. J. Smola and S. V. N. Vishwanathan. Cholesky factorization for rank-k modifications of diagonal matrices. SIAM Journal of Matrix Analysis, 2002. Submitted.

  70. C. S. Ong, A. J. Smola, and R. C. Williamson. Superkernels. In Neural Information Processing Systems, volume 15. MIT Press, 2002. To appear.

  71. D. J. Spiegelhalter and S. L. Lauritzen. Sequential updating of conditional probabilities on directed graphical structures. Networks, 20:579–605, 1990.

  72. J. Stoer and R. Bulirsch. Introduction to Numerical Analysis. Springer, New York, second edition, 1993.

  73. M. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211–244, 2001.

  74. V. Tresp. A Bayesian committee machine. Neural Computation, 12(11):2719–2741, 2000.

  75. V. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995.

  76. V. Vapnik, S. Golowich, and A. Smola. Support vector method for function approximation, regression estimation, and signal processing. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9, pages 281–287, Cambridge, MA, 1997. MIT Press.

  77. G. Wahba. Spline Models for Observational Data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1990.

  78. C. Watkins. Dynamic alignment kernels. Technical Report CSD-TR-98-11, Royal Holloway, University of London, Egham, Surrey, UK, 1999.

  79. G. N. Watson. A Treatise on the Theory of Bessel Functions. Cambridge University Press, Cambridge, UK, 2nd edition, 1958.

  80. C. K. I. Williams. Prediction with Gaussian processes: From linear regression to linear prediction and beyond. In M. I. Jordan, editor, Learning and Inference in Graphical Models. Kluwer Academic, 1998.

  81. C. K. I. Williams. Prediction with Gaussian processes: From linear regression to linear prediction and beyond. In M. I. Jordan, editor, Learning and Inference in Graphical Models, pages 599–621. MIT Press, 1999.

  82. C. K. I. Williams and C. E. Rasmussen. Gaussian processes for regression. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 514–520, Cambridge, MA, 1996. MIT Press.

  83. C. K. I. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 682–688, Cambridge, MA, 2001. MIT Press.

  84. C. K. I. Williams and D. Barber. Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342–1351, 1998.

  85. T. Zhang. Some sparse approximation bounds for regression problems. In Proc. 18th International Conf. on Machine Learning, pages 624–631. Morgan Kaufmann, San Francisco, CA, 2001.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

Cite this chapter

Smola, A.J., Schölkopf, B. (2003). Bayesian Kernel Methods. In: Mendelson, S., Smola, A.J. (eds) Advanced Lectures on Machine Learning. Lecture Notes in Computer Science, vol 2600. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36434-X_3

  • DOI: https://doi.org/10.1007/3-540-36434-X_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00529-2

  • Online ISBN: 978-3-540-36434-4
