Switching and Learning in Feedback Systems, pp. 98–127

Part of the Lecture Notes in Computer Science book series (LNCS, volume 3355)

Analysis of Some Methods for Reduced Rank Gaussian Process Regression

  • Joaquin Quiñonero-Candela
  • Carl Edward Rasmussen

Abstract

Gaussian Processes (GPs) perform very well in regression and classification problems, which is a strong motivation for using them, but their computational complexity makes them impractical when the training set exceeds a few thousand cases. This has motivated the recent proliferation of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While GPs are in general equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training an RRGP consists of learning both the covariance function hyperparameters and the support set. We propose a method for learning the hyperparameters for a given support set. We also review the Sparse Greedy GP (SGGP) approximation (Smola and Bartlett, 2001), which learns the support set for given hyperparameters by approximating the posterior. We propose an alternative to the SGGP that has better generalization capabilities. Finally, we run experiments to compare the different ways of training an RRGP. We provide some Matlab code for learning RRGPs.
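
To make the "finite sparse linear model" view and the degeneracy concrete, the following is a minimal sketch and not the authors' Matlab code. It assumes one-dimensional inputs, a squared exponential covariance with fixed hyperparameters, a hand-picked support set, and the weight prior alpha ~ N(0, K_mm^{-1}) that is commonly used for reduced rank GPs. The names `se_kernel`, `solve` and `rrgp_predict` are hypothetical, and the toy data are invented for illustration.

```python
import math


def se_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared exponential covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def rrgp_predict(x_train, y_train, support, x_test, noise_var=0.01):
    """Predictive mean and variance of a reduced rank GP at one test input.

    Assumed model: f(x) = sum_j alpha_j k(x, s_j), alpha ~ N(0, K_mm^{-1}),
    where s_1..s_m are the support inputs (m is much smaller than n).
    """
    m = len(support)
    K_mm = [[se_kernel(si, sj) for sj in support] for si in support]
    K_nm = [[se_kernel(xi, sj) for sj in support] for xi in x_train]
    # A = K_mn K_nm / noise_var + K_mm is the posterior precision of alpha.
    A = [[sum(row[i] * row[j] for row in K_nm) / noise_var + K_mm[i][j]
          for j in range(m)] for i in range(m)]
    b = [sum(row[i] * yi for row, yi in zip(K_nm, y_train)) / noise_var
         for i in range(m)]
    alpha_mean = solve(A, b)
    k_star = [se_kernel(x_test, sj) for sj in support]
    mean = sum(k * a for k, a in zip(k_star, alpha_mean))
    # Predictive variance: noise plus k_*^T A^{-1} k_*.
    var = noise_var + sum(k * v for k, v in zip(k_star, solve(A, k_star)))
    return mean, var


x_train = [0.25 * i for i in range(-20, 21)]  # 41 inputs in [-5, 5]
y_train = [math.sin(x) for x in x_train]      # noise-free toy targets
support = [float(s) for s in range(-5, 6)]    # 11 hand-picked support inputs

print(rrgp_predict(x_train, y_train, support, 1.0))   # near the data
print(rrgp_predict(x_train, y_train, support, 50.0))  # far from the data
```

Far from the support set every k(x*, s_j) vanishes, so the predictive variance of this model collapses to the noise level, whereas a full GP with the same covariance would revert to its prior variance (signal variance plus noise variance). This shrinking uncertainty away from the data is the kind of degenerate behaviour the abstract refers to.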

References

  1. Cressie, N.A.C.: Statistics for Spatial Data. John Wiley and Sons, New Jersey (1993)
  2. Csató, L.: Gaussian Processes – Iterative Sparse Approximation. PhD thesis, Aston University, Birmingham, United Kingdom (2002)
  3. Csató, L., Opper, M.: Sparse online Gaussian processes. Neural Computation 14(3), 641–669 (2002)
  4. Gibbs, M., MacKay, D.J.C.: Efficient implementation of Gaussian processes. Technical report, Cavendish Laboratory, Cambridge University, Cambridge, United Kingdom (1997)
  5. Lawrence, N., Seeger, M., Herbrich, R.: Fast sparse Gaussian process methods: The informative vector machine. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems, vol. 15, pp. 609–616. MIT Press, Cambridge (2003)
  6. MacKay, D.J.C.: Bayesian non-linear modelling for the energy prediction competition. ASHRAE Transactions 100(2), 1053–1062 (1994)
  8. Neal, R.M.: Bayesian Learning for Neural Networks. Lecture Notes in Statistics, vol. 118. Springer, Heidelberg (1996)
  9. Press, W., Flannery, B., Teukolsky, S.A., Vetterling, W.T.: Numerical Recipes in C, 2nd edn. Cambridge University Press, Cambridge (1992)
  10. Rasmussen, C.E.: Evaluation of Gaussian Processes and Other Methods for Non-linear Regression. PhD thesis, Department of Computer Science, University of Toronto, Toronto, Ontario (1996)
  11. Rasmussen, C.E.: Reduced rank Gaussian process learning. Unpublished Manuscript (2002)
  12. Schölkopf, B., Smola, A.J.: Learning with Kernels. MIT Press, Cambridge (2002)
  13. Schwaighofer, A., Tresp, V.: Transductive and inductive methods for approximate Gaussian process regression. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems, vol. 15, pp. 953–960. MIT Press, Cambridge (2003)
  14. Seeger, M.: Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations. PhD thesis, University of Edinburgh, Edinburgh, Scotland (2003)
  15. Seeger, M., Williams, C., Lawrence, N.: Fast forward selection to speed up sparse Gaussian process regression. In: Bishop, C.M., Frey, B.J. (eds.) Ninth International Workshop on Artificial Intelligence and Statistics, Society for Artificial Intelligence and Statistics (2003)
  16. Smola, A.J., Bartlett, P.L.: Sparse greedy Gaussian process regression. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.) Advances in Neural Information Processing Systems, vol. 13, pp. 619–625. MIT Press, Cambridge (2001)
  17. Smola, A.J., Schölkopf, B.: Sparse greedy matrix approximation for machine learning. In: Langley, P. (ed.) International Conference on Machine Learning, vol. 17, pp. 911–918. Morgan Kaufmann, San Francisco (2000)
  18. Tipping, M.E.: Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1, 211–244 (2001)
  19. Tresp, V.: A Bayesian committee machine. Neural Computation 12(11), 2719–2741 (2000)
  20. Wahba, G., Lin, X., Gao, F., Xiang, D., Klein, R., Klein, B.: The bias-variance tradeoff and the randomized GACV. In: Kearns, M.S., Solla, S.A., Cohn, D.A. (eds.) Advances in Neural Information Processing Systems, vol. 11, pp. 620–626. MIT Press, Cambridge (1999)
  21. Williams, C.: Computation with infinite neural networks. In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) Advances in Neural Information Processing Systems, vol. 9, pp. 295–301. MIT Press, Cambridge (1997a)
  22. Williams, C.: Prediction with Gaussian processes: From linear regression to linear prediction and beyond. Technical Report NCRG/97/012, Dept of Computer Science and Applied Mathematics, Aston University, Birmingham, United Kingdom (1997b)
  23. Williams, C., Rasmussen, C.E., Schwaighofer, A., Tresp, V.: Observations of the Nyström method for Gaussian process prediction. Technical report, University of Edinburgh, Edinburgh, Scotland (2002)
  24. Williams, C., Seeger, M.: Using the Nyström method to speed up kernel machines. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.) Advances in Neural Information Processing Systems, vol. 13, pp. 682–688. MIT Press, Cambridge (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Joaquin Quiñonero-Candela (1, 2)
  • Carl Edward Rasmussen (2)
  1. Informatics and Mathematical Modelling, Technical University of Denmark, Kongens Lyngby, Denmark
  2. Max Planck Institute for Biological Cybernetics, Tübingen, Germany
