Machine Learning, Volume 93, Issue 1, pp 93–114

A comparative evaluation of stochastic-based inference methods for Gaussian process models

Abstract

Gaussian Process (GP) models are extensively used in data analysis given their flexible modeling capabilities and interpretability. The fully Bayesian treatment of GP models is analytically intractable, and therefore it is necessary to resort to either deterministic or stochastic approximations. This paper focuses on stochastic-based inference techniques. After discussing the challenges associated with the fully Bayesian treatment of GP models, a number of inference strategies based on Markov chain Monte Carlo methods are presented and rigorously assessed. In particular, strategies based on efficient parameterizations and efficient proposal mechanisms are extensively compared on simulated and real data on the basis of convergence speed, sampling efficiency, and computational cost.
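The paper's specific strategies are not reproduced here, but as a rough illustration of what stochastic (MCMC) inference for a GP model involves, the sketch below runs a naive random-walk Metropolis sampler over the covariance hyperparameters of a GP regression model with a Gaussian likelihood, so that the latent function can be integrated out analytically. All names and settings (rbf_kernel, log_marginal, metropolis_gp, the step size) are illustrative assumptions, not the algorithms assessed in the paper, which target the harder case of jointly sampling latent variables and hyperparameters under general likelihoods.

```python
# Minimal sketch: random-walk Metropolis over the log-hyperparameters of a
# GP regression model. Illustrative only; not the paper's samplers.
import numpy as np

def rbf_kernel(X, lengthscale, variance):
    """Squared-exponential covariance matrix for 1-D inputs X."""
    d2 = (X[:, None] - X[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def log_marginal(y, X, log_theta, noise=0.1):
    """Log p(y | theta) for a GP prior with Gaussian noise; latents integrated out."""
    lengthscale, variance = np.exp(log_theta)
    K = rbf_kernel(X, lengthscale, variance) + noise ** 2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

def metropolis_gp(y, X, n_iter=5000, step=0.1, seed=0):
    """Random-walk Metropolis over (log lengthscale, log variance)."""
    rng = np.random.default_rng(seed)
    log_theta = np.zeros(2)                       # start at lengthscale = variance = 1
    logp = log_marginal(y, X, log_theta)
    samples = np.empty((n_iter, 2))
    for i in range(n_iter):
        proposal = log_theta + step * rng.standard_normal(2)
        logp_prop = log_marginal(y, X, proposal)
        if np.log(rng.uniform()) < logp_prop - logp:   # Metropolis accept/reject
            log_theta, logp = proposal, logp_prop
        samples[i] = log_theta
    return np.exp(samples)                        # back to the original scale

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.linspace(0, 5, 40)
    y = np.sin(X) + 0.1 * rng.standard_normal(40)
    theta = metropolis_gp(y, X)
    print("posterior mean lengthscale, variance:", theta[2500:].mean(axis=0))
```

Two routine choices in this sketch mirror common GP practice: sampling on the log scale keeps the hyperparameters positive without constraints, and the Cholesky factor supplies both the linear solve and the log-determinant needed for the marginal likelihood. The convergence speed, sampling efficiency, and computational cost of such samplers are exactly the criteria the paper uses to compare the more sophisticated parameterizations and proposal mechanisms.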

Keywords

Bayesian inference, Gaussian processes, Markov chain Monte Carlo, Hierarchical models, Latent variable models

Copyright information

© The Author(s) 2013

Authors and Affiliations

  1. School of Computing Science, University of Glasgow, Glasgow, UK
  2. Department of Biomedical Engineering, Dalian University of Technology, Dalian, P.R. China
  3. Department of Statistical Science, University College London, London, UK
