Model Selection for Gaussian Process Regression

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10496)


Gaussian processes are powerful tools because they can model non-linear dependencies between inputs while remaining analytically tractable. A Gaussian process is characterized by a mean function and a covariance function (kernel), which are chosen by a model selection criterion. The candidate functions differ not just in their parametrization but in their fundamental structure, and it is often unclear which structure to choose, for instance whether to use a squared exponential or a rational quadratic kernel. Based on the principle of posterior agreement, we develop a general framework for model selection that ranks kernels for Gaussian process regression, and we compare it with maximum evidence (also called marginal likelihood) and leave-one-out cross-validation. The disagreement between current state-of-the-art methods in our experiments demonstrates the difficulty of model selection and the need for an information-theoretic approach.
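The maximum-evidence criterion discussed above ranks kernels by the log marginal likelihood of the data. The following sketch (a minimal NumPy illustration, not the paper's posterior-agreement method) compares a squared exponential and a rational quadratic kernel on toy data; the hyperparameter values and the noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np

def se_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared exponential: k(x, x') = s^2 * exp(-(x - x')^2 / (2 l^2))
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def rq_kernel(X1, X2, lengthscale=1.0, variance=1.0, alpha=1.0):
    # Rational quadratic: k(x, x') = s^2 * (1 + (x - x')^2 / (2 a l^2))^(-a)
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * (1.0 + 0.5 * d2 / (alpha * lengthscale ** 2)) ** (-alpha)

def log_marginal_likelihood(K, y, noise=1e-2):
    # log p(y | X) = -1/2 y^T Ky^{-1} y - 1/2 log|Ky| - n/2 log(2 pi),
    # with Ky = K + noise * I, evaluated via a Cholesky factorization.
    n = len(y)
    L = np.linalg.cholesky(K + noise * np.eye(n))
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ a
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))

# Toy data: noisy samples of a smooth function.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 30)
y = np.sin(X) + 0.1 * rng.standard_normal(30)

lml_se = log_marginal_likelihood(se_kernel(X, X), y)
lml_rq = log_marginal_likelihood(rq_kernel(X, X), y)
print(lml_se, lml_rq)  # the kernel with the larger value is preferred
```

In practice the hyperparameters would first be optimized (e.g. by gradient ascent on the evidence) before comparing kernels; here they are fixed only to keep the comparison step visible.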



This research was partially supported by the Max Planck ETH Center for Learning Systems and the project SignalX.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Science, ETH Zurich, Zürich, Switzerland
