Abstract
Gaussian processes are powerful tools since they can model non-linear dependencies between inputs, while remaining analytically tractable. A Gaussian process is characterized by a mean function and a covariance function (kernel), which are determined by a model selection criterion. The functions to be compared do not just differ in their parametrization but in their fundamental structure. It is often not clear which function structure to choose, for instance to decide between a squared exponential and a rational quadratic kernel. Based on the principle of posterior agreement, we develop a general framework for model selection to rank kernels for Gaussian process regression and compare it with maximum evidence (also called marginal likelihood) and leave-one-out cross-validation. Given the disagreement between current state-of-the-art methods in our experiments, we show the difficulty of model selection and the need for an information-theoretic approach.
N.S. Gorbach and A.A. Bian—These two authors contributed equally.
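To make the two baseline criteria concrete, the following is a minimal NumPy sketch (an illustration only, not the posterior-agreement method developed in this paper): it ranks a squared exponential kernel against a rational quadratic kernel by their log marginal likelihood (evidence) on synthetic data. The kernel hyperparameters, noise level, and data are placeholder choices.

import numpy as np

def se_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared exponential (RBF) kernel: k(x, x') = s^2 exp(-||x - x'||^2 / (2 l^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def rq_kernel(X1, X2, lengthscale=1.0, variance=1.0, alpha=1.0):
    # Rational quadratic kernel: k(x, x') = s^2 (1 + ||x - x'||^2 / (2 a l^2))^{-a}
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * (1.0 + 0.5 * d2 / (alpha * lengthscale ** 2)) ** (-alpha)

def log_evidence(K, y, noise_var=1e-2):
    # Log marginal likelihood of a zero-mean GP:
    # -1/2 y^T (K + s_n^2 I)^{-1} y - 1/2 log|K + s_n^2 I| - n/2 log(2 pi),
    # computed via a Cholesky factorization.
    n = y.shape[0]
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ a - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

for name, K in [("SE", se_kernel(X, X)), ("RQ", rq_kernel(X, X))]:
    print(name, log_evidence(K, y))

A leave-one-out cross-validation score, the second baseline compared in our experiments, can be obtained in the same setting by predicting each held-out point from the remaining ones and averaging the predictive log probabilities.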
References
Bachoc, F.: Cross validation and maximum likelihood estimations of hyper-parameters of Gaussian processes with model misspecification. Comput. Stat. Data Anal. 66, 55–69 (2013)
Bian, A.A., Gronskiy, A., Buhmann, J.M.: Information-theoretic analysis of maxcut algorithms. Technical report, Department of Computer Science, ETH Zurich (2016). http://people.inf.ethz.ch/ybian/docs/pa.pdf
Bian, Y., Gronskiy, A., Buhmann, J.M.: Greedy maxcut algorithms and their information content. In: IEEE Information Theory Workshop (ITW), pp. 1–5 (2015)
Buhmann, J.M.: Information theoretic model validation for clustering. In: IEEE International Symposium on Information Theory (ISIT), pp. 1398–1402 (2010)
Buhmann, J.M.: SIMBAD: emergence of pattern similarity. In: Pelillo, M. (ed.) Similarity-Based Pattern Analysis and Recognition. ACVPR, pp. 45–64. Springer, London (2013). doi:10.1007/978-1-4471-5628-4_3
Cawley, G.C., Talbot, N.L.C.: On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 11, 2079–2107 (2010)
Chapelle, O.: Some thoughts about Gaussian processes (2005). http://is.tuebingen.mpg.de/fileadmin/user_upload/files/publications/gp_[0].pdf
Chehreghani, M.H., Busetto, A.G., Buhmann, J.M.: Information theoretic model validation for spectral clustering. In: International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 495–503 (2012)
Damianou, A.C., Lawrence, N.D.: Deep Gaussian processes. In: International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 207–215 (2013)
Frank, M., Buhmann, J.M.: Selecting the rank of truncated SVD by maximum approximation capacity. In: IEEE International Symposium on Information Theory (ISIT), pp. 1036–1040 (2011)
Gronskiy, A., Buhmann, J.: How informative are minimum spanning tree algorithms? In: IEEE International Symposium on Information Theory (ISIT), pp. 2277–2281 (2014)
Horn, R.A., Johnson, C.R.: Matrix Analysis, 2nd edn. Cambridge University Press, Cambridge (2012)
Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106, 620–630 (1957)
Jaynes, E.T.: Information theory and statistical mechanics. II. Phys. Rev. 108, 171–190 (1957)
Lloyd, J.R., Duvenaud, D., Grosse, R., Tenenbaum, J.B., Ghahramani, Z.: Automatic construction and natural-language description of nonparametric regression models. In: AAAI Conference on Artificial Intelligence (AAAI), pp. 1242–1250 (2014)
Nocedal, J.: Updating quasi-Newton matrices with limited storage. Math. Comput. 35, 773–782 (1980)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. The MIT Press, Cambridge (2006)
Seeger, M.W.: PAC-Bayesian generalisation error bounds for Gaussian process classification. J. Mach. Learn. Res. 3, 233–269 (2002)
Tong, Y.L.: The Multivariate Normal Distribution. Springer Science & Business Media, New York (2012)
Zee, A.: Quantum Field Theory in a Nutshell. Princeton University Press, Princeton (2003)
Zhu, X., Welling, M., Jin, F., Lowengrub, J.S.: Predicting simulation parameters of biological systems using a Gaussian process model. Stat. Anal. Data Min. 5, 509–522 (2012)
Acknowledgments
This research was partially supported by the Max Planck ETH Center for Learning Systems and the SystemsX.ch project SignalX.
Appendix
A Propositions of Gaussian distribution
This appendix collects properties of Gaussian distributions that are used in the derivations of Sect. 2.2.
Proposition 1
If
then
[19, Theorem 3.3.4].
Proposition 2
If \( \varvec{\varLambda }\) is symmetric positive-definite, then
[20, 14].
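Judging from the cited passage of [20] and from how the result is invoked with \( {\varvec{r}}\) and \( \varvec{\varLambda }\) in Propositions 3 and 6, Proposition 2 is presumably the multivariate Gaussian integral
\[ \int _{\mathbb {R}^{D}} \exp \left ( -\tfrac{1}{2}\, {\varvec{x}}^\intercal \varvec{\varLambda }\, {\varvec{x}} + {\varvec{r}}^\intercal {\varvec{x}} \right ) \mathrm {d}{\varvec{x}} = \sqrt{\frac{(2\pi )^{D}}{\det \varvec{\varLambda }}}\, \exp \left ( \tfrac{1}{2}\, {\varvec{r}}^\intercal \varvec{\varLambda }^{-1} {\varvec{r}} \right ). \]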
Proposition 3
It holds that,
where \( {\varvec{r}}= \sum _{k = 1}^{K} \varvec{\varSigma }_{k} ^ {-1} {\varvec{\mu }}_{k} \) and \( \varvec{\varLambda }= \sum _{k = 1}^{K} \varvec{\varSigma }_{k} ^ {-1} \).
Proof
We collect the terms independent of \( {\varvec{x}}\) into a factor \( \gamma \) and move it out of the integral, as in
The remaining integral can be calculated by Proposition 2.
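With the stated \( {\varvec{r}}\) and \( \varvec{\varLambda }\), the step above presumably rests on the fact that a product of Gaussian densities in the same variable is again of Gaussian form up to a constant,
\[ \prod _{k=1}^{K} \mathcal {N}({\varvec{x}} \mid {\varvec{\mu }}_{k}, \varvec{\varSigma }_{k}) = \gamma \, \exp \left ( -\tfrac{1}{2}\, {\varvec{x}}^\intercal \varvec{\varLambda }\, {\varvec{x}} + {\varvec{r}}^\intercal {\varvec{x}} \right ), \quad \gamma = \prod _{k=1}^{K} (2\pi )^{-D/2} \det (\varvec{\varSigma }_{k})^{-1/2} \exp \left ( -\tfrac{1}{2}\, {\varvec{\mu }}_{k}^\intercal \varvec{\varSigma }_{k}^{-1} {\varvec{\mu }}_{k} \right ), \]
so that the remaining integral over \( {\varvec{x}}\) is exactly the one supplied by Proposition 2.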
Proposition 4
If \( \varvec{\varSigma }\) is symmetric positive-definite, then \( \varvec{\varSigma }\) is invertible and \( \varvec{\varSigma } ^ {-1} \) is symmetric positive-definite [12, 430].
Proposition 5
If \( \varvec{\varSigma }\) is symmetric positive-definite and \( \varvec{A}\) has full row rank, then \( \varvec{A}\varvec{\varSigma }\varvec{A}^\intercal \) is symmetric positive-definite [12, Observation 7.1.8.(b)].
Proposition 6
For \( \varvec{A}\in \mathbb {R}^ {D \times N} \) of full row rank, the density
has the equivalent form, where \( {\varvec{r}}= \varvec{A}\varvec{\varSigma } ^ {-1} {\varvec{\mu }}\) and \( \varvec{\varLambda }= \varvec{A}\varvec{\varSigma } ^ {-1} \varvec{A}^\intercal \).
Proof
First, we separate a factor independent of \( {\varvec{x}}\) in
Therefore,
We now calculate the integral. From Proposition 4 and Proposition 5, one can see that \(\varvec{\varLambda }\) is symmetric positive-definite, so that Proposition 2 can be applied to find
Finally, one gets
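Given \( {\varvec{r}}= \varvec{A}\varvec{\varSigma } ^ {-1} {\varvec{\mu }}\) and \( \varvec{\varLambda }= \varvec{A}\varvec{\varSigma } ^ {-1} \varvec{A}^\intercal \), the density in question is presumably \( \mathcal {N}(\varvec{A}^\intercal {\varvec{x}} \mid {\varvec{\mu }}, \varvec{\varSigma }) \), read as a function of \( {\varvec{x}}\in \mathbb {R}^{D} \). Separating the factor independent of \( {\varvec{x}}\) then gives
\[ \mathcal {N}(\varvec{A}^\intercal {\varvec{x}} \mid {\varvec{\mu }}, \varvec{\varSigma }) = \gamma \, \exp \left ( -\tfrac{1}{2}\, {\varvec{x}}^\intercal \varvec{\varLambda }\, {\varvec{x}} + {\varvec{r}}^\intercal {\varvec{x}} \right ), \quad \gamma = (2\pi )^{-N/2} \det (\varvec{\varSigma })^{-1/2} \exp \left ( -\tfrac{1}{2}\, {\varvec{\mu }}^\intercal \varvec{\varSigma }^{-1} {\varvec{\mu }} \right ), \]
and Proposition 2 normalizes the right-hand side over \( {\varvec{x}}\), as in the proof above.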