Model Selection for Gaussian Process Regression
Gaussian processes are powerful tools because they can model non-linear dependencies between inputs while remaining analytically tractable. A Gaussian process is characterized by a mean function and a covariance function (kernel), both of which are chosen by a model selection criterion. The candidate functions differ not just in their parametrization but in their fundamental structure. It is often unclear which function structure to choose, for instance whether to use a squared exponential or a rational quadratic kernel. Based on the principle of posterior agreement, we develop a general framework for model selection that ranks kernels for Gaussian process regression, and we compare it with maximum evidence (also called marginal likelihood) and leave-one-out cross-validation. The disagreement among current state-of-the-art methods in our experiments demonstrates the difficulty of model selection and the need for an information-theoretic approach.
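To make the two baseline criteria concrete, the following is a minimal sketch, not the authors' implementation, that scores a squared exponential and a rational quadratic kernel by log evidence and by the leave-one-out log predictive probability. The toy data, the zero mean function, the fixed hyperparameters (`ell`, `sf`, `alpha`, `noise`) and all function names are assumptions made for illustration; in practice the hyperparameters would be optimized per kernel. The posterior agreement criterion is not shown, since the abstract does not specify it.

```python
import numpy as np

def sq_dists(a, b):
    # Pairwise squared distances between two 1-D input arrays.
    return (a[:, None] - b[None, :]) ** 2

def se_kernel(a, b, ell=1.0, sf=1.0):
    # Squared exponential kernel (hyperparameters are illustrative).
    return sf**2 * np.exp(-0.5 * sq_dists(a, b) / ell**2)

def rq_kernel(a, b, ell=1.0, sf=1.0, alpha=1.0):
    # Rational quadratic kernel; approaches the SE kernel as alpha grows.
    return sf**2 * (1.0 + sq_dists(a, b) / (2.0 * alpha * ell**2)) ** (-alpha)

def log_evidence(K, y, noise=0.1):
    # Log marginal likelihood of a zero-mean GP with Gaussian noise.
    n = len(y)
    L = np.linalg.cholesky(K + noise**2 * np.eye(n))
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ a - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)

def loo_log_pred(K, y, noise=0.1):
    # Closed-form leave-one-out log predictive probability
    # (no refitting: uses the inverse of the full noisy kernel matrix).
    n = len(y)
    Kinv = np.linalg.inv(K + noise**2 * np.eye(n))
    var = 1.0 / np.diag(Kinv)
    mu = y - (Kinv @ y) * var
    return np.sum(-0.5 * np.log(2 * np.pi * var) - 0.5 * (y - mu) ** 2 / var)

# Hypothetical toy data: a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 30)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)

# Score each kernel under both criteria: (log evidence, LOO log predictive).
scores = {
    name: (log_evidence(k(x, x), y), loo_log_pred(k(x, x), y))
    for name, k in {"SE": se_kernel, "RQ": rq_kernel}.items()
}
for name, (ev, loo) in scores.items():
    print(f"{name}: log evidence = {ev:.2f}, LOO log predictive = {loo:.2f}")
```

The two criteria need not rank the kernels in the same order, which is the kind of disagreement the abstract refers to.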
This research was partially supported by the Max Planck ETH Center for Learning Systems and the SystemsX.ch project SignalX.