Machine Learning, Volume 98, Issue 3, pp 407–433

Asymptotic analysis of the learning curve for Gaussian process regression

Abstract

This paper deals with the learning curve in a Gaussian process regression framework. The learning curve describes the generalization error of the Gaussian process used for the regression as a function of the number of observations. The main result is the proof of a theorem giving the generalization error for a large class of correlation kernels and for any dimension when the number of observations is large. From this theorem, we can deduce the asymptotic behavior of the generalization error when the observation error is small. The presented proof generalizes previous ones that were limited to special kernels or to small dimensions (one or two). The theoretical results are applied to a nuclear safety problem.
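To make the notion of a learning curve concrete, the following is a minimal sketch, not the authors' method or their theorem. It assumes a one-dimensional input space [0, 1], a Matérn 3/2 correlation with unit variance and length scale 0.2, training inputs drawn uniformly at random, and Gaussian observation noise of variance `noise_var`; all function names are hypothetical. When the Gaussian process prior is well specified, the mean squared error of the posterior mean at a point equals the posterior variance k(x*, x*) - k*ᵀ(K + τ²I)⁻¹k*, so averaging that variance over a test grid gives an empirical generalization error.

```python
import math
import random


def matern32(x, y, length=0.2):
    """Matérn 3/2 correlation with unit variance (an assumed kernel choice)."""
    r = abs(x - y) * math.sqrt(3.0) / length
    return (1.0 + r) * math.exp(-r)


def cholesky(a):
    """Lower-triangular L with a = L L^T, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_sub(low, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def generalization_error(n, noise_var, n_test=200, seed=0, kernel=matern32):
    """Posterior variance averaged over a uniform test grid on [0, 1].

    Assumptions (not from the paper): 1-D inputs, uniform random design,
    well-specified prior. With a fixed seed, the design for n points is a
    subset of the design for any larger n.
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    # Noisy covariance matrix K + tau^2 I of the training observations.
    k = [[kernel(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    low = cholesky(k)
    total = 0.0
    for t in range(n_test):
        x_star = (t + 0.5) / n_test
        z = forward_sub(low, [kernel(x_star, a) for a in xs])
        # k(x*, x*) - k*^T (K + tau^2 I)^{-1} k* = k(x*, x*) - |L^{-1} k*|^2
        total += kernel(x_star, x_star) - sum(v * v for v in z)
    return total / n_test


if __name__ == "__main__":
    for noise_var in (1e-1, 1e-2):
        curve = [generalization_error(n, noise_var) for n in (5, 10, 20, 40)]
        print(noise_var, [round(e, 4) for e in curve])
```

Because the designs are nested and conditioning on more Gaussian observations never increases the posterior variance, each printed curve is non-increasing in n, and for a fixed design a smaller noise variance gives a smaller error. The sketch only illustrates these qualitative trends; the precise asymptotic rates are the subject of the paper's theorem.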

Keywords

Gaussian process regression; Asymptotic mean squared error; Learning curves; Generalization error; Convergence rate

Acknowledgments

The authors are grateful to Dr. Yann Richet of the IRSN—Institute for Radiological Protection and Nuclear Safety—for providing the data for the industrial case through the reDICE project.

Copyright information

© The Author(s) 2014

Authors and Affiliations

  1. Université Paris Diderot, Paris Cedex 13, France
  2. CEA, DAM, DIF, Arpajon, France
  3. Laboratoire de Probabilités et Modèles Aléatoires & Laboratoire Jacques-Louis Lions, Université Paris Diderot, Paris Cedex 13, France