Upper and Lower Bounds on the Learning Curve for Gaussian Processes

In this paper we introduce and illustrate non-trivial upper and lower bounds on the learning curves for one-dimensional Gaussian Processes. The analysis emphasises how the bounds are affected by the smoothness of the random process, as described by the Modified Bessel and the Squared Exponential covariance functions. We present an explanation of the early, linearly-decreasing behavior of the learning curves and the bounds, as well as a study of the asymptotic behavior of the curves. The effects of the noise level and the lengthscale on the tightness of the bounds are also discussed.
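As a rough illustration of what a learning curve looks like in this setting, the sketch below estimates one numerically for a one-dimensional Gaussian process with a squared exponential covariance. It is not the paper's method for deriving bounds. It assumes that the generalisation error is the posterior variance averaged over test inputs, that inputs are drawn uniformly on [0, 1], and that the covariance and noise level are known. The function names and all parameter values (lengthscale 0.1, noise variance 0.1, prior variance 1) are hypothetical choices made for illustration.

```python
import numpy as np

def se_cov(x1, x2, lengthscale=0.1, signal_var=1.0):
    """Squared exponential covariance between two 1-D input arrays."""
    d = x1[:, None] - x2[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def generalisation_error(x_train, x_test, noise_var=0.1, lengthscale=0.1):
    """Posterior variance averaged over the test inputs (assumed error measure)."""
    n = len(x_train)
    K = se_cov(x_train, x_train, lengthscale) + noise_var * np.eye(n)
    k_star = se_cov(x_train, x_test, lengthscale)  # shape (n, m)
    prior_var = 1.0  # k(x, x) for the SE covariance with signal_var = 1
    # Posterior variance: k(x, x) - k_*^T K^{-1} k_* at each test point
    sol = np.linalg.solve(K, k_star)
    post_var = prior_var - np.sum(k_star * sol, axis=0)
    return post_var.mean()

def learning_curve(ns, n_repeats=20, noise_var=0.1, lengthscale=0.1, seed=0):
    """Average the error over random training sets for each training set size."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.0, 1.0, 200)  # grid standing in for the input density
    curve = []
    for n in ns:
        errs = [
            generalisation_error(rng.uniform(0.0, 1.0, n), x_test,
                                 noise_var, lengthscale)
            for _ in range(n_repeats)
        ]
        curve.append(np.mean(errs))
    return np.array(curve)

ns = [1, 2, 5, 10, 20, 50]
curve = learning_curve(ns)
for n, e in zip(ns, curve):
    print(f"n = {n:3d}   estimated generalisation error = {e:.4f}")
```

Changing `lengthscale` or `noise_var` in this sketch gives a feel for how those two quantities shift the curve, which is the dependence the abstract says is studied for the bounds.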


About this article

Cite this article

Williams, C.K., Vivarelli, F. Upper and Lower Bounds on the Learning Curve for Gaussian Processes. Machine Learning 40, 77–102 (2000).

  • Gaussian processes
  • Bayesian inference
  • generalisation error
  • bounds