Abstract
In this paper we introduce and illustrate non-trivial upper and lower bounds on the learning curves for one-dimensional Gaussian Processes. The analysis emphasises the effects that the smoothness of the random process, as described by the Modified Bessel and the Squared Exponential covariance functions, induces on the bounds. We present an explanation of the early, linearly decreasing behaviour of the learning curves and the bounds, as well as a study of the asymptotic behaviour of the curves. The effects of the noise level and the lengthscale on the tightness of the bounds are also discussed.
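To make the object of study concrete, the sketch below gives a minimal Monte Carlo estimate of the learning curve itself (not of the paper's analytic bounds): the Bayesian generalization error at n training points is the posterior variance of the GP, averaged over test location and over random draws of the training inputs. It assumes a zero-mean GP with unit prior variance, a uniform input density on [0, 1], and takes the lowest-order member of the Modified Bessel family, which reduces to the Ornstein-Uhlenbeck kernel exp(-|r|/l); all function and parameter names (cov_se, cov_mb, ell, noise, n_trials) are illustrative choices, not notation from the paper.

```python
import numpy as np

def cov_se(r, ell=0.1):
    """Squared Exponential covariance: k(r) = exp(-r^2 / (2 ell^2))."""
    return np.exp(-r**2 / (2.0 * ell**2))

def cov_mb(r, ell=0.1):
    """Lowest-order Modified Bessel covariance (Ornstein-Uhlenbeck):
    k(r) = exp(-|r| / ell)."""
    return np.exp(-np.abs(r) / ell)

def learning_curve(cov, n_max=50, noise=0.1, n_trials=30, n_test=200, seed=0):
    """Monte Carlo estimate of the learning curve eps(n) on [0, 1]:
    the GP posterior variance at test points, averaged over test
    location and over random draws of the n training inputs."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.0, 1.0, n_test)
    eps = np.zeros(n_max + 1)
    eps[0] = cov(0.0)  # prior variance before seeing any data
    for n in range(1, n_max + 1):
        for _ in range(n_trials):
            x = rng.uniform(0.0, 1.0, size=n)
            # Gram matrix of the training inputs, plus the noise variance
            K = cov(x[:, None] - x[None, :]) + noise**2 * np.eye(n)
            # cross-covariances between test and training points, (n_test, n)
            k_star = cov(x_test[:, None] - x[None, :])
            v = np.linalg.solve(K, k_star.T)
            # posterior variance: k(0) - k_*^T (K + noise^2 I)^{-1} k_*
            post_var = cov(0.0) - np.einsum('ij,ji->i', k_star, v)
            eps[n] += post_var.mean() / n_trials
    return eps
```

Plotting `learning_curve(cov_se)` and `learning_curve(cov_mb)` against n should reproduce the qualitative picture discussed in the paper: an early, roughly linear decrease of the error, followed by an asymptotic decay whose rate depends on the smoothness of the process, with the smoother Squared Exponential process learned faster than the rougher Modified Bessel one.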
Cite this article
Williams, C. K. I. & Vivarelli, F. (2000). Upper and Lower Bounds on the Learning Curve for Gaussian Processes. Machine Learning, 40, 77–102. https://doi.org/10.1023/A:1007601601278