Abstract
In this paper we introduce and illustrate non-trivial upper and lower bounds on the learning curves for one-dimensional Gaussian Processes. The analysis emphasises the effects that the smoothness of the random process, as described by the Modified Bessel and the Squared Exponential covariance functions, induces on the bounds. We present an explanation of the early, linearly decreasing behaviour of the learning curves and the bounds, as well as a study of the asymptotic behaviour of the curves. The effects of the noise level and the lengthscale on the tightness of the bounds are also discussed.
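To make the object of study concrete, the sketch below gives a minimal Monte Carlo estimate of the learning curve itself (not of the paper's analytic bounds): the Bayesian generalization error at n training points is the posterior variance of the GP, averaged over test location and over random draws of the training inputs. It assumes a zero-mean GP with unit prior variance, a uniform input density on [0, 1], and takes the lowest-order member of the Modified Bessel family, which reduces to the Ornstein-Uhlenbeck kernel exp(-|r|/l); all function and parameter names (cov_se, cov_mb, ell, noise, n_trials) are illustrative choices, not notation from the paper.

```python
import numpy as np

def cov_se(r, ell=0.1):
    """Squared Exponential covariance: k(r) = exp(-r^2 / (2 ell^2))."""
    return np.exp(-r**2 / (2.0 * ell**2))

def cov_mb(r, ell=0.1):
    """Lowest-order Modified Bessel covariance (Ornstein-Uhlenbeck):
    k(r) = exp(-|r| / ell)."""
    return np.exp(-np.abs(r) / ell)

def learning_curve(cov, n_max=50, noise=0.1, n_trials=30, n_test=200, seed=0):
    """Monte Carlo estimate of the learning curve eps(n) on [0, 1]:
    the GP posterior variance at test points, averaged over test
    location and over random draws of the n training inputs."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.0, 1.0, n_test)
    eps = np.zeros(n_max + 1)
    eps[0] = cov(0.0)  # prior variance before seeing any data
    for n in range(1, n_max + 1):
        for _ in range(n_trials):
            x = rng.uniform(0.0, 1.0, size=n)
            # Gram matrix of the training inputs, plus the noise variance
            K = cov(x[:, None] - x[None, :]) + noise**2 * np.eye(n)
            # cross-covariances between test and training points, (n_test, n)
            k_star = cov(x_test[:, None] - x[None, :])
            v = np.linalg.solve(K, k_star.T)
            # posterior variance: k(0) - k_*^T (K + noise^2 I)^{-1} k_*
            post_var = cov(0.0) - np.einsum('ij,ji->i', k_star, v)
            eps[n] += post_var.mean() / n_trials
    return eps
```

Plotting `learning_curve(cov_se)` and `learning_curve(cov_mb)` against n should reproduce the qualitative picture discussed in the paper: an early, roughly linear decrease of the error, followed by an asymptotic decay whose rate depends on the smoothness of the process, with the smoother Squared Exponential process learned faster than the rougher Modified Bessel one.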
Cite this article
Williams, C. K. I. & Vivarelli, F. (2000). Upper and Lower Bounds on the Learning Curve for Gaussian Processes. Machine Learning, 40, 77–102. https://doi.org/10.1023/A:1007601601278