Some results concerning off-training-set and IID error for the Gibbs and the Bayes optimal generalizers

Published in: Statistics and Computing

Abstract

In this paper we analyse the average behaviour of the Bayes-optimal and Gibbs learning algorithms. We do this both for off-training-set error and for conventional IID (independent identically distributed) error, for which test sets may overlap with training sets. For the IID case we provide a major extension to one of the better-known results. We also show that expected IID test set error is a non-increasing function of training set size for either algorithm. On the other hand, as we show, the expected off-training-set error for both learning algorithms can increase with training set size for non-uniform sampling distributions, and we characterize the relationship the sampling distribution must have with the prior for such an increase to occur. We show in particular that for uniform sampling distributions and either algorithm, the expected off-training-set error is a non-increasing function of training set size. For uniform sampling distributions, we also characterize the priors for which the expected error of the Bayes-optimal algorithm stays constant. In addition we show that for the Bayes-optimal algorithm, expected off-training-set error can increase with training set size when the target function is fixed; this happens if and only if the expected error averaged over all targets decreases with training set size. Our results hold for arbitrary noise and arbitrary loss functions.
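The contrast between the two error measures is easy to experiment with numerically. Below is a minimal Monte Carlo sketch — not the paper's formal framework — comparing expected IID and off-training-set (OTS) zero-one error for the Bayes-optimal and Gibbs generalizers on a toy problem: noise-free Boolean targets on a four-point input space, a uniform prior over all targets, and a non-uniform sampling distribution. The IID error draws the test point from the sampling distribution unrestricted; the OTS error conditions on the test point lying outside the training set. All specifics here (`n`, `pi`, the uniform prior, the trial counts) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (assumed toy setup, not the paper's formal framework):
# Bayes-optimal vs Gibbs generalizers, noise-free Boolean targets on a
# 4-point input space, uniform prior, non-uniform sampling distribution.
import itertools
import random

n = 4                                                # |X|, tiny so posteriors are exact
X = list(range(n))
targets = list(itertools.product([0, 1], repeat=n))  # all 2^n Boolean target functions
prior = [1.0 / len(targets)] * len(targets)          # uniform prior (an assumption)
pi = [0.7, 0.1, 0.1, 0.1]                            # non-uniform sampling distribution (assumption)

def posterior(train):
    """Prior restricted to targets consistent with the noise-free training set."""
    w = [p if all(f[x] == y for x, y in train) else 0.0
         for f, p in zip(targets, prior)]
    s = sum(w)
    return [wi / s for wi in w]

def bayes_err(post, x, fx):
    """Zero-one loss of the Bayes-optimal prediction (posterior-majority label) at x."""
    p1 = sum(w for f, w in zip(targets, post) if f[x] == 1)
    return float((1 if p1 > 0.5 else 0) != fx)

def gibbs_err(post, x, fx):
    """Expected zero-one loss of Gibbs: chance a posterior-sampled hypothesis errs at x."""
    return sum(w for f, w in zip(targets, post) if f[x] != fx)

def run(m, trials=20000, rng=random.Random(0)):
    iid = {"bayes": 0.0, "gibbs": 0.0}
    ots = {"bayes": 0.0, "gibbs": 0.0}
    ots_trials = 0
    for _ in range(trials):
        f = rng.choices(targets, weights=prior)[0]   # draw a target from the prior
        xs = rng.choices(X, weights=pi, k=m)         # m i.i.d. training inputs
        post = posterior([(x, f[x]) for x in xs])
        x = rng.choices(X, weights=pi)[0]            # IID test point: may hit the training set
        iid["bayes"] += bayes_err(post, x, f[x])
        iid["gibbs"] += gibbs_err(post, x, f[x])
        off = [x for x in X if x not in xs]          # OTS test point: outside the training set
        if off:
            x = rng.choices(off, weights=[pi[x] for x in off])[0]
            ots["bayes"] += bayes_err(post, x, f[x])
            ots["gibbs"] += gibbs_err(post, x, f[x])
            ots_trials += 1
    return ({k: v / trials for k, v in iid.items()},
            {k: v / max(ots_trials, 1) for k, v in ots.items()})

for m in [0, 1, 2, 4, 8]:
    iid, ots = run(m)
    print(f"m={m:2d}  IID: bayes={iid['bayes']:.3f} gibbs={iid['gibbs']:.3f}  "
          f"OTS: bayes={ots['bayes']:.3f} gibbs={ots['gibbs']:.3f}")
```

Under these assumptions the IID error falls with training set size m (test points increasingly coincide with memorized training inputs), while the OTS error sits at 0.5 for both generalizers: with a uniform prior over all Boolean targets, labels at unseen points remain unbiased coin flips under the posterior, consistent with the abstract's constant-error characterization. Swapping in a non-uniform prior is one way to probe the regime where expected OTS error can actually rise with m.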




Cite this article

Wolpert, D. H., Knill, E. & Grossman, T. Some results concerning off-training-set and IID error for the Gibbs and the Bayes optimal generalizers. Statistics and Computing 8, 35–54 (1998). https://doi.org/10.1023/A:1008867009312
