
Statistics and Computing, Volume 8, Issue 1, pp 35–54

Some results concerning off-training-set and IID error for the Gibbs and the Bayes optimal generalizers

  • DAVID H. WOLPERT
  • EMANUEL KNILL
  • TAL GROSSMAN

Abstract

In this paper we analyse the average behaviour of the Bayes-optimal and Gibbs learning algorithms. We do this both for off-training-set error and conventional IID (independent identically distributed) error, for which test sets may overlap with training sets. For the IID case we provide a major extension to one of the better-known results. We also show that expected IID test set error is a non-increasing function of training set size for either algorithm. On the other hand, as we show, the expected off-training-set error for both learning algorithms can increase with training set size for non-uniform sampling distributions. We characterize the relationship between the sampling distribution and the prior that is required for such an increase. In particular, we show that for uniform sampling distributions and either algorithm, the expected off-training-set error is a non-increasing function of training set size. For uniform sampling distributions, we also characterize the priors for which the expected error of the Bayes-optimal algorithm stays constant. In addition we show that for the Bayes-optimal algorithm, expected off-training-set error can increase with training set size when the target function is fixed, and that it does so if and only if the expected error averaged over all targets decreases with training set size. Our results hold for arbitrary noise and arbitrary loss functions.
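The abstract's contrast between IID and off-training-set (OTS) error can be made concrete with a small numerical sketch. The following is an illustrative toy example, not the paper's formalism: a three-point input space, all eight boolean targets under a uniform prior, noise-free data, zero-one loss, and a uniform sampling distribution. Under these assumptions the Bayes-optimal generalizer's expected IID error decreases with training set size, while its expected OTS error stays constant, matching the behaviour the abstract describes for uniform sampling. All names (`bayes_pred`, `expected_errors`) are invented for this sketch.

```python
from itertools import product

X = [0, 1, 2]                                    # toy input space
TARGETS = list(product([0, 1], repeat=len(X)))   # all boolean functions on X
P_X = 1.0 / len(X)                               # uniform sampling distribution

def bayes_pred(data, x):
    """Bayes-optimal prediction at x: with a uniform prior and noise-free
    data, the posterior is uniform over targets consistent with the data,
    so predict the majority output among consistent targets (ties -> 0)."""
    consistent = [f for f in TARGETS
                  if all(f[xi] == yi for xi, yi in data)]
    votes = sum(f[x] for f in consistent)
    return 1 if 2 * votes > len(consistent) else 0

def expected_errors(m):
    """Exact expected IID and OTS zero-one error for training sets of
    size m, averaged over the uniform prior on targets and all IID
    training input sequences. Returns (iid_error, ots_error)."""
    iid_err, ots_err, ots_mass = 0.0, 0.0, 0.0
    for f in TARGETS:                       # uniform prior over targets
        for xs in product(X, repeat=m):     # all training input sequences
            w = (1.0 / len(TARGETS)) * P_X ** m
            data = [(x, f[x]) for x in xs]
            preds = {x: bayes_pred(data, x) for x in X}
            # IID error: test point drawn from the full sampling distribution
            iid_err += w * sum(P_X * (preds[x] != f[x]) for x in X)
            # OTS error: test point restricted to inputs outside the training set
            off = [x for x in X if x not in xs]
            if off:   # undefined when the training set covers all of X
                ots_err += w * sum(preds[x] != f[x] for x in off) / len(off)
                ots_mass += w
    return iid_err, (ots_err / ots_mass if ots_mass else None)

for m in range(3):
    print(m, expected_errors(m))
```

With these assumptions the IID error falls (1/2, 1/3, 2/9, ...) while the OTS error is pinned at 1/2 for every training set size: every consistent target pairs with another that disagrees off the training set, so the data carry no OTS information. Non-uniform sampling distributions can break this and, as the paper shows, even make expected OTS error increase with training set size.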

Keywords: supervised learning, learning curves, off-training-set error, Bayes-optimal, Gibbs, IID



Copyright information

© Chapman and Hall 1998

Authors and Affiliations

  • DAVID H. WOLPERT (1)
  • EMANUEL KNILL (2)
  • TAL GROSSMAN (3)
  1. NASA Ames Research Centre, Caelum Research, Moffett Field, USA
  2. CIC-3 Computer Research and Applications, MS B265, LANL, Los Alamos, NM, USA
  3. Theoretical Division and CNLS, MS B213, LANL, Los Alamos, NM, USA
