This paper gives PAC guarantees for "Bayesian" algorithms—algorithms that optimize risk-minimization expressions involving a prior probability and a likelihood for the training data. PAC-Bayesian algorithms are motivated by the desire to exploit an informative prior that encodes knowledge of the expected experimental setting while still retaining PAC performance guarantees over all IID settings. The PAC-Bayesian theorems given here apply to an arbitrary prior measure on an arbitrary concept space. These theorems provide an alternative to the use of VC dimension in proving PAC bounds for parameterized concepts.
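To illustrate the flavor of such guarantees, a representative PAC-Bayesian bound for a countable concept space can be sketched as follows (the notation \(P\), \(m\), \(\delta\), and the error functionals are assumptions introduced here, not taken from the abstract): given a prior \(P\) over concepts and an IID sample of size \(m\), with probability at least \(1-\delta\) over the draw of the sample, every concept \(c\) with \(P(c) > 0\) satisfies

\[
\mathrm{err}(c) \;\le\; \widehat{\mathrm{err}}(c) \;+\; \sqrt{\frac{\ln \frac{1}{P(c)} + \ln \frac{1}{\delta}}{2m}},
\]

where \(\mathrm{err}(c)\) is the true error and \(\widehat{\mathrm{err}}(c)\) the empirical error on the sample. The point of contrast with VC-style bounds is that the complexity penalty \(\ln \frac{1}{P(c)}\) is concept-specific and determined by the prior rather than by a uniform combinatorial dimension of the whole class.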