Abstract
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants. In the ‘evidence framework’ the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized hyperparameters are used to define a Gaussian approximation to the posterior distribution. In the alternative ‘MAP’ method, the true posterior probability is found by integrating over the hyperparameters. The true posterior is then maximized over the model parameters, and a Gaussian approximation is made. The similarities of the two approaches, and their relative merits, are discussed, and comparisons are made with the ideal hierarchical Bayesian solution.
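The evidence-maximization step described above can be sketched on a toy conjugate model (my own illustration, not one of the chapter's examples): data $d_i = w_i + \text{noise}$ with known noise variance, and a Gaussian prior on the parameters $w$ whose precision $\alpha$ is the unknown hyperparameter. Integrating the parameters out gives a closed-form evidence, which is then maximized over $\alpha$; all variable names here (`alpha_ef`, `alpha_closed`, etc.) are hypothetical.

```python
import math
import random

# Toy hierarchical model (illustrative sketch only):
#   d_i = w_i + noise,  noise ~ N(0, sigma2),  prior w_i ~ N(0, 1/alpha).
# Integrating out w gives the evidence p(D|alpha) = N(D; 0, (sigma2 + 1/alpha) I),
# which the evidence framework maximizes over the hyperparameter alpha.

random.seed(0)
k, sigma2, alpha_true = 200, 1.0, 0.5
std = math.sqrt(sigma2 + 1.0 / alpha_true)
D = [random.gauss(0.0, std) for _ in range(k)]
S = sum(d * d for d in D)          # sum of squared data values

def log_evidence(alpha):
    v = sigma2 + 1.0 / alpha       # marginal variance of each datum
    return -0.5 * k * math.log(2 * math.pi * v) - 0.5 * S / v

# Maximize the evidence over alpha by a grid search in log(alpha).
grid = [math.exp(x / 100.0) for x in range(-500, 500)]
alpha_ef = max(grid, key=log_evidence)

# For this toy model the maximum has a closed form: sigma2 + 1/alpha = S/k.
alpha_closed = 1.0 / (S / k - sigma2)
print(alpha_ef, alpha_closed)
```

The optimized `alpha_ef` would then define the Gaussian approximation to the posterior over `w`, as the abstract describes.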
In moderately ill-posed problems, integration over hyperparameters yields a probability distribution with a skew peak which causes significant biases to arise in the MAP method. In contrast, the evidence framework is shown to introduce negligible predictive error, under straightforward conditions.
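One standard way to see where the skew comes from (a sketch under the usual assumptions of a Gaussian prior $p(\mathbf{w}\mid\alpha)\propto\alpha^{k/2}e^{-\alpha E_W}$ with $E_W=\tfrac12\|\mathbf{w}\|^2$ and the scale-invariant hyperprior $p(\alpha)\propto 1/\alpha$; the notation is mine, not quoted from the chapter) is to integrate the hyperparameter out of the prior:

```latex
p(\mathbf{w}) \;=\; \int_0^\infty p(\mathbf{w}\mid\alpha)\,p(\alpha)\,d\alpha
\;\propto\; \int_0^\infty \alpha^{k/2}\,e^{-\alpha E_W}\,\frac{d\alpha}{\alpha}
\;=\; \Gamma\!\left(\tfrac{k}{2}\right) E_W^{-k/2},
\qquad E_W=\tfrac{1}{2}\|\mathbf{w}\|^2 .
```

In $k$ dimensions this integrated prior behaves like $\|\mathbf{w}\|^{-k}$: it is sharply spiked at $\mathbf{w}=0$, so the maximum of the integrated posterior is pulled toward smaller $\|\mathbf{w}\|$ than is typical of posterior samples, which is the kind of bias the MAP method suffers from.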
General lessons are drawn concerning the distinctive properties of inference in many dimensions.
“Integrating over a nuisance parameter is very much like estimating the parameter from the data, and then using that estimate in our equations.” G.L. Bretthorst
“This integration would be counter-productive as far as practical manipulation is concerned.” S.F. Gull
References
Box, G. E. P., and Tiao, G. C. (1973) Bayesian inference in statistical analysis. Addison-Wesley.
Bretthorst, G. (1988) Bayesian spectrum analysis and parameter estimation. Springer.
Bryan, R. (1990) Solving oversampled data problems by maximum entropy. In Maximum Entropy and Bayesian Methods, Dartmouth, U.S.A., 1989, ed. by R. Bryan, pp. 221–232. Kluwer.
Buntine, W., and Weigend, A. (1991) Bayesian back-propagation. Complex Systems 5: 603–643.
Gull, S. F. (1988) Bayesian inductive inference and maximum entropy. In Maximum Entropy and Bayesian Methods in Science and Engineering, vol. I: Foundations, ed. by G. Erickson and C. Smith, pp. 53–74, Dordrecht: Kluwer.
Gull, S. F. (1989) Developments in maximum entropy data analysis. In Maximum Entropy and Bayesian Methods, Cambridge 1988, ed. by J. Skilling, pp. 53–71, Dordrecht: Kluwer.
MacKay, D. J. C. (1992a) Bayesian interpolation. Neural Computation 4 (3): 415–447.
MacKay, D. J. C. (1992b) A practical Bayesian framework for backpropagation networks. Neural Computation 4 (3): 448–472.
MacKay, D. J. C. (1992c) The evidence framework applied to classification networks. Neural Computation 4 (5): 698–714.
MacKay, D. J. C. (1994) Bayesian non-linear modelling for the 1993 energy prediction competition. In Maximum Entropy and Bayesian Methods, Santa Barbara 1993, ed. by G. Heidbreder, Dordrecht: Kluwer.
Neal, R. M. (1993a) Bayesian learning via stochastic dynamics. In Advances in Neural Information Processing Systems 5, ed. by C. L. Giles, S. J. Hanson, and J. D. Cowan, pp. 475–482, San Mateo, California: Morgan Kaufmann.
Neal, R. M. (1993b) Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto.
Reif, F. (1965) Fundamentals of statistical and thermal physics. McGraw-Hill.
Skilling, J. (1993) Bayesian numerical analysis. In Physics and Probability, ed. by W. T. Grandy, Jr. and P. Milonni, Cambridge: C.U.P.
Strauss, C. E. M., Wolpert, D. H., and Wolf, D. R. (1993) Alpha, evidence, and the entropic prior. In Maximum Entropy and Bayesian Methods, Paris 1992, ed. by A. Mohammad-Djafari, Dordrecht: Kluwer.
Thodberg, H. H. (1993) Ace of Bayes: Application of neural networks with pruning. Technical Report 1132 E, Danish Meat Research Institute.
Wahba, G. (1975) A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. Numer. Math. 24: 383–393.
Weir, N. (1991) Applications of maximum entropy techniques to HST data. In Proceedings of the ESO/ST-ECF Data Analysis Workshop, April 1991.
Wolpert, D. H. (1993) On the use of evidence in neural networks. In Advances in Neural Information Processing Systems 5, ed. by C. L. Giles, S. J. Hanson, and J. D. Cowan, pp. 539–546, San Mateo, California: Morgan Kaufmann.
© 1996 Springer Science+Business Media Dordrecht
Cite this chapter
MacKay, D.J.C. (1996). Hyperparameters: Optimize, or Integrate Out?. In: Heidbreder, G.R. (eds) Maximum Entropy and Bayesian Methods. Fundamental Theories of Physics, vol 62. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-8729-7_2
Print ISBN: 978-90-481-4407-5
Online ISBN: 978-94-015-8729-7