Abstract
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants. In the ‘evidence framework’ the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized hyperparameters are used to define a Gaussian approximation to the posterior distribution. In the alternative ‘MAP’ method, the true posterior probability is found by integrating over the hyperparameters. The true posterior is then maximized over the model parameters, and a Gaussian approximation is made. The similarities of the two approaches, and their relative merits, are discussed, and comparisons are made with the ideal hierarchical Bayesian solution.
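The evidence-maximization step described above can be sketched on a toy conjugate model (my own illustration, not one of the chapter's examples): data $d_i = w_i + \text{noise}$ with known noise variance, and a Gaussian prior on the parameters $w$ whose precision $\alpha$ is the unknown hyperparameter. Integrating the parameters out gives a closed-form evidence, which is then maximized over $\alpha$; all variable names here (`alpha_ef`, `alpha_closed`, etc.) are hypothetical.

```python
import math
import random

# Toy hierarchical model (illustrative sketch only):
#   d_i = w_i + noise,  noise ~ N(0, sigma2),  prior w_i ~ N(0, 1/alpha).
# Integrating out w gives the evidence p(D|alpha) = N(D; 0, (sigma2 + 1/alpha) I),
# which the evidence framework maximizes over the hyperparameter alpha.

random.seed(0)
k, sigma2, alpha_true = 200, 1.0, 0.5
std = math.sqrt(sigma2 + 1.0 / alpha_true)
D = [random.gauss(0.0, std) for _ in range(k)]
S = sum(d * d for d in D)          # sum of squared data values

def log_evidence(alpha):
    v = sigma2 + 1.0 / alpha       # marginal variance of each datum
    return -0.5 * k * math.log(2 * math.pi * v) - 0.5 * S / v

# Maximize the evidence over alpha by a grid search in log(alpha).
grid = [math.exp(x / 100.0) for x in range(-500, 500)]
alpha_ef = max(grid, key=log_evidence)

# For this toy model the maximum has a closed form: sigma2 + 1/alpha = S/k.
alpha_closed = 1.0 / (S / k - sigma2)
print(alpha_ef, alpha_closed)
```

The optimized `alpha_ef` would then define the Gaussian approximation to the posterior over `w`, as the abstract describes.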
In moderately ill-posed problems, integration over hyperparameters yields a probability distribution with a skew peak which causes significant biases to arise in the MAP method. In contrast, the evidence framework is shown to introduce negligible predictive error, under straightforward conditions.
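One standard way to see where the skew comes from (a sketch under the usual assumptions of a Gaussian prior $p(\mathbf{w}\mid\alpha)\propto\alpha^{k/2}e^{-\alpha E_W}$ with $E_W=\tfrac12\|\mathbf{w}\|^2$ and the scale-invariant hyperprior $p(\alpha)\propto 1/\alpha$; the notation is mine, not quoted from the chapter) is to integrate the hyperparameter out of the prior:

```latex
p(\mathbf{w}) \;=\; \int_0^\infty p(\mathbf{w}\mid\alpha)\,p(\alpha)\,d\alpha
\;\propto\; \int_0^\infty \alpha^{k/2}\,e^{-\alpha E_W}\,\frac{d\alpha}{\alpha}
\;=\; \Gamma\!\left(\tfrac{k}{2}\right) E_W^{-k/2},
\qquad E_W=\tfrac{1}{2}\|\mathbf{w}\|^2 .
```

In $k$ dimensions this integrated prior behaves like $\|\mathbf{w}\|^{-k}$: it is sharply spiked at $\mathbf{w}=0$, so the maximum of the integrated posterior is pulled toward smaller $\|\mathbf{w}\|$ than is typical of posterior samples, which is the kind of bias the MAP method suffers from.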
General lessons are drawn concerning the distinctive properties of inference in many dimensions.
“Integrating over a nuisance parameter is very much like estimating the parameter from the data, and then using that estimate in our equations.” G.L. Bretthorst
“This integration would be counter-productive as far as practical manipulation is concerned.” S.F. Gull
References
Box, G. E. P., and Tiao, G. C. (1973) Bayesian inference in statistical analysis. Addison-Wesley.
Bretthorst, G. (1988) Bayesian spectrum analysis and parameter estimation. Springer.
Bryan, R. (1990) Solving oversampled data problems by maximum entropy. In Maximum Entropy and Bayesian Methods, Dartmouth, U.S.A., 1989, ed. by R. Bryan, pp. 221–232. Kluwer.
Buntine, W., and Weigend, A. (1991) Bayesian back-propagation. Complex Systems 5: 603–643.
Gull, S. F. (1988) Bayesian inductive inference and maximum entropy. In Maximum Entropy and Bayesian Methods in Science and Engineering, vol. I: Foundations, ed. by G. Erickson and C. Smith, pp. 53–74, Dordrecht: Kluwer.
Gull, S. F. (1989) Developments in maximum entropy data analysis. In Maximum Entropy and Bayesian Methods, Cambridge 1988, ed. by J. Skilling, pp. 53–71, Dordrecht: Kluwer.
MacKay, D. J. C. (1992a) Bayesian interpolation. Neural Computation 4 (3): 415–447.
MacKay, D. J. C. (1992b) A practical Bayesian framework for backpropagation networks. Neural Computation 4 (3): 448–472.
MacKay, D. J. C. (1992c) The evidence framework applied to classification networks. Neural Computation 4 (5): 698–714.
MacKay, D. J. C. (1994) Bayesian non-linear modelling for the 1993 energy prediction competition. In Maximum Entropy and Bayesian Methods, Santa Barbara 1993, ed. by G. Heidbreder, Dordrecht: Kluwer.
Neal, R. M. (1993a) Bayesian learning via stochastic dynamics. In Advances in Neural Information Processing Systems 5, ed. by C. L. Giles, S. J. Hanson, and J. D. Cowan, pp. 475–482, San Mateo, California: Morgan Kaufmann.
Neal, R. M. (1993b) Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto.
Reif, F. (1965) Fundamentals of statistical and thermal physics. McGraw-Hill.
Skilling, J. (1993) Bayesian numerical analysis. In Physics and Probability, ed. by W. T. Grandy, Jr. and P. Milonni, Cambridge: C.U.P.
Strauss, C. E. M., Wolpert, D. H., and Wolf, D. R. (1993) Alpha, evidence, and the entropic prior. In Maximum Entropy and Bayesian Methods, Paris 1992, ed. by A. Mohammad-Djafari, Dordrecht: Kluwer.
Thodberg, H. H. (1993) Ace of Bayes: Application of neural networks with pruning. Technical Report 1132 E, Danish Meat Research Institute.
Wahba, G. (1975) A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. Numer. Math. 24: 383–393.
Weir, N. (1991) Applications of maximum entropy techniques to HST data. In Proceedings of the ESO/ST-ECF Data Analysis Workshop, April 1991.
Wolpert, D. H. (1993) On the use of evidence in neural networks. In Advances in Neural Information Processing Systems 5, ed. by C. L. Giles, S. J. Hanson, and J. D. Cowan, pp. 539–546, San Mateo, California: Morgan Kaufmann.
© 1996 Springer Science+Business Media Dordrecht
Cite this chapter
MacKay, D.J.C. (1996). Hyperparameters: Optimize, or Integrate Out?. In: Heidbreder, G.R. (eds) Maximum Entropy and Bayesian Methods. Fundamental Theories of Physics, vol 62. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-8729-7_2
Print ISBN: 978-90-481-4407-5
Online ISBN: 978-94-015-8729-7