Hyperparameters: Optimize, or Integrate Out?

Chapter in Maximum Entropy and Bayesian Methods

Part of the book series: Fundamental Theories of Physics (FTPH, volume 62)

Abstract

I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants. In the ‘evidence framework’ the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized hyperparameters are used to define a Gaussian approximation to the posterior distribution. In the alternative ‘MAP’ method, the true posterior probability is found by integrating over the hyperparameters. The true posterior is then maximized over the model parameters, and a Gaussian approximation is made. The similarities of the two approaches, and their relative merits, are discussed, and comparisons are made with the ideal hierarchical Bayesian solution.
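The mechanics of the evidence framework can be sketched for a toy linear-Gaussian model. Everything below (variable names, data, the assumption of a known noise precision beta, the iteration count) is illustrative, not the chapter's own code; the fixed point alpha ← gamma / |w_MP|² is MacKay's evidence-maximization update, with gamma the "number of well-determined parameters".

```python
import numpy as np

# Toy linear-Gaussian model for the evidence framework: y = X w + noise.
# All names and sizes are illustrative; the noise precision beta is
# assumed known for simplicity.
rng = np.random.default_rng(0)
k, n = 5, 40
X = rng.normal(size=(n, k))
w_true = rng.normal(size=k)
beta = 25.0
y = X @ w_true + rng.normal(scale=beta ** -0.5, size=n)

def evidence_fixed_point(X, y, beta, iters=100):
    """Maximize the evidence P(D|alpha) over the regularizer alpha via
    the fixed point  alpha <- gamma / |w_MP|^2,  where gamma, the number
    of well-determined parameters, is sum_i lam_i / (lam_i + alpha) and
    lam_i are the eigenvalues of beta * X^T X."""
    k = X.shape[1]
    G = beta * X.T @ X
    lam = np.linalg.eigvalsh(G)
    alpha = 1.0
    for _ in range(iters):
        # Posterior mode of w for the current alpha (ridge solution).
        w_mp = np.linalg.solve(G + alpha * np.eye(k), beta * X.T @ y)
        gamma = np.sum(lam / (lam + alpha))
        alpha = gamma / (w_mp @ w_mp)
    return alpha, gamma, w_mp

alpha, gamma, w_mp = evidence_fixed_point(X, y, beta)
print(f"alpha = {alpha:.3f}, gamma = {gamma:.3f} of k = 5")
```

The optimized alpha then defines the Gaussian approximation to the posterior over w, as described above; gamma is always strictly between 0 and k for positive alpha.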

In moderately ill-posed problems, integration over hyperparameters yields a probability distribution with a skew peak which causes significant biases to arise in the MAP method. In contrast, the evidence framework is shown to introduce negligible predictive error, under straightforward conditions.
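The direction of this bias can be illustrated numerically. Integrating alpha out under a 1/alpha prior replaces the Gaussian prior with an effective prior proportional to |w|^(-k), so maximizing the resulting posterior amounts to running the same fixed-point iteration with gamma replaced by the full dimension k; since gamma < k, the MAP method settles on a larger alpha and shrinks w harder. The comparison below is a hedged sketch under illustrative toy-model assumptions (synthetic data, known noise precision), not the chapter's experiment:

```python
import numpy as np

# Contrast the evidence framework with MAP-after-integration on a
# moderately ill-posed toy problem: decaying column scales make some
# directions of w poorly determined. All names and data are illustrative.
rng = np.random.default_rng(1)
k, n = 10, 30
X = rng.normal(size=(n, k)) * np.logspace(0.0, -1.5, k)
w_true = rng.normal(size=k)
beta = 25.0                        # known noise precision (assumed)
y = X @ w_true + rng.normal(scale=beta ** -0.5, size=n)

G = beta * X.T @ X                 # data-dependent part of the Hessian
b = beta * X.T @ y
lam = np.linalg.eigvalsh(G)

def fixed_point(numerator, iters=500):
    """Iterate  w = (G + alpha I)^{-1} b,  alpha = numerator(alpha) / |w|^2."""
    alpha = 1.0
    for _ in range(iters):
        w = np.linalg.solve(G + alpha * np.eye(k), b)
        alpha = numerator(alpha) / (w @ w)
    return alpha, w

# Evidence framework: numerator is gamma = sum_i lam_i / (lam_i + alpha) < k.
a_ev, w_ev = fixed_point(lambda a: np.sum(lam / (lam + a)))

# MAP on the integrated posterior (1/alpha prior): the effective prior
# |w|^(-k) makes the numerator the full dimension k.
a_map, w_map = fixed_point(lambda a: float(k))

print(f"evidence alpha = {a_ev:.3f}, MAP alpha = {a_map:.3f}")
print(f"|w_ev|^2 = {w_ev @ w_ev:.3f}, |w_map|^2 = {w_map @ w_map:.3f}")
```

In this sketch the MAP route always chooses the larger alpha, over-regularizing w relative to the evidence framework; the gap widens as more directions become poorly determined, consistent with the biases described above.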

General lessons are drawn concerning the distinctive properties of inference in many dimensions.

“Integrating over a nuisance parameter is very much like estimating the parameter from the data, and then using that estimate in our equations.” G.L. Bretthorst

“This integration would be counter-productive as far as practical manipulation is concerned.” S.F. Gull



References

  • Box, G. E. P., and Tiao, G. C. (1973) Bayesian Inference in Statistical Analysis. Addison-Wesley.

  • Bretthorst, G. (1988) Bayesian Spectrum Analysis and Parameter Estimation. Springer.

  • Bryan, R. (1990) Solving oversampled data problems by maximum entropy. In Maximum Entropy and Bayesian Methods, Dartmouth, U.S.A., 1989, ed. by R. Bryan, pp. 221–232. Kluwer.

  • Buntine, W., and Weigend, A. (1991) Bayesian back-propagation. Complex Systems 5: 603–643.

  • Gull, S. F. (1988) Bayesian inductive inference and maximum entropy. In Maximum Entropy and Bayesian Methods in Science and Engineering, vol. I: Foundations, ed. by G. Erickson and C. Smith, pp. 53–74. Kluwer, Dordrecht.

  • Gull, S. F. (1989) Developments in maximum entropy data analysis. In Maximum Entropy and Bayesian Methods, Cambridge 1988, ed. by J. Skilling, pp. 53–71. Kluwer, Dordrecht.

  • MacKay, D. J. C. (1992a) Bayesian interpolation. Neural Computation 4 (3): 415–447.

  • MacKay, D. J. C. (1992b) A practical Bayesian framework for backpropagation networks. Neural Computation 4 (3): 448–472.

  • MacKay, D. J. C. (1992c) The evidence framework applied to classification networks. Neural Computation 4 (5): 698–714.

  • MacKay, D. J. C. (1994) Bayesian non-linear modelling for the 1993 energy prediction competition. In Maximum Entropy and Bayesian Methods, Santa Barbara 1993, ed. by G. Heidbreder. Kluwer, Dordrecht.

  • Neal, R. M. (1993a) Bayesian learning via stochastic dynamics. In Advances in Neural Information Processing Systems 5, ed. by C. L. Giles, S. J. Hanson, and J. D. Cowan, pp. 475–482. Morgan Kaufmann, San Mateo, California.

  • Neal, R. M. (1993b) Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto.

  • Reif, F. (1965) Fundamentals of Statistical and Thermal Physics. McGraw-Hill.

  • Skilling, J. (1993) Bayesian numerical analysis. In Physics and Probability, ed. by W. T. Grandy, Jr. and P. Milonni. Cambridge University Press.

  • Strauss, C. E. M., Wolpert, D. H., and Wolf, D. R. (1993) Alpha, evidence, and the entropic prior. In Maximum Entropy and Bayesian Methods, Paris 1992, ed. by A. Mohammad-Djafari. Kluwer, Dordrecht.

  • Thodberg, H. H. (1993) Ace of Bayes: Application of neural networks with pruning. Technical Report 1132E, Danish Meat Research Institute.

  • Wahba, G. (1975) A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. Numer. Math. 24: 383–393.

  • Weir, N. (1991) Applications of maximum entropy techniques to HST data. In Proceedings of the ESO/ST-ECF Data Analysis Workshop, April 1991.

  • Wolpert, D. H. (1993) On the use of evidence in neural networks. In Advances in Neural Information Processing Systems 5, ed. by C. L. Giles, S. J. Hanson, and J. D. Cowan, pp. 539–546. Morgan Kaufmann, San Mateo, California.


Copyright information

© 1996 Springer Science+Business Media Dordrecht

Cite this chapter

MacKay, D.J.C. (1996). Hyperparameters: Optimize, or Integrate Out?. In: Heidbreder, G.R. (eds) Maximum Entropy and Bayesian Methods. Fundamental Theories of Physics, vol 62. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-8729-7_2

  • Print ISBN: 978-90-481-4407-5

  • Online ISBN: 978-94-015-8729-7
