Abstract
We consider a logistic regression model with a Gaussian prior distribution over the parameters. We show that an accurate variational transformation can be used to obtain a closed-form approximation to the posterior distribution of the parameters, thereby yielding an approximate posterior predictive model. This approach is readily extended to binary graphical models with complete observations. For graphical models with incomplete observations, we utilize an additional variational transformation and again obtain a closed-form approximation to the posterior. Finally, we show that the dual of the regression problem gives a latent variable density model, the variational formulation of which leads to exactly solvable EM updates.
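As a rough illustration of the kind of variational transformation the abstract refers to, the sketch below uses the well-known quadratic lower bound on the logistic function, sigma(x) >= sigma(xi) exp((x - xi)/2 - lambda(xi)(x^2 - xi^2)) with lambda(xi) = tanh(xi/2) / (4 xi). Because the bound is Gaussian in form, a Gaussian prior yields a Gaussian approximate posterior in closed form. The abstract itself does not spell out these formulas, so everything here is an assumption for illustration: the function names (`lam`, `sigmoid_lower_bound`, `fit_scalar`), the restriction to a single scalar parameter, and the batch update over all observations rather than whatever update schedule the article uses.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def lam(xi):
    """Curvature term lambda(xi) = tanh(xi / 2) / (4 xi); its limit at xi = 0 is 1/8."""
    if abs(xi) < 1e-8:
        return 0.125
    return math.tanh(xi / 2.0) / (4.0 * xi)


def sigmoid_lower_bound(x, xi):
    """Quadratic-exponent lower bound on sigmoid(x); exact at x = +xi and x = -xi."""
    return sigmoid(xi) * math.exp((x - xi) / 2.0 - lam(xi) * (x * x - xi * xi))


def fit_scalar(xs, ss, prior_mean=0.0, prior_var=1.0, iters=20):
    """Approximate Gaussian posterior over one weight (hypothetical helper).

    xs: scalar inputs; ss: binary labels in {0, 1}.
    Alternates a closed-form Gaussian update with re-tuning of each xi.
    """
    xis = [1.0] * len(xs)  # one variational parameter per observation
    post_mean, post_var = prior_mean, prior_var
    for _ in range(iters):
        # Closed-form Gaussian posterior given the current xi values.
        precision = 1.0 / prior_var + 2.0 * sum(
            lam(xi) * x * x for x, xi in zip(xs, xis)
        )
        post_var = 1.0 / precision
        post_mean = post_var * (
            prior_mean / prior_var + sum((s - 0.5) * x for x, s in zip(xs, ss))
        )
        # Re-tune each xi: xi^2 = x^2 * E[w^2] under the current posterior.
        xis = [math.sqrt(x * x * (post_var + post_mean ** 2)) for x in xs]
    return post_mean, post_var


if __name__ == "__main__":
    mean, var = fit_scalar([1.0, 1.0, 1.0, -1.0], [1, 1, 1, 0])
    print("approximate posterior mean and variance:", mean, var)
```

The scalar case keeps the sketch free of linear algebra; with vector inputs, the precision and mean updates would become their matrix counterparts.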
Cite this article
Jaakkola, T.S., Jordan, M.I. Bayesian parameter estimation via variational methods. Statistics and Computing 10, 25–37 (2000). https://doi.org/10.1023/A:1008932416310