An Introduction to Variational Methods for Graphical Models

  • Chapter
Learning in Graphical Models

Part of the book series: NATO ASI Series ((ASID,volume 89))

Abstract

This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models. We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, showing how upper and lower bounds can be found for local probabilities, and discussing methods for extending these bounds to bounds on global probabilities of interest. Finally, we return to the examples and demonstrate how variational algorithms can be formulated in each case.
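To make the idea of a variational bound concrete, the sketch below uses a standard textbook case: the logarithm is concave, so ln(x) is bounded above by the family of lines λx − ln(λ) − 1 for any λ > 0, and the bound is exact at λ = 1/x. The function names and the test point x = 2 are illustrative assumptions, not taken from the chapter's text on this page.

```python
import math

def log_upper_bound(x, lam):
    """Linear upper bound on ln(x), valid for any lam > 0 (lam is the variational parameter)."""
    return lam * x - math.log(lam) - 1.0

def tightest_lambda(x):
    """Value of lam that minimizes the bound at a given x; the bound then equals ln(x)."""
    return 1.0 / x

if __name__ == "__main__":
    x = 2.0  # illustrative test point (assumption)
    for lam in (0.25, tightest_lambda(x), 1.0):
        # Each printed bound is >= ln(x); the gap closes at lam = 1/x.
        print(f"lam={lam:.2f}  bound={log_upper_bound(x, lam):.6f}  ln(x)={math.log(x):.6f}")
```

Minimizing over λ recovers the exact function, which is the sense in which a family of simple bounds, indexed by a free parameter, can stand in for a harder quantity.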

Copyright information

© 1998 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., Saul, L.K. (1998). An Introduction to Variational Methods for Graphical Models. In: Jordan, M.I. (eds) Learning in Graphical Models. NATO ASI Series, vol 89. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-5014-9_5

  • DOI: https://doi.org/10.1007/978-94-011-5014-9_5

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-6104-9

  • Online ISBN: 978-94-011-5014-9

  • eBook Packages: Springer Book Archive
