Learning dynamic Bayesian networks

Ghahramani, Zoubin

doi:10.1007/BFb0053999

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1387)

Included in the following conference series:

International School on Neural Networks, Initiated by IIASS and EMFCSC

Abstract

Bayesian networks are a concise graphical formalism for describing probabilistic models. This chapter provides a brief tutorial on methods for learning and inference in dynamic Bayesian networks. In many of the interesting models, beyond the simple linear dynamical system or hidden Markov model, the calculations required for inference are intractable. Two approaches for handling this intractability are Monte Carlo methods, such as Gibbs sampling, and variational methods. An especially promising variational approach is based on exploiting tractable substructures in the Bayesian network.

Author information

Authors and Affiliations

Department of Computer Science, University of Toronto, M5S 3H5, Toronto, ON, Canada
Zoubin Ghahramani

Editor information

C. Lee Giles, Marco Gori

About this chapter

Cite this chapter

Ghahramani, Z. (1998). Learning dynamic Bayesian networks. In: Giles, C.L., Gori, M. (eds) Adaptive Processing of Sequences and Data Structures. NN 1997. Lecture Notes in Computer Science, vol 1387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0053999

DOI: https://doi.org/10.1007/BFb0053999
Published: 25 May 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64341-8
Online ISBN: 978-3-540-69752-7
eBook Packages: Springer Book Archive
