Abstract
Bayesian networks are a concise graphical formalism for describing probabilistic models. We provide a brief tutorial on methods for learning and inference in dynamic Bayesian networks. In many interesting models beyond the simple linear dynamical system or hidden Markov model, the calculations required for inference are intractable. Two approaches for handling this intractability are Monte Carlo methods, such as Gibbs sampling, and variational methods. An especially promising variational approach exploits tractable substructures in the Bayesian network.
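As a rough illustration of the tractable case mentioned above, the following sketch computes the likelihood of an observation sequence under a discrete hidden Markov model with the standard forward recursion, and checks it against a brute-force sum over all state paths. It is not taken from the chapter: the function names and the two-state model parameters are hypothetical and chosen only for illustration. The recursion costs on the order of T·K² operations for T observations and K states, whereas the brute-force sum visits K^T paths. In a model with M coupled hidden chains of K states each, the joint state space has K^M configurations, which is one way exact inference becomes impractical.

```python
from itertools import product


def forward_likelihood(init, trans, emit, obs):
    """P(obs) for a discrete HMM via the forward recursion, O(T * K^2)."""
    K = len(init)
    # alpha[k] = P(obs[0..t], state_t = k), initialised at t = 0
    alpha = [init[k] * emit[k][obs[0]] for k in range(K)]
    for y in obs[1:]:
        # Propagate one step through the transition matrix, then weight
        # each state by the probability of emitting the new symbol y.
        alpha = [
            emit[j][y] * sum(alpha[i] * trans[i][j] for i in range(K))
            for j in range(K)
        ]
    return sum(alpha)


def brute_force_likelihood(init, trans, emit, obs):
    """Same quantity by summing over all K^T state paths (exponential cost)."""
    K = len(init)
    total = 0.0
    for path in product(range(K), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total


# Hypothetical two-state, two-symbol model (illustrative numbers only).
init = [0.6, 0.4]                   # P(state_0 = k)
trans = [[0.7, 0.3], [0.2, 0.8]]    # trans[i][j] = P(state_t = j | state_{t-1} = i)
emit = [[0.9, 0.1], [0.3, 0.7]]     # emit[k][y] = P(obs_t = y | state_t = k)
obs = [0, 1, 1, 0]

fast = forward_likelihood(init, trans, emit, obs)
slow = brute_force_likelihood(init, trans, emit, obs)
```

The two results agree up to floating-point rounding, but only the forward recursion remains feasible as the sequence grows longer.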
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Ghahramani, Z. (1998). Learning dynamic Bayesian networks. In: Giles, C.L., Gori, M. (eds) Adaptive Processing of Sequences and Data Structures. NN 1997. Lecture Notes in Computer Science, vol 1387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0053999
DOI: https://doi.org/10.1007/BFb0053999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64341-8
Online ISBN: 978-3-540-69752-7
eBook Packages: Springer Book Archive