Learning dynamic Bayesian networks

Chapter in Adaptive Processing of Sequences and Data Structures (NN 1997), part of the book series Lecture Notes in Computer Science (LNAI, volume 1387).

Abstract

Bayesian networks are a concise graphical formalism for describing probabilistic models. This chapter provides a brief tutorial on methods for learning and inference in dynamic Bayesian networks. In many of the interesting models beyond the simple linear dynamical system or hidden Markov model, the calculations required for inference are intractable. Two approaches to handling this intractability are Monte Carlo methods, such as Gibbs sampling, and variational methods. An especially promising variational approach is based on exploiting tractable substructures in the Bayesian network.
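
The abstract contrasts tractable and intractable models; the tractable base case can be made concrete. Below is a minimal, illustrative sketch (not code from the chapter) of exact inference in the simplest dynamic Bayesian network, a discrete hidden Markov model, via the standard forward recursion. All parameter values are made-up toy numbers.

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Log-likelihood of an observation sequence under a discrete HMM.

    pi  : (K,)   initial state distribution
    A   : (K, K) transition matrix, A[i, j] = P(s_t = j | s_{t-1} = i)
    B   : (K, M) emission matrix,   B[i, m] = P(o_t = m | s_t = i)
    obs : sequence of observation indices
    """
    alpha = pi * B[:, obs[0]]          # alpha_1(i) = P(s_1 = i, o_1)
    log_lik = 0.0
    for o in obs[1:]:
        # Rescale at each step so long sequences do not underflow.
        c = alpha.sum()
        log_lik += np.log(c)
        alpha = (alpha / c) @ A * B[:, o]
    log_lik += np.log(alpha.sum())
    return log_lik

# Toy two-state, two-symbol model (hypothetical numbers).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],
              [0.3, 0.7]])
print(forward_log_likelihood(pi, A, B, [0, 1, 0]))
```

The recursion sums over hidden state sequences in O(TK^2) time rather than O(K^T); it is exactly this kind of efficient marginalization that fails to carry over to the richer models the chapter discusses, motivating Gibbs sampling and variational approximations.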


Editor information

C. Lee Giles, Marco Gori

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ghahramani, Z. (1998). Learning dynamic Bayesian networks. In: Giles, C.L., Gori, M. (eds) Adaptive Processing of Sequences and Data Structures. NN 1997. Lecture Notes in Computer Science, vol 1387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0053999

  • DOI: https://doi.org/10.1007/BFb0053999

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64341-8

  • Online ISBN: 978-3-540-69752-7

  • eBook Packages: Springer Book Archive
