Learning dynamic Bayesian networks

  • Zoubin Ghahramani
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1387)


Bayesian networks are a concise graphical formalism for describing probabilistic models. We provide a brief tutorial on methods for learning and inference in dynamic Bayesian networks. In many of the interesting models beyond the simple linear dynamical system or hidden Markov model, the calculations required for inference are intractable. Two different approaches for handling this intractability are Monte Carlo methods, such as Gibbs sampling, and variational methods. An especially promising variational approach is based on exploiting tractable substructures in the Bayesian network.


Hidden Markov Model, Posterior Distribution, Bayesian Network, Hidden Variable, Hidden State
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Zoubin Ghahramani (1)
  1. Department of Computer Science, University of Toronto, Toronto, Canada
