Machine Learning, Volume 37, Issue 1, pp 75–87

Mixed Memory Markov Models: Decomposing Complex Stochastic Processes as Mixtures of Simpler Ones

  • Lawrence K. Saul
  • Michael I. Jordan


We study Markov models whose state spaces arise from the Cartesian product of two or more discrete random variables. We show how to parameterize the transition matrices of these models as a convex combination—or mixture—of simpler dynamical models. The parameters in these models admit a simple probabilistic interpretation and can be fitted iteratively by an Expectation-Maximization (EM) procedure. We derive a set of generalized Baum-Welch updates for factorial hidden Markov models that make use of this parameterization. We also describe a simple iterative procedure for approximately computing the statistics of the hidden states. Throughout, we give examples where mixed memory models provide a useful representation of complex stochastic processes.
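The abstract's central idea is a transition matrix over a product state space written as a convex combination of simpler transitions and fitted by EM. The sketch below illustrates it under stated assumptions and is not the authors' implementation. It assumes a fully observed sequence of n discrete components, each taking k values, and a first-order model in which component nu at time t depends on the previous time step through a mixture over single source components mu:

P(x_t[nu] = j | x_{t-1}) = sum over mu of psi[nu][mu] * A[nu][mu][x_{t-1}[mu]][j]

The joint transition is taken to be the product of these factors over nu. The names `psi`, `A`, `em_step` and the synthetic data are hypothetical. The E-step computes, for each observed value, the posterior probability that each source component mu generated it. The M-step renormalizes those expected counts.

```python
import math
import random


def normalize(weights):
    """Scale nonnegative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def random_model(rng, n, k):
    """Random mixing weights psi[nu][mu] and transition rows A[nu][mu][i][j], all strictly positive."""
    psi = [normalize([rng.random() + 0.1 for _ in range(n)]) for _ in range(n)]
    A = [[[normalize([rng.random() + 0.1 for _ in range(k)]) for _ in range(k)]
          for _ in range(n)] for _ in range(n)]
    return psi, A


def transition_prob(psi, A, prev, nu, j):
    """P(x_t[nu] = j | x_{t-1} = prev): a convex combination over source components mu."""
    return sum(psi[nu][mu] * A[nu][mu][prev[mu]][j] for mu in range(len(prev)))


def log_likelihood(psi, A, seq):
    """Log P(seq[1:] | seq[0]), assuming the joint transition factorizes over components nu."""
    return sum(math.log(transition_prob(psi, A, prev, nu, cur[nu]))
               for prev, cur in zip(seq, seq[1:]) for nu in range(len(cur)))


def sample(rng, psi, A, x0, steps):
    """For each nu, pick a source mu ~ psi[nu], then draw x_t[nu] ~ A[nu][mu][x_{t-1}[mu]]."""
    n, k = len(x0), len(A[0][0][0])
    seq = [tuple(x0)]
    for _ in range(steps):
        prev = seq[-1]
        cur = []
        for nu in range(n):
            mu = rng.choices(range(n), weights=psi[nu])[0]
            cur.append(rng.choices(range(k), weights=A[nu][mu][prev[mu]])[0])
        seq.append(tuple(cur))
    return seq


def em_step(psi, A, seq, k):
    """One EM update of (psi, A) from a fully observed sequence."""
    n = len(seq[0])
    psi_counts = [[0.0] * n for _ in range(n)]
    A_counts = [[[[0.0] * k for _ in range(k)] for _ in range(n)] for _ in range(n)]
    for prev, cur in zip(seq, seq[1:]):
        for nu in range(n):
            j = cur[nu]
            # E-step: posterior responsibility of each source component mu for x_t[nu].
            w = [psi[nu][mu] * A[nu][mu][prev[mu]][j] for mu in range(n)]
            z = sum(w)
            for mu in range(n):
                psi_counts[nu][mu] += w[mu] / z
                A_counts[nu][mu][prev[mu]][j] += w[mu] / z
    # M-step: renormalize expected counts; a row with no counts keeps its old values.
    new_psi = [normalize(row) for row in psi_counts]
    new_A = [[[normalize(row) if sum(row) > 0 else list(old)
               for row, old in zip(A_counts[nu][mu], A[nu][mu])]
              for mu in range(n)] for nu in range(n)]
    return new_psi, new_A


rng = random.Random(0)
n, k = 3, 4
true_psi, true_A = random_model(rng, n, k)
seq = sample(rng, true_psi, true_A, [0] * n, 2000)
psi, A = random_model(rng, n, k)  # random initialization, independent of the true model
history = [log_likelihood(psi, A, seq)]
for _ in range(20):
    psi, A = em_step(psi, A, seq, k)
    history.append(log_likelihood(psi, A, seq))
print(history[0], history[-1])
```

Each EM iteration should not decrease the log-likelihood, which makes a convenient check on an implementation. With n components of k values each, this parameterization stores n² mixing weights and n²k² transition entries. An unrestricted transition matrix over the product space has k^(2n) entries, which is 4^6 = 4096 for the example sizes. The sketch covers only the fully observed case. Fitting hidden states, as the abstract describes for factorial hidden Markov models, would use expected statistics of the hidden states in place of the observed values.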

Keywords: Markov models, mixture models, discrete time series


Copyright information

© Kluwer Academic Publishers 1999

Authors and Affiliations

  • Lawrence K. Saul, AT&T Labs, Florham Park
  • Michael I. Jordan, University of California, Berkeley
