A Discrete Probabilistic Memory Model for Discovering Dependencies in Time

  • Sepp Hochreiter
  • Michael C. Mozer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2130)

Abstract

Many domains of machine learning involve discovering dependencies and structure over time. In the most complex of domains, long-term temporal dependencies are present. Neural network models such as LSTM have been developed to deal with long-term dependencies, but the continuous nature of neural networks is not well suited to discrete symbol processing tasks. Further, the mathematical underpinnings of neural networks are unclear, and gradient descent learning of recurrent neural networks seems particularly susceptible to local optima. We introduce a novel architecture for discovering dependencies in time. The architecture is formed by combining two variants of a hidden Markov model (HMM), the factorial HMM and the input-output HMM, and adding a further strong constraint that requires the model to behave as a latch-and-store memory (the same constraint exploited in LSTM). This model, called an MIOFHMM, can learn structure that other variants of the HMM cannot, and can generalize better than LSTM to test sequences that have different statistical properties (different lengths, different types of noise) than the training sequences. However, the MIOFHMM is slower to train and is more susceptible to local optima than LSTM.
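
To make the latch-and-store constraint concrete, here is a minimal sketch (our illustration, not the authors' implementation) of an input-output HMM whose transition distribution depends on the current input symbol and is clamped to the identity on a designated "hold" input, so the hidden state is latched until a "store" input arrives. All names (N_STATES, log_likelihood, and so on) are hypothetical, and the factorial aspect of the MIOFHMM, several such chains evolving in parallel, is omitted for brevity.

    import numpy as np

    # Toy input-output HMM with a latch-and-store transition constraint.
    # Single hidden chain only; the factorial variant would run several
    # such chains in parallel.

    N_STATES = 3    # hidden "memory cell" states
    N_INPUTS = 2    # input alphabet: 0 = hold, 1 = store
    N_OUTPUTS = 3   # output alphabet

    rng = np.random.default_rng(0)

    # Input-conditioned transitions: A[u, i, j] = P(s_t = j | s_{t-1} = i, x_t = u).
    # The latch constraint: on a "hold" input the state cannot change.
    A = np.empty((N_INPUTS, N_STATES, N_STATES))
    A[0] = np.eye(N_STATES)                                  # hold: latch
    A[1] = rng.dirichlet(np.ones(N_STATES), size=N_STATES)   # store: free/learnable

    # Emissions: B[i, k] = P(y_t = k | s_t = i).
    B = rng.dirichlet(np.ones(N_OUTPUTS), size=N_STATES)

    pi = np.full(N_STATES, 1.0 / N_STATES)                   # uniform initial state

    def log_likelihood(xs, ys):
        """Scaled forward algorithm: log P(y_1..T | x_1..T)."""
        alpha = pi * B[:, ys[0]]
        logp = np.log(alpha.sum())
        alpha /= alpha.sum()
        for x, y in zip(xs[1:], ys[1:]):
            alpha = (alpha @ A[x]) * B[:, y]   # transition matrix chosen by input x
            logp += np.log(alpha.sum())
            alpha /= alpha.sum()
        return logp

    # A state stored at t = 0 is held across arbitrarily many "hold" steps,
    # so the bridged time lag can be made as long as desired.
    xs = [1, 0, 0, 0, 0]
    ys = [2, 2, 2, 2, 2]
    print(log_likelihood(xs, ys))

Clamping the "hold" transition matrix to the identity, rather than merely initializing it near the identity, is one way to realize the strong architectural constraint described above: under EM, only the "store" transitions, the emissions, and the initial distribution would be re-estimated, so the latch behavior cannot be unlearned.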

Keywords

Local Optimum · Hidden Markov Model · Input Sequence · Recurrent Neural Network · Neural Information Processing Systems


References

  1. L. E. Baum. An inequality and associated maximization technique in statistical estimation for probabilistic functions of a Markov process. Inequalities, 3:1–8, 1972.
  2. Y. Bengio and P. Frasconi. Diffusion of context and credit information in Markovian models. Journal of Artificial Intelligence Research, 3:249–270, 1995.
  3. Y. Bengio and P. Frasconi. An input output HMM architecture. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, pages 427–434. MIT Press, Cambridge, MA, 1995.
  4. Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994.
  5. Z. Ghahramani and M. I. Jordan. Factorial hidden Markov models. Machine Learning, 29:245–273, 1997.
  6. S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
  7. S. Hochreiter and J. Schmidhuber. LSTM can solve hard long time lag problems. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9, pages 473–479. MIT Press, Cambridge, MA, 1997.
  8. M. C. Mozer. Induction of multiscale temporal structure. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 4, pages 275–282. Morgan Kaufmann, San Mateo, CA, 1992.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Sepp Hochreiter (1)
  • Michael C. Mozer (1)

  1. Department of Computer Science, University of Colorado, Boulder
