Abstract
Many domains of machine learning involve discovering structure and dependencies over time. In the most challenging of these domains, long-term temporal dependencies are present. Neural network models such as LSTM have been developed to handle long-term dependencies, but the continuous nature of neural networks is not well suited to discrete symbol-processing tasks. Further, the mathematical underpinnings of neural networks are unclear, and gradient-descent learning of recurrent neural networks seems particularly susceptible to local optima. We introduce a novel architecture for discovering dependencies in time. The architecture is formed by combining two variants of the hidden Markov model (HMM), the factorial HMM and the input-output HMM, and adding a further strong constraint that requires the model to behave as a latch-and-store memory (the same constraint exploited in LSTM). This model, called an MIOFHMM, can learn structure that other HMM variants cannot, and can generalize better than LSTM to test sequences whose statistical properties (lengths, types of noise) differ from those of the training sequences. However, the MIOFHMM is slower to train and more susceptible to local optima than LSTM.
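The latch-and-store constraint can be illustrated with a small sketch. The Python fragment below is a minimal illustration, not the authors' implementation: the three-state space, the two-symbol input alphabet, and all names (N_STATES, STORE, OTHER, transition_matrix, propagate_belief, store_dist) are assumptions made for the example. It shows one memory factor of an input-conditioned HMM whose transition matrix is forced to the identity except when a store cue arrives, so a stored symbol persists across arbitrarily long input gaps:

```python
import numpy as np

# Minimal sketch of the latch-and-store idea (illustrative only; the state
# space, input alphabet, and all names are assumptions, not the paper's
# actual MIOFHMM parameterization). One "memory" factor of an
# input-conditioned HMM is constrained so that its hidden state can change
# only when a designated store cue arrives; on every other input the
# transition matrix is forced to the identity, so whatever was stored
# persists across arbitrarily long gaps.

N_STATES = 3          # hypothetical: an "empty" state plus two storable symbols
STORE, OTHER = 0, 1   # hypothetical input alphabet: store cue vs. anything else

def transition_matrix(input_symbol, store_dist):
    """Input-conditioned transitions T[i, j] = P(s_t = j | s_{t-1} = i, x_t)."""
    if input_symbol == STORE:
        # On a store cue, every state may jump to a stored symbol; here the
        # learned store distribution is a fixed placeholder.
        return np.tile(store_dist, (N_STATES, 1))
    # Latch constraint: any other input leaves the state unchanged.
    return np.eye(N_STATES)

def propagate_belief(inputs, store_dist, init_dist):
    """Propagate the marginal state distribution along an input sequence."""
    belief = init_dist.copy()
    for x in inputs:
        belief = belief @ transition_matrix(x, store_dist)
    return belief

init = np.array([1.0, 0.0, 0.0])       # start in the "empty" state
store = np.array([0.0, 0.7, 0.3])      # placeholder store distribution
seq = [STORE] + [OTHER] * 100          # one cue, then 100 irrelevant inputs
print(propagate_belief(seq, store, init))   # -> [0.  0.7 0.3], undiffused
```

Because the no-cue transition matrix is the identity, probability mass placed on a stored state cannot diffuse away over time; this is a probabilistic counterpart of the gated memory cell with which LSTM bridges long time lags.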
References
L. E. Baum. An inequality and associated maximization technique in statistical estimation for probabilistic functions of a Markov process. Inequalities, 3:1–8, 1972.
Y. Bengio and P. Frasconi. Diffusion of context and credit information in Markovian models. Journal of Artificial Intelligence Research, 3:249–270, 1995.
Y. Bengio and P. Frasconi. An input output HMM architecture. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, pages 427–434. MIT Press, Cambridge MA, 1995.
Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994.
Z. Ghahramani and M. I. Jordan. Factorial hidden Markov models. Machine Learning, 29:245–273, 1997.
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
S. Hochreiter and J. Schmidhuber. LSTM can solve hard long time lag problems. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9, pages 473–479. MIT Press, Cambridge MA, 1997.
M. C. Mozer. Induction of multiscale temporal structure. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 4, pages 275–282. Morgan Kaufmann, San Mateo, CA, 1992.