Unsupervised Learning in LSTM Recurrent Neural Networks
While much work has been done on unsupervised learning in feedforward neural network architectures, its potential with (theoretically more powerful) recurrent networks and time-varying inputs has rarely been explored. Here we train Long Short-Term Memory (LSTM) recurrent networks to maximize two information-theoretic objectives for unsupervised learning: Binary Information Gain Optimization (BINGO) and Nonparametric Entropy Optimization (NEO). LSTM learns to discriminate different types of temporal sequences and group them according to a variety of features.
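The NEO objective mentioned above rests on a nonparametric entropy estimate of the network's outputs. The following is a minimal illustrative sketch, assuming a Gaussian Parzen-window density with leave-one-out estimation (in the spirit of empirical entropy manipulation); the function name `parzen_entropy` and the bandwidth parameter `sigma` are illustrative choices, not taken from the paper:

```python
import numpy as np

def parzen_entropy(samples, sigma=0.1):
    """Leave-one-out Parzen-window estimate of the entropy of a
    1-D sample set, using a Gaussian kernel of width sigma."""
    y = np.asarray(samples, dtype=float)
    n = len(y)
    # Pairwise Gaussian kernel responses between all samples.
    diff = y[:, None] - y[None, :]
    k = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    np.fill_diagonal(k, 0.0)            # leave-one-out: exclude self-match
    p = k.sum(axis=1) / (n - 1)         # density estimate at each sample
    return -np.log(p).mean()            # empirical (differential) entropy

# Tightly clustered outputs score lower estimated entropy than spread-out ones,
# which is what an entropy-optimizing objective can exploit.
rng = np.random.default_rng(0)
tight = rng.normal(0.0, 0.05, 200)
spread = rng.normal(0.0, 1.0, 200)
```

In an unsupervised setting, a gradient of such an estimate with respect to the network outputs could be propagated back through the LSTM; the sketch here only shows the estimator itself.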
Keywords: Recurrent Neural Network · Unsupervised Learning · Network Output · Memory Block · Recurrent Network