Unsupervised Learning in LSTM Recurrent Neural Networks

  • Magdalena Klapper-Rybicka
  • Nicol N. Schraudolph
  • Jürgen Schmidhuber
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2130)


While much work has been done on unsupervised learning in feedforward neural network architectures, its potential with (theoretically more powerful) recurrent networks and time-varying inputs has rarely been explored. Here we train Long Short-Term Memory (LSTM) recurrent networks to maximize two information-theoretic objectives for unsupervised learning: Binary Information Gain Optimization (BINGO) and Nonparametric Entropy Optimization (NEO). LSTM learns to discriminate different types of temporal sequences and group them according to a variety of features.
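As a rough illustration of what a nonparametric entropy objective works with, the sketch below computes a leave-one-out Parzen-window (Gaussian kernel) estimate of differential entropy for a set of scalar network outputs. It is a generic textbook-style estimator, not the authors' NEO implementation: the function name `parzen_entropy`, the kernel width `sigma`, and the toy output values are all assumptions made for this example, and the gradient computation and the LSTM network itself are left out.

```python
import math


def parzen_entropy(samples, sigma=0.5):
    """Leave-one-out Parzen-window entropy estimate (in nats) for 1-D samples.

    Hypothetical helper for illustration; `sigma` is an assumed kernel width.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Normalisation constant of a 1-D Gaussian kernel with width sigma.
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total_log_density = 0.0
    for i, y_i in enumerate(samples):
        # Density at y_i estimated from all other samples (leave-one-out).
        density = sum(
            norm * math.exp(-0.5 * ((y_i - y_j) / sigma) ** 2)
            for j, y_j in enumerate(samples)
            if j != i
        ) / (n - 1)
        total_log_density += math.log(density)
    # Entropy is the negative mean log-density over the samples.
    return -total_log_density / n


# Toy data (assumed): outputs that sit close together versus far apart.
clustered = [0.0, 0.01, 0.02, 0.03]
spread = [-3.0, -1.0, 1.0, 3.0]

h_clustered = parzen_entropy(clustered)
h_spread = parzen_entropy(spread)
print(h_clustered < h_spread)
```

Under these assumptions, tightly grouped outputs yield a lower entropy estimate than widely spread ones, which is the kind of quantity an entropy-based unsupervised objective can raise or lower by adjusting network weights.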


Recurrent Neural Network, Unsupervised Learning, Network Output, Memory Block, Recurrent Network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Magdalena Klapper-Rybicka (1)
  • Nicol N. Schraudolph (2)
  • Jürgen Schmidhuber (3)
  1. Institute of Computer Science, University of Mining and Metallurgy, Kraków, Poland
  2. Institute of Computational Sciences, Eidgenössische Technische Hochschule (ETH), Zürich, Switzerland
  3. Istituto Dalle Molle di Studi sull’Intelligenza Artificiale (IDSIA), Manno, Switzerland
