Learning Context Sensitive Languages with LSTM Trained with Kalman Filters

  • Felix A. Gers
  • Juan Antonio Pérez-Ortiz
  • Douglas Eck
  • Jürgen Schmidhuber
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2415)


Unlike traditional recurrent neural networks, the Long Short-Term Memory (LSTM) model generalizes well when presented with training sequences derived from regular as well as simple nonregular languages. Our novel combination of LSTM and the decoupled extended Kalman filter, however, learns even faster and generalizes even better, requiring only the 10 shortest exemplars (n ≤ 10) of the context-sensitive language a^n b^n c^n to handle values of n up to 1000 and beyond correctly. Even when we consider the relatively high update complexity per timestep, in many cases the hybrid offers faster learning than LSTM by itself.
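
The benchmark can be made concrete with a small data generator. The sketch below assumes an encoding the abstract does not spell out: each string is delimited as S a^n b^n c^n T with n ≥ 1, and at every step the target is the set of symbols that may legally come next, written as a multi-hot vector. The names make_string, legal_next, multi_hot, OUTPUTS, TRAIN_NS and TEST_NS are hypothetical. Only the a's are ambiguous; once the first b appears, the rest of the string is fixed by n, so a network that has learned to count can predict every remaining symbol exactly.

```python
"""Data for the context-sensitive language a^n b^n c^n (a minimal sketch).

Assumed encoding (not stated in the abstract): every string is wrapped as
S a^n b^n c^n T, and at each step the target is the set of legal next symbols.
"""

OUTPUTS = ("a", "b", "c", "T")  # assumed order of the output units


def make_string(n: int) -> str:
    """Return the delimited string S a^n b^n c^n T (n >= 1 assumed)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return "S" + "a" * n + "b" * n + "c" * n + "T"


def legal_next(n: int) -> list[set[str]]:
    """Legal next symbols after each prefix of make_string(n), up to the last c."""
    targets = [{"a"}]                                   # after S only an a can follow
    targets += [{"a", "b"} for _ in range(n)]           # after any a: another a or the first b
    targets += [{"b"} for _ in range(n - 1)] + [{"c"}]  # the n-th b forces the first c
    targets += [{"c"} for _ in range(n - 1)] + [{"T"}]  # the n-th c forces the end symbol
    return targets


def multi_hot(symbols: set[str]) -> list[float]:
    """Encode a set of legal symbols as a target vector over OUTPUTS."""
    return [1.0 if s in symbols else 0.0 for s in OUTPUTS]


# Train on the 10 shortest exemplars (n = 1..10); test generalization on larger n.
TRAIN_NS = range(1, 11)
TEST_NS = range(1, 1001)
training_pairs = [
    (make_string(n), [multi_hot(t) for t in legal_next(n)]) for n in TRAIN_NS
]
```

Checking generalization then amounts to running a trained network on make_string(n) for each n in TEST_NS and comparing its thresholded outputs with multi_hot(t) for every t in legal_next(n).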

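The abstract names the decoupled extended Kalman filter (DEKF) but not its equations. What follows is a minimal sketch of the commonly used DEKF weight-update recursion, restricted to a model with a single output so that the usual N_o × N_o matrix inverse collapses to a scalar division. The weights are split into groups, each with its own covariance P_i, and the groups interact only through the shared scalar gamma. As assumptions for illustration: in the paper's setting the gradients h_i would come from the LSTM, whereas here the "network" is a linear model y = w0·x + w1 whose gradient is simply (x, 1); dekf_step, demo_fit_line, r and q are hypothetical names.

```python
"""One DEKF weight update for a model with a single output (a minimal sketch).

For weight groups i = 1..g with covariances P_i and output gradients h_i:
    gamma = 1 / (sum_i h_i^T P_i h_i + r)
    K_i   = P_i h_i * gamma
    w_i  <- w_i + K_i * error
    P_i  <- P_i - K_i h_i^T P_i + q * I
"""

Vector = list[float]
Matrix = list[list[float]]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def dekf_step(groups: list[Vector], covs: list[Matrix], grads: list[Vector],
              error: float, r: float, q: float = 0.0):
    """Return (new_groups, new_covs) after one update; inputs are not modified.

    error = target - output; r = measurement-noise variance;
    q = process noise added to the diagonal of every P_i.
    """
    p_h = [mat_vec(P, h) for P, h in zip(covs, grads)]            # P_i h_i
    gamma = 1.0 / (sum(dot(h, ph) for h, ph in zip(grads, p_h)) + r)
    new_groups, new_covs = [], []
    for w, P, ph in zip(groups, covs, p_h):
        k = [gamma * x for x in ph]                                 # Kalman gain K_i
        new_groups.append([wj + kj * error for wj, kj in zip(w, k)])
        n = len(w)
        # P_i is symmetric, so h_i^T P_i is the transpose of P_i h_i.
        new_covs.append([[P[a][b] - k[a] * ph[b] + (q if a == b else 0.0)
                          for b in range(n)] for a in range(n)])
    return new_groups, new_covs


def demo_fit_line(num_samples: int = 50) -> Vector:
    """Fit y = 2x + 1 with one group [slope, bias]; with g = 1 this is the global EKF."""
    groups = [[0.0, 0.0]]
    covs = [[[1000.0, 0.0], [0.0, 1000.0]]]
    for k in range(num_samples):
        x = -2.0 + 0.1 * k
        h = [x, 1.0]                        # gradient of w0*x + w1 w.r.t. (w0, w1)
        error = (2.0 * x + 1.0) - dot(groups[0], h)
        groups, covs = dekf_step(groups, covs, [h], error, r=1.0)
    return groups[0]
```

With a single group the recursion is the global EKF. Splitting the weights into groups drops the cross-group covariances, so storage and the covariance update scale with the sum of W_i² over groups rather than with W² for all W weights. Each step still costs far more than a plain gradient step, which is presumably the per-timestep overhead the abstract refers to.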






Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Felix A. Gers (1)
  • Juan Antonio Pérez-Ortiz (2)
  • Douglas Eck (3)
  • Jürgen Schmidhuber (3)

  1. Mantik Bioinformatik GmbH, Berlin, Germany
  2. DLSI, Universitat d’Alacant, Alacant, Spain
  3. IDSIA, Manno, Switzerland
