
Dynamic Cortex Memory: Enhancing Recurrent Neural Networks for Gradient-Based Sequence Learning

  • Sebastian Otte
  • Marcus Liwicki
  • Andreas Zell
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8681)

Abstract

In this paper, a novel recurrent neural network (RNN) model for gradient-based sequence learning is introduced. The presented dynamic cortex memory (DCM) is an extension of the well-known long short-term memory (LSTM) model. The main innovation of the DCM is an enhanced inner interplay between the gates and the error carousel, achieved through several new trainable connections that enable direct signal transfer from one gate to another. With this enhancement, the networks converge faster during training with back-propagation through time (BPTT) than LSTMs under the same training conditions. Furthermore, DCMs yield better generalization results than LSTMs. This behaviour is shown for several supervised problem scenarios, including storing precise values, the adding problem, and learning a context-sensitive grammar.
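The abstract describes the DCM only at a high level: an LSTM cell whose gates additionally receive the previous activations of the other gates through trainable weights. The following is a minimal sketch of that idea in Python, assuming a single scalar cell and illustrative weight names (the exact wiring and notation in the paper may differ); it shows only the forward pass, where the hypothetical gate-to-gate terms enter each gate's net input, and omits BPTT training.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class DCMCell:
        """Illustrative DCM-style cell: an LSTM cell whose gates also receive
        the previous activations of the other gates via trainable weights.
        Names and wiring are assumptions, not the paper's formulation."""

        def __init__(self, n_in, rng=None):
            rng = rng or np.random.default_rng(0)
            # input weights and biases for the input (i), forget (f), and
            # output (o) gates and the cell input (z), as in a standard LSTM
            self.W = {g: rng.normal(0.0, 0.1, n_in) for g in "ifoz"}
            self.b = {g: 0.0 for g in "ifoz"}
            # hypothetical gate-to-gate weights: entry (g, h) feeds gate h's
            # previous activation into gate g's net input
            self.G = {(g, h): rng.normal(0.0, 0.1)
                      for g in "ifo" for h in "ifo" if g != h}

        def step(self, x, c_prev, gates_prev):
            """One forward step; gates_prev holds the last gate activations."""
            def net(g):
                cross = sum(self.G[g, h] * gates_prev[h]
                            for h in "ifo" if h != g)
                return self.W[g] @ x + self.b[g] + cross
            i, f, o = (sigmoid(net(g)) for g in "ifo")
            z = np.tanh(self.W["z"] @ x + self.b["z"])   # cell input
            c = f * c_prev + i * z                       # error carousel
            y = o * np.tanh(c)                           # cell output
            return y, c, {"i": i, "f": f, "o": o}

Started from zero state, e.g. cell.step(x, 0.0, {"i": 0.0, "f": 0.0, "o": 0.0}), the cross terms vanish at the first step and the cell behaves like a peephole-free LSTM; the gate-to-gate weights only shape the dynamics from the second step onward.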

Keywords

Dynamic Cortex Memory (DCM) · Recurrent Neural Networks (RNN) · Neural Networks · Long Short-Term Memory (LSTM)



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Sebastian Otte (1)
  • Marcus Liwicki (2)
  • Andreas Zell (1)

  1. Cognitive Systems Group, University of Tübingen, Tübingen, Germany
  2. German Research Center for Artificial Intelligence, Kaiserslautern, Germany
