Abstract
In this paper a novel recurrent neural network (RNN) model for gradient-based sequence learning is introduced. The presented dynamic cortex memory (DCM) is an extension of the well-known long short-term memory (LSTM) model. The main innovation of the DCM is an enhanced interplay between the gates and the error carousel, achieved through several new trainable connections that enable a direct signal transfer from one gate to another. With this enhancement the networks converge faster during training with back-propagation through time (BPTT) than LSTMs under the same training conditions. Furthermore, DCMs yield better generalization results than LSTMs. This behaviour is demonstrated on several supervised problem scenarios, including storing precise values, adding, and learning a context-sensitive grammar.
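The abstract describes the DCM only at a high level: an LSTM cell augmented with additional trainable connections that carry signals directly between the gates. The following Python/numpy sketch illustrates one plausible reading of that idea, in which each gate's pre-activation also receives the previous time step's activations of the other gates through per-unit trainable weights. The class name DCMCellSketch, the exact wiring, and the choice of per-unit (rather than full-matrix) gate-to-gate weights are assumptions made for illustration; the paper's actual DCM equations may differ.

import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class DCMCellSketch:
    """LSTM-style cell with extra trainable gate-to-gate connections
    (hypothetical wiring; the paper's exact DCM equations may differ)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        # Standard LSTM parameters for the input (i), forget (f), output (o)
        # gates and the cell input (z).
        self.W = {g: rng.normal(0.0, s, (n_hidden, n_in)) for g in "ifoz"}
        self.R = {g: rng.normal(0.0, s, (n_hidden, n_hidden)) for g in "ifoz"}
        self.b = {g: np.zeros(n_hidden) for g in "ifoz"}
        # Assumed DCM-style parameters: per-unit weights that feed the previous
        # activation of every other gate directly into each gate.
        self.G = {(src, dst): rng.normal(0.0, s, n_hidden)
                  for src in "ifo" for dst in "ifo" if src != dst}

    def step(self, x, h_prev, c_prev, gates_prev):
        def preact(g):
            a = self.W[g] @ x + self.R[g] @ h_prev + self.b[g]
            if g in "ifo":
                # Direct signal transfer from the other gates (the assumed
                # DCM addition); a plain LSTM omits these terms.
                for src in "ifo":
                    if src != g:
                        a = a + self.G[(src, g)] * gates_prev[src]
            return a

        i = sigmoid(preact("i"))
        f = sigmoid(preact("f"))
        o = sigmoid(preact("o"))
        z = np.tanh(preact("z"))
        c = f * c_prev + i * z          # constant error carousel
        h = o * np.tanh(c)
        return h, c, {"i": i, "f": f, "o": o}


# Minimal usage under the same assumptions: run a short random sequence
# through one cell, carrying the gate activations from step to step.
n_in, n_hidden = 4, 8
cell = DCMCellSketch(n_in, n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
gates = {g: np.zeros(n_hidden) for g in "ifo"}
for x in np.random.default_rng(1).normal(size=(5, n_in)):
    h, c, gates = cell.step(x, h, c, gates)

The design point the sketch is meant to convey is that, relative to a standard LSTM, the only structural change is the extra gate-to-gate terms in each gate's pre-activation; the constant error carousel itself is left untouched.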