Biologically Plausible Speech Recognition with LSTM Neural Nets

Graves, Alex; Eck, Douglas; Beringer, Nicole; Schmidhuber, Juergen

doi:10.1007/978-3-540-27835-1_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3141)

Included in the following conference series:

International Workshop on Biologically Inspired Approaches to Advanced Information Technology

Abstract

Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) are local in space and time and closely related to a biological model of memory in the prefrontal cortex. Not only are they more biologically plausible than previous artificial RNNs, they also outperformed them on many artificially generated sequential processing tasks. This encouraged us to apply LSTM to more realistic problems, such as the recognition of spoken digits. Without any modification of the underlying algorithm, we achieved results comparable to state-of-the-art Hidden Markov Model (HMM) based recognisers on both the TIDIGITS and TI46 speech corpora. We conclude that LSTM should be further investigated as a biologically plausible basis for a bottom-up, neural net-based approach to speech recognition.
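
To make the gating idea behind LSTM concrete, here is a minimal sketch of the forward pass of a single memory cell in plain Python. It assumes the commonly used variant with input, forget and output gates and a scalar cell state; the abstract does not say which LSTM configuration the authors used, so this is illustrative rather than a reproduction of their network. The names `lstm_step`, `run_sequence` and the weight dictionary layout are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, used to squash gate activations into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """Advance one LSTM memory cell (scalar state) by one time step.

    x      : list of input features for the current frame
    h_prev : cell output from the previous time step
    c_prev : internal cell state from the previous time step
    w      : dict mapping a gate name to (input weights, recurrent weight, bias)
             (an assumed layout, chosen only for this illustration)
    """
    def pre_activation(name):
        w_in, w_rec, bias = w[name]
        return sum(wi * xi for wi, xi in zip(w_in, x)) + w_rec * h_prev + bias

    i = sigmoid(pre_activation("input_gate"))    # how much new input to store
    f = sigmoid(pre_activation("forget_gate"))   # how much old state to keep
    o = sigmoid(pre_activation("output_gate"))   # how much state to expose
    g = math.tanh(pre_activation("candidate"))   # proposed new content

    c = f * c_prev + i * g       # gated update of the cell state
    h = o * math.tanh(c)         # gated cell output
    return h, c


def run_sequence(frames, w):
    """Feed a sequence of feature frames through the cell, one frame at a time."""
    h, c = 0.0, 0.0
    outputs = []
    for x in frames:
        h, c = lstm_step(x, h, c, w)
        outputs.append(h)
    return outputs
```

Each step reads only the current frame and the previous output and state, so a spoken-digit recogniser built this way would consume acoustic feature frames one at a time; a practical network would use many such cells plus an output layer, which this sketch omits.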

Author information

Authors and Affiliations

Istituto Dalle Molle di Studi sull’Intelligenza Artificiale, Galleria 2, 6928, Manno-Lugano, Switzerland
Alex Graves, Douglas Eck, Nicole Beringer & Juergen Schmidhuber

Editor information

Editors and Affiliations

Biologically Inspired Robotic Group (BIRG), Ecole Polytechnique Fédérale de Lausanne (EPFL), Station 14, CH-1015, Lausanne, Switzerland
Auke Jan Ijspeert
Graduate School of Information Science and Technology, Osaka University, 1-5 Yamadaoka, 565-0871, Suita, Osaka, Japan
Masayuki Murata & Naoki Wakamiya

About this paper

Cite this paper

Graves, A., Eck, D., Beringer, N., Schmidhuber, J. (2004). Biologically Plausible Speech Recognition with LSTM Neural Nets. In: Ijspeert, A.J., Murata, M., Wakamiya, N. (eds) Biologically Inspired Approaches to Advanced Information Technology. BioADIT 2004. Lecture Notes in Computer Science, vol 3141. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27835-1_10

DOI: https://doi.org/10.1007/978-3-540-27835-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23339-8
Online ISBN: 978-3-540-27835-1
eBook Packages: Springer Book Archive
